View Spellcheck Instructions

Alan Blundell

Many wordprocessing packages these days include a spelling checker, and not just those at the top end of the market. In the PC world, there is a proliferation of wp packages, and competition between them is great enough to ensure that most suppliers' packages contain a spellcheck feature.

In the more humble world of Acorn's BBC Micro series, there is less choice of packages, and suppliers have less memory to play with in deciding which features to include. Outside the Education market, the most successful wp packages have been from Acorn itself (the View series) and from Computer Concepts (Wordwise, Wordwise Plus and InterWord). All of these products are designed to work in 'sideways ROM' format, allowing the maximum amount of workspace for text in the available 32k of RAM. None of the packages includes a spelling checker.

Spellcheck packages are available for BBC machines, however: Acorn offers ViewSpell, Computer Concepts offers SpellMaster and several other suppliers offer similar packages. All of these operate from sideways ROMs with dictionary files held on disc, except for the Computer Concepts offering, which includes a large dictionary within its paged ROM.

However, many more BBC Micro users own a wordprocessing program than own a spelling checker; witness the Master Series machines, which are supplied ready-equipped with the View wordprocessor. The cost of a spelling checker does little to encourage widespread use, either: ViewSpell costs around £30, SpellMaster around £42, which is likely to deter all but the most serious user.

Help is now at hand! The program suite presented here is a spelling checker for View and straight ASCII files.
I have been unable to test it with the Computer Concepts products, but it should work with these too, given appropriate modifications.

The programs are written mostly in BBC Basic, with speed-critical sections in 6502 assembler, and will work with BBC Basic 2 upwards and Acorn MOS 1.2 upwards (this includes most Model Bs, the Master 128 and the Master Compact). Only 'legal' techniques and calls are used, so it may even work under the RISC OS 6502 emulator on the Archimedes!

The suite consists of seven files in all:

!BOOT   - Simply a convenient means of starting the program
SPELL   - The main spellcheck program
MAKEDIC - A dictionary file creation utility
TXTCONV - A utility to convert a dictionary file into a text file (useful for dictionary maintenance)
DICCONV - A utility to add all words in a text file to a dictionary file automatically (also for maintenance)
DICWIPE - A utility allowing words incorrectly added to the dictionary to be removed
DICTION - A sample dictionary file to start off with!

How To Use SpellCheck
_____________________

Once a wordprocessed document is stored on disc, in a View format or as straight ASCII text, the spelling checker can be entered either by booting the disc as provided (using SHIFT-BREAK), or by entering the Basic language (by typing *BASIC <RETURN>) and CHAINing the main program, "SPELL". Either way, the screen is cleared and the SpellCheck main screen is displayed.

The user is prompted to enter the filename of the text or document to be checked, to press <ESCAPE> to quit the program, or to press <RETURN> without entering a filename to access the utilities menu. Filenames may include drive and/or directory specifications.

Once a filename has been entered, the check begins automatically. A constantly updated count is displayed of the number of words processed and the number of unique words found to date. Once the whole file has been processed, a list will have been created in memory of each of the words used.
The list will contain only one copy of each word, in alphabetical order.

Next, this list is compared with the words which have already been included in or added to the dictionary file. As each word is found, it is removed from the list, so that at the end of this stage the list contains only those words which appear in the document or text but not in the dictionary.

All words not found in the dictionary are shown on screen, and the user is then given the choice either of adding words to the dictionary or making corrections to the document or text, or of closing the file. If the first option is chosen, each of the words remaining in the list is presented in turn, with the option to add that word to the dictionary, to ignore the word (if it is correct but the user doesn't want it permanently in the dictionary), or to enter a corrected spelling for the word.

After all words in the list have been dealt with, if any corrected spellings have been entered, the program is designed to call View automatically, change all occurrences of misspelled words to the corrected spelling and return to the opening screen.

The Utilities Menu
__________________

The utilities menu offers six options:

1. Remove word(s) from the dictionary
2. Convert the dictionary to a text file
3. Add all of the words in a text file to the dictionary
4. Create a new dictionary
5. Perform an operating system ('*') command
6. Return to the main screen

Of these, option '5' is likely to be the one most frequently used, for such things as cataloguing a disc to check a forgotten filename. Any OS command may be issued, and no checking is done by SpellCheck. Care should be taken, however, not to issue commands which may result in corruption of main RAM (such as formatting or compacting a disc). No permanent harm would result, but the program would be wiped from memory in the process!

Option '1' is designed for those of us with 'slippery fingers', to deal with the occasional inevitable slip.
It can be used to remove one or more words from the dictionary, usually because they have been added in error by the user. The facility allows for the use of 'wild cards' in the specification of the word(s) to be removed. The specification can include the special characters

'*' - to indicate any number of unspecified letters (or none at all)
'#' - to indicate one unspecified character

Thus, if the word 'SPELL' had been inadvertently added to the dictionary as 'SPEL', and as part of 'SPELING', 'SPELCHECK' and 'SPELCHECKER':

the specification 'SPEL*' would select 'SPEL', 'SPELING', 'SPELCHECK' and 'SPELCHECKER'
the specification 'SP*' would also select all of these, but may in addition select other words beginning with 'SP', such as 'SPOT' and 'SPRITE'
the specification 'SPELC*' would select 'SPELCHECK' and 'SPELCHECKER'
the specification 'SPE#' would select 'SPEL'
the specification 'SPEL' would select 'SPEL'
the specification 'SPEL#' wouldn't select anything

Once a specification has been entered, all words matching the specification are displayed and the user is given the option to delete all of these, to refine or change the specification, or to press <ESCAPE> to return to the main program.

Options '2', '3' and '4' are provided so that the user can decide how much disc space to devote to the dictionary file. For example, a dictionary file size of 60k chosen to begin with is likely to be ample for some time, but eventually the dictionary will begin to fill and it will be necessary to increase its size. Rather than simply create a new, empty dictionary, it is possible to convert the dictionary into a text file, create a new, larger dictionary file and recopy all of the words from the text file back into the dictionary.

The size of the dictionary file is determined by the DATA statements at the end of the MAKEDIC program; these give the number of 256-byte 'pages' of space to be reserved for each letter of the alphabet in turn.
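
As a rough illustration of how per-letter page counts translate into a file size, the following Python sketch (not part of the SpellCheck suite, which is written in BBC Basic and 6502 assembler) totals a set of page allocations and works out where each letter's block would start. The page counts are invented for the example. The function name `layout`, and the assumption that the letter blocks are stored back to back in alphabetical order with no file header, are made for this illustration only and are not taken from MAKEDIC.

```python
import string

PAGE_SIZE = 256  # bytes per 'page', as described for MAKEDIC


def layout(pages_per_letter):
    """Return (offsets, total_bytes) for 26 per-letter page counts.

    Assumption for this sketch: the letter blocks sit back to back,
    'A' first, with no header in front of them.
    """
    if len(pages_per_letter) != 26:
        raise ValueError("need one page count per letter A-Z")
    offsets = {}
    position = 0
    for letter, pages in zip(string.ascii_uppercase, pages_per_letter):
        offsets[letter] = position      # byte offset of this letter's block
        position += pages * PAGE_SIZE   # next block starts after it
    return offsets, position


# Invented allocation: 2 pages for every letter, 8 for the busier 'S'.
example_pages = [2] * 26
example_pages[string.ascii_uppercase.index("S")] = 8

offsets, total = layout(example_pages)
print(total)         # 14848 bytes in all (58 pages)
print(offsets["T"])  # 11264, the first byte after the 'S' block
```

Raising any one page count moves every later letter's block further along, which may be why an existing dictionary has to be rebuilt rather than extended in place.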
To create a larger dictionary, simply increase the numbers in the DATA statements and run the program. Make sure first that a copy of the dictionary has been made or a conversion to a text file has been carried out, though, as any existing dictionary file will be lost! (The program provides ample warning of this.)

Design Considerations
_____________________

SPELLCHECK was designed from the start in BBC Basic 4 on a BBC Master 128 micro, although it will work on any (6502-based) BBC Micro with OS 1.2 onwards and BBC Basic 2 onwards. The program design is highly modular and is based on the following algorithm:

                +---------------------------+
                | Initialise screen display |   (PROCinit)
                | and RAM-based buffers for |
                | word list and dictionary. |
                +---------------------------+
                              |
                              v
             +--------------------------------+   (PROCopenfile)
             | Get text file name & open file |
             +--------------------------------+
                              |
                              v
                          ---------
                        /    was    \
                       /  <ESCAPE>   \         +---------------------+
                       \  pressed?   /--- y -->| Clear screen & end. |
                        \           /          +---------------------+
                          ---------
                              |
                              n
                              |
                              v
                          ---------
                        /    was    \
                       /  <RETURN>   \         +-----------------+
                       \  pressed?   /--- y -->| Show Utils menu |
                        \           /          +-----------------+
                          ---------
                              |
                              n
                              |
                              v
                      +---------------+
                      |  Read a word  |<--------------+   (FNreadword)
                      | from the file |<------+       |
                      +---------------+       |       |
                              |               |       |
                              v               |       |
                          ---------           |       |
                       /             \        |       |
                      /  Is the word  \       |       |
                     |   already in    |-- y -+       |
                      \   the list?   /               |
                       \             /                |
                          ---------                   |
                              |                       |
                              n                       |
                              |                       |
                              v                       |
                   +--------------------+             |   (PROCaddtolist)
                   | Add the word to    |             |
                   | the list in memory |             |
                   +--------------------+             |
                              |                       |
                              v                       |
                          ---------                   |
                       /             \                |
                      /    Reached    \               |
                     |     the end     |-- n ---------+
                      \   of file?    /
                       \             /
                          ---------
                              |
                              y
                              |
                              v
                 +-------------------------+   (PROCcheck)
                 | Load the dictionary     |
                 | buffer with all words   |<-----------+
                 | starting with next      |            |
                 | letter (start with 'A') |            |
                 +-------------------------+            |
                              |                         |
                              v                         |
                 +-------------------------+            |
    +----------->| Take the next word from |            |
    |            | the text list in memory |            |
    |            +-------------------------+            |
    |                         |                         |
    |                         v                         |
    |              -----------------------              |
    |           /                           \           |
    |          /   Is the first letter the   \          |
    |         |  same as the current letter   |-- n ----+
    |          \ for the dictionary buffer?  /
    |           \                           /
    |              -----------------------
    |                         |
    |                         y
    |                         |
    |                         v
    |              -----------------------
    |           /                           \
    |          /   Is the word already in    \
    |         |     the dictionary (as a      |-- n ----+
    |          \ previously verified word)?  /          |
    |           \                           /           |
    |              -----------------------              |
    |                         |                         |
    |                         y                         |
    |                         |                         |
    |                         v                         |
    |             +-----------------------+             |
    |             | Remove word from list |             |
    |             +-----------------------+             |
    |                         |                         |
    |                         v                         |
    |                ------------------- <--------------+
    |              /                     \
    |             /   Was that the last   \
    +----- n -----\   word in the list?   /
                   \                     /
                     -------------------
                              |
                              y
                              |
                              v
                     -------------------
                   /                     \
                  /  Are there any words  \        +--------------+
                  \  still in the list?   /-- n -->| End of check |
                   \                     /         +--------------+
                     -------------------
                              |
                              y
                              |
                              v
                     -------------------   (PROCaction)
                   /                     \
                  /  Does the user want   \        +--------------+
                 /   to make corrections   \- n -->| End of check |
                 \   or add word(s) from   /       +--------------+
                  \  list to dictionary?  /
                   \                     /
                     -------------------
                              |
                              y
                              |
                              v
              +-------------------------------+
              | Present each word in turn.    |
              | If no action, pass to next.   |
              | If to be added to dictionary, |
              | call appropriate routine.     |
              | If to be corrected, input the |
              | corrected version.            |
              +-------------------------------+
                              |
                              v
                        -------------
                     /                 \
                    /   Are there any   \          +--------------+
                   |   corrections to    |--- n -->| End of check |
                    \  be carried out?  /          +--------------+
                     \                 /
                        -------------
                              |
                              y
                              |
                              v
                 +-------------------------+
                 | Call up VIEW, carry out |
                 | corrections, & return.  |
                 +-------------------------+
                              |
                              v
                      +--------------+
                      | End of check |
                      +--------------+

The PROCedures noted above are the main elements of the structure, but are supported by a number of more specific calls.

For the purposes of SPELLCHECK, a word is defined as any series of alphabetic characters two or more in length. Whilst this is at first sight a commonsense definition, and certainly one which is easy to program for, it does lead to some outcomes which are not immediately obvious. For example, the word "isn't" (is not) is taken by SPELLCHECK to be two separate words, "isn" and "t", separated by an apostrophe. The "t" is ignored as being below the minimum length for a word, and the program duly reports a word "ISN" as having been found in the text. Until you get used to this, you could be perplexed by some strange spelling mistakes! At the time, I felt this to be preferable to allowing punctuation marks within words, as that could lead to worse confusion.

SPELLCHECK does not distinguish at all between upper and lower case text. Also, SPELLCHECK does not check for plurals or other variations on a word root. This decision was made purely for simplicity of design and from a desire to produce a working program! The effect is that, for example, WAIT, WAITS, WAITER, WAITING, AWAIT, AWAITS (etc.) are all treated as separate words, each of which takes up space in the word list in memory and in the dictionary file and buffer. A future version of SPELLCHECK may improve on the present approach, with the advantages of more compact storage on disc and in memory (and perhaps some speed improvement).

The dictionary file maintained by SPELLCHECK is designed such that the program can load into a memory buffer all words beginning with a particular letter at once.
This approach was found to greatly reduce disc accesses by the program, speeding up its operation correspondingly, as accessing RAM is far faster than accessing a disc.

In order to make the dictionary file as compact as possible, I did some research on the relative frequencies of words beginning with each letter; there are obviously more words beginning with the letter 'S' than with 'Q' (for example). As I couldn't find any easy reference to help with this, I reproduce the results of my own analysis below, in case anyone else should find a use for them:

Relative frequencies of words in English beginning with:
========================================================
A 14      H  8      O  4      V  4
B 12      I  8      P 22      W  6
C 22      J  2      Q  2      X  1
D 16      K  2      R 12      Y  1
E 10      L  8      S 28      Z  1
F 12      M  4      T  8
G  8      N  4      U  2

Programming Considerations
__________________________

The early versions of SPELLCHECK which I concocted during development were written solely in Basic. It soon became apparent that the speed of interpreted Basic was inadequate for any except the shortest text files. One version took nearly 40 minutes to check one page of text!

Rather than go the whole hog and convert the whole thing to assembler, I identified the speed-critical parts of the program. These were related to the means of storage of word lists in memory. Most of the working time of the program was spent inserting words in the list in alphabetical order, searching through the list for word matches and removing correct words from the list after identification. The Basic PROCedures which originally carried out these tasks were rewritten in assembler. There were immediate and dramatic speed improvements, of the order of 100 times. Whilst the speed of the finished program may not compare with commercial products, it is not dramatically worse and I have found it to be perfectly acceptable in use.
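
The three word-list operations named above (inserting in alphabetical order, searching for a match, and removing verified words) can be pictured with the following Python sketch. It is only an illustration: the real routines are in 6502 assembler and their data layout is not described here, so the binary search via Python's standard `bisect` module and the names `split_words`, `add_to_list` and `remove_known` are assumptions made for this example. The word-splitting rule follows the definition given earlier (runs of two or more letters, with case ignored).

```python
import bisect
import re


def split_words(text):
    """Runs of two or more letters, upper-cased; anything else separates words."""
    return re.findall(r"[A-Z]{2,}", text.upper())


def add_to_list(word_list, word):
    """Insert word at its alphabetical position unless it is already there."""
    i = bisect.bisect_left(word_list, word)  # binary search (an assumption)
    if i == len(word_list) or word_list[i] != word:
        word_list.insert(i, word)


def remove_known(word_list, dictionary_words):
    """Return the words from word_list that are absent from the dictionary."""
    known = set(dictionary_words)
    return [w for w in word_list if w not in known]


words = []
for w in split_words("The waiter isn't waiting; the WAITER waits."):
    add_to_list(words, w)

print(words)
# ['ISN', 'THE', 'WAITER', 'WAITING', 'WAITS']
print(remove_known(words, ["THE", "WAITER", "WAITING", "WAITS"]))
# ['ISN']
```

Keeping the list sorted means each duplicate check and each insertion needs only one search, and it leaves the words grouped by initial letter, which suits a dictionary that is loaded one letter at a time.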
Screen mode 7 was chosen for the program from the start because it allows the maximum amount of RAM to remain free for use by the program (important if the machine to be used has no shadow RAM!).

It is physically possible to combine all of the separate programs which constitute SPELLCHECK into one, with the obvious advantages of fewer files on disc, no repetition of common code and less disc access. However, I decided to split the utilities away from the main program, as RAM space is at a premium on the BBC Micro and about 9K is used for dictionary, word list and other buffers. The main program, SPELL, carries out the main functions, whereas functions such as rebuilding dictionaries and dictionary maintenance are called as separate programs. This makes sense, as once the dictionary has been set up, these functions are only infrequently used.

This was also the rationale for reaching the utilities by giving no filename (just pressing RETURN), as the alternative would have been to choose the main spellcheck activity from a menu each time the program was run. By assuming that a file is to be checked at the start, less 'setting up' is needed each time the program is used.