.........*.......*.......*.......*.......*.......*.......*.......*.<
TM0
LM8
LS1
DH/Alan Blundell/SPELLCHECK/Page |P/
DF///More .../
Many wordprocessing packages these days include a spelling
checker, and not just those at the top end of the market. In the
PC world, there is a proliferation of wp packages and competition
between them is great enough to ensure that most suppliers'
packages contain a spellcheck feature.
In the more humble world of Acorn's BBC Micro series, there is
less choice of packages and suppliers have less memory to play
with in deciding which features to include. Outside the education
market, the most successful wp packages have been from Acorn
itself (the View series) and from Computer Concepts (Wordwise,
Wordwise Plus and InterWord). All of these products are designed
to work in 'sideways ROM' format, allowing the maximum amount of
workspace for text in the available 32k of RAM. None of the
packages includes a spelling checker.
Spellcheck packages are available for BBC machines, however: Acorn
offers ViewSpell, Computer Concepts offers SpellMaster and several
other suppliers offer similar packages. All of these operate from
sideways ROMs with dictionary files held on disc, except for the
Computer Concepts offering, which includes a large dictionary
within its paged ROM.
However, many more BBC Micro users own a wordprocessing program
than own a spelling checker; witness the Master Series machines,
which are supplied ready-equipped with the View wordprocessor.
The cost of a spelling checker, relative to what someone has
already spent on a BBC Micro and wordprocessor, also does little
to encourage widespread use: ViewSpell costs around £30,
SpellMaster around £42. Such a cost is likely to deter all but the
most serious user.
Help is now at hand! The program suite presented here is a
spelling checker for View and straight ASCII files. I have been
unable to test it with the Computer Concepts products, but it
should work with these too with appropriate modifications.
The programs are written mostly in BBC Basic, with speed-critical
sections in 6502 assembler, and will work with BBC Basic 2 upwards
and Acorn MOS 1.2 upwards (this includes most Model Bs, the Master
128 and the Master Compact). Only 'legal' techniques and calls are
used, so it may even work under the RISC OS 6502 emulator on the
Archimedes!
The suite consists of seven files in all:
LS0
!BOOT   - Simply a convenient means of starting the program
SPELL   - The main spellcheck program
MAKEDIC - A dictionary file creation utility
TXTCONV - A utility to convert a dictionary file into a
          text file (useful for dictionary maintenance)
PE
DICCONV - A utility to add all words in a text file to a
          dictionary file automatically (also for
          maintenance)
DICWIPE - A utility allowing words incorrectly added to the
          dictionary to be removed
DICTION - A sample dictionary file to start off with!
How To Use SpellCheck
_____________________
LS1
Once a wordprocessed document is stored on disc, in View format
or as straight ASCII text, the spelling checker can be entered by
booting the disc as provided (using SHIFT-BREAK), or by entering
the Basic language (by typing *BASIC <RETURN>) and CHAINing the
main program, "SPELL". Either way, the screen is cleared and the
SpellCheck main screen is displayed.
The user is prompted to enter the filename of the text or document
to be checked, to press <ESCAPE> to quit the program, or to press
<RETURN> without entering a filename to reach the utilities menu.
Filenames may include drive and/or directory specifications.
Once a filename has been entered, the check begins automatically.
A running count is displayed of the number of words processed and
the number of unique words found so far. Once the whole file has
been processed, a list will have been built in memory of every
word used, holding just one copy of each word, in alphabetical
order. Next, this list is compared with the words already in the
dictionary file. Each word found in the dictionary is removed from
the list, so that at the end of this stage the list contains only
those words which appear in the document or text but not in the
dictionary.
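The list-building stage just described can be illustrated with a short Python sketch. It is not the SPELL program's own code. `build_word_list` is a hypothetical name. The article does not say how the 6502 routines find each word's place in the list, so the binary search used here (through the standard `bisect` module) is an assumption.

```python
import bisect


def build_word_list(words):
    """Build the in-memory word list: one copy of each word, kept in
    alphabetical order. Returns (word_list, words_processed).
    Hypothetical sketch, not the SPELL program's own code."""
    word_list = []
    processed = 0
    for word in words:
        processed += 1
        # Binary search for the word's position (an assumption: the
        # original assembler routine may search differently).
        i = bisect.bisect_left(word_list, word)
        if i == len(word_list) or word_list[i] != word:
            word_list.insert(i, word)  # new word: insert in order
        # otherwise the word is already listed and is skipped
    return word_list, processed


unique, count = build_word_list(["THE", "CAT", "SAT", "ON", "THE", "MAT"])
print(count, len(unique), unique)
# prints: 6 5 ['CAT', 'MAT', 'ON', 'SAT', 'THE']
```

The two counts correspond to the 'words processed' and 'unique words' figures displayed during the check.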
All words not found in the dictionary are shown on screen and the
user is then given the choice of adding words to the dictionary,
making corrections to the document or text, or closing the file.
If the user chooses to add words or make corrections, each of the
words remaining in the list is presented in turn, with the option
to add that word to the dictionary, to ignore it (if it is correct
but the user doesn't want it permanently in the dictionary), or to
enter a corrected spelling. After all words in the list have been
dealt with, if any corrected spellings have been entered, the
program is designed to call View automatically, change all
occurrences of the misspelled words to their corrected spellings
and return to the opening screen.
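SPELLCHECK hands the actual replacement over to View. The Python sketch below only illustrates what 'change all occurrences' means under the program's definition of a word, using the standard `re` module. `apply_corrections` is a hypothetical name, and View itself does not perform the change this way. The sketch assumes that matching ignores case and that a misspelling counts only where it is not part of a longer run of letters.

```python
import re


def apply_corrections(text, corrections):
    """Replace each whole-word occurrence of a misspelling (matched
    regardless of case) with its corrected spelling.

    corrections maps misspelling -> correction, e.g. {"TEH": "the"}.
    A 'whole word' here is a run of letters not touching another
    letter, in line with SPELLCHECK's own definition of a word.
    """
    if not corrections:
        return text
    lookup = {wrong.upper(): right for wrong, right in corrections.items()}
    alternatives = "|".join(re.escape(w) for w in lookup)
    # (?<![A-Za-z]) and (?![A-Za-z]) stop a match inside a longer word.
    pattern = re.compile(r"(?<![A-Za-z])(" + alternatives + r")(?![A-Za-z])",
                         re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(1).upper()], text)


print(apply_corrections("Teh cat sat on teh mat.", {"TEH": "the"}))
# prints: the cat sat on the mat.
```

The corrected spelling is inserted exactly as typed, so the capitalisation of the original occurrence is not preserved.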
LS0
The Utilities Menu
__________________
The utilities menu offers six options:
1. Remove word(s) from the dictionary
2. Convert the dictionary to a text file
3. Add all of the words in a text file to the dictionary
4. Create a new dictionary
5. Perform an operating system ('*') command
6. Return to the main screen
LS1
PE
Of these, option '5' is likely to be the one most frequently used,
for such things as cataloguing a disc to check a forgotten
filename. Any OS command may be issued, and no checking is done
by SpellCheck. Care should be taken, however, not to issue
commands which may result in corruption of main RAM (such as
formatting or compacting a disc). No permanent harm would result,
but the program would be wiped from memory in the process!
Option '1' is designed for those of us with 'slippery fingers', to
deal with the occasional inevitable slip. It can be used to remove
one or more words from the dictionary, usually because they have
been added in error. The facility allows 'wild cards' to be used
in the specification of the word(s) to be removed. The
specification can include the special characters
'*' - to indicate any number of unspecified letters
      (or none at all)
'#' - to indicate one unspecified character
Thus, if 'SPELL' had been misspelt and inadvertently added to the
dictionary as 'SPEL', and also as part of 'SPELING', 'SPELCHECK'
and 'SPELCHECKER':
the specification 'SPEL*' would select 'SPEL'
                                       'SPELING'
                                       'SPELCHECK'
                                   and 'SPELCHECKER'
the specification 'SP*' would also select all of these, but might
    in addition select other words beginning with 'SP', such as
    'SPOT' and 'SPRITE'
the specification 'SPELC*' would select 'SPELCHECK'
                                    and 'SPELCHECKER'
the specification 'SPE#' would select 'SPEL'
the specification 'SPEL' would select 'SPEL'
the specification 'SPEL#' would select none of these
Once a specification has been entered, all words matching the
specification are displayed and the user is given the option to
delete all of these, to refine or change the specification, or to
press <ESCAPE> to return to the main program.
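The article does not describe how DICWIPE matches a specification against the dictionary. The Python sketch below is therefore a standard dynamic-programming wildcard matcher, written to the rules given above: '*' stands for any number of characters, including none, and '#' stands for exactly one. `matches`, `select` and `delete_matching` are hypothetical names.

```python
def matches(spec, word):
    """True if word fits spec, where '*' stands for any number of
    characters (including none) and '#' for exactly one character.
    Comparison ignores case."""
    spec, word = spec.upper(), word.upper()
    # ok[i][j] is True when the first i characters of spec match the
    # first j characters of word (a dynamic-programming table).
    ok = [[False] * (len(word) + 1) for _ in range(len(spec) + 1)]
    ok[0][0] = True
    for i in range(1, len(spec) + 1):
        c = spec[i - 1]
        if c == "*":
            ok[i][0] = ok[i - 1][0]  # '*' may match nothing at all
        for j in range(1, len(word) + 1):
            if c == "*":
                # either '*' matches nothing here, or it absorbs word[j-1]
                ok[i][j] = ok[i - 1][j] or ok[i][j - 1]
            elif c == "#" or c == word[j - 1]:
                ok[i][j] = ok[i - 1][j - 1]
    return ok[len(spec)][len(word)]


def select(spec, dictionary):
    """Words that would be displayed for possible deletion."""
    return [w for w in dictionary if matches(spec, w)]


def delete_matching(spec, dictionary):
    """The dictionary after the user confirms deletion."""
    return [w for w in dictionary if not matches(spec, w)]


words = ["SPEL", "SPELING", "SPELCHECK", "SPELCHECKER", "SPOT", "SPRITE"]
for spec in ["SPEL*", "SP*", "SPELC*", "SPE#", "SPEL", "SPEL#"]:
    print(spec, select(spec, words))
```

Run against these six words, the specifications give the same selections as the examples listed above.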
Options '2', '3' and '4' are provided so that the user can decide
how much disc space to devote to the dictionary file. For example,
a 60K dictionary file is likely to be ample for some time, but it
will eventually begin to fill and need to be made larger. Rather
than simply create a new, empty dictionary, it is possible to
convert the dictionary into a text file, create a new, larger
dictionary file and copy all of the words from the text file back
into the dictionary. The size of the dictionary file is determined
by the DATA statements at the end of the MAKEDIC program; these
give the number of 256-byte 'pages' of space to be reserved for
each letter of the alphabet in turn. To create a larger
dictionary, simply increase the numbers in the DATA statements and
run the program. Make sure first, though, that a copy of the
dictionary has been made or that it has been converted to a text
file, as any existing dictionary file will be lost! (The program
provides ample warning of this.)
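The arithmetic behind the size of a MAKEDIC dictionary is simple. Each page is 256 bytes, so a 60K file corresponds to 240 pages in all (60 x 1024 / 256). The article does not describe the internal layout of the file. The Python sketch below therefore assumes, purely for illustration, that the areas for each letter are stored one after another in alphabetical order with no header. `dictionary_layout` is a hypothetical name.

```python
import string

PAGE_SIZE = 256  # bytes per 'page', as in MAKEDIC's DATA statements


def dictionary_layout(pages_per_letter):
    """Return ({letter: (offset, size_in_bytes)}, total_bytes) for 26 page
    counts given in alphabetical order. Assumes (hypothetically) that the
    letters' areas are stored back to back with no file header."""
    if len(pages_per_letter) != 26:
        raise ValueError("expected one page count for each letter A-Z")
    layout = {}
    offset = 0
    for letter, pages in zip(string.ascii_uppercase, pages_per_letter):
        size = pages * PAGE_SIZE
        layout[letter] = (offset, size)
        offset += size
    return layout, offset


# Example: a made-up allocation of 2 pages for every letter.
layout, total = dictionary_layout([2] * 26)
print(total, layout["A"], layout["B"], layout["Z"])
# prints: 13312 (0, 512) (512, 512) (12800, 512)
```

Under this assumption, loading the buffer for one letter takes a single seek to that letter's offset, followed by one read of its size.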
LS0
Design Considerations
_____________________
LS1
SPELLCHECK was developed from the start in BBC Basic 4 on a BBC
Master 128, although it will work on any (6502-based) BBC Micro
with OS 1.2 onwards and BBC Basic 2 onwards. The program design is
highly modular and is based on the following algorithm:
LS0
                +-----------------------------+
(PROCinit)      | Initialise screen display   |
                | and RAM-based buffers for   |
                | word list and dictionary.   |
                +-----------------------------+
                               |
                               v
               +--------------------------------+
(PROCopenfile) | Get text file name & open file |
               +--------------------------------+
                               |
                               v
                            --------
                           /  was   \
                          / <ESCAPE> \       +---------------------+
                          \ pressed? /-- y ->| Clear screen & end. |
                           \        /        +---------------------+
                            --------
                               |
                               n
                               |
                               v
                            --------
                           /  was   \
                          / <RETURN> \       +-----------------+
                          \ pressed? /-- y ->| Show Utils menu |
                           \        /        +-----------------+
                            --------
                               |
                               n
                               |
                               v
                       +---------------+
(FNreadword)           | Read a word   |<--------------+
                       | from the file |<------+       |
                       +---------------+       |       |
                               |               |       |
                               v               |       |
                          -----------          |       |
                         /           \         |       |
                        / Is the word \        |       |
                       |  already in   |-- y --+       |
                        \  the list?  /                |
                         \           /                 |
                          -----------                  |
                               |                       |
                               n                       |
                               |                       |
                               v                       |
                     +--------------------+            |
(PROCaddtolist)      | Add the word to    |            |
                     | the list in memory |            |
                     +--------------------+            |
                               |                       |
                               v                       |
                          -----------                  |
                         /           \                 |
                        /   Reached   \                |
                       |    the end    |-- n ----------+
                        \  of file?   /
                         \           /
                          -----------
                               |
                               y
                               |
                               v
                  +-------------------------+
                  | Load the dictionary     |
(PROCcheck)       | buffer with all words   |<--------+
                  | starting with next      |         |
                  | letter (start with 'A') |         |
                  +-------------------------+         |
                               |                      |
                               v                      |
                  +-------------------------+         |
          +------>| Take the next word from |         |
          |       | the text list in memory |         |
          |       +-------------------------+         |
          |                    |                      |
          |                    v                      |
          |       --------------------------          |
          |      /                          \         |
          |     /  Is the first letter the   \        |
          |    |  same as the current letter  |-- n --+
          |     \ for the dictionary buffer? /
          |      \                          /
          |       --------------------------
          |                    |
          |                    y
          |                    |
          |                    v
          |       --------------------------
          |      /                          \
          |     /   Is the word already in   \
          |    |     the dictionary (as a     |-- n --+
          |     \ previously verified word)? /        |
          |      \                          /         |
          |       --------------------------          |
          |                    |                      |
          |                    y                      |
          |                    |                      |
          |                    v                      |
          |        +-----------------------+          |
          |        | Remove word from list |          |
          |        +-----------------------+          |
          |                    |                      |
          |                    v                      |
          |            ----------------- <------------+
          |           /                 \
          |          / Was that the last \
          +---- n ---\ word in the list? /
                      \                 /
                       -----------------
                               |
                               y
                               |
                               v
                     ---------------------
                    /                     \
                   /  Are there any words  \        +--------------+
                   \  still in the list?   /-- n -->| End of check |
                    \                     /         +--------------+
                     ---------------------
                               |
                               y
                               |
                               v
                     ---------------------
                    / Does the user want  \         +--------------+
(PROCaction)       /  to make corrections  \-- n -->| End of check |
                   \  or add word(s) from  /        +--------------+
                    \ list to dictionary? /
                     ---------------------
                               |
                               y
                               |
                               v
               +-------------------------------+
               | Present each word in turn.    |
               | If no action, pass to next.   |
               | If to be added to dictionary, |
               | call appropriate routine.     |
               | If to be corrected, input the |
               | corrected version.            |
               +-------------------------------+
                               |
                               v
                       -----------------
                      /                 \
                     /   Are there any   \         +--------------+
                    |   corrections to    |-- n -->| End of check |
                     \  be carried out?  /         +--------------+
                      \                 /
                       -----------------
                               |
                               y
                               |
                               v
                  +-------------------------+
                  | Call up VIEW, carry out |
                  | corrections, & return.  |
                  +-------------------------+
                               |
                               v
                        +--------------+
                        | End of check |
                        +--------------+
LS1
The PROCedures and the function named above are the main elements
of the structure, but are supported by a number of more specific
routines.
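The PROCcheck stage of the flowchart loads the dictionary one letter at a time and removes the words it finds. It can be sketched in Python as follows. `check_by_letter` and its `load_letter` callback are hypothetical names, and `load_letter` stands in for one read of a letter's area of the dictionary file. The flowchart steps through the letters from 'A', but this sketch loads only the letters that actually begin a word in the list. That is a simplification, not necessarily the program's behaviour.

```python
from collections import defaultdict


def check_by_letter(word_list, load_letter):
    """Return the words in word_list (sorted, upper-case) that are not in
    the dictionary. load_letter(letter) must return the set of dictionary
    words beginning with that letter: one 'buffer load' per letter."""
    groups = defaultdict(list)
    for word in word_list:
        groups[word[0]].append(word)
    unknown = []
    for letter in sorted(groups):
        buffer = load_letter(letter)  # the only 'disc access' for this letter
        unknown.extend(w for w in groups[letter] if w not in buffer)
    return unknown


# A tiny in-memory stand-in for the dictionary file.
DICTIONARY = {"C": {"CAT"}, "M": {"MAT"}, "S": {"SAT", "SIT"}}
loads = []


def fake_load(letter):
    loads.append(letter)
    return DICTIONARY.get(letter, set())


print(check_by_letter(["CAT", "COT", "MAT", "SAT", "ZZZ"], fake_load))
# prints: ['COT', 'ZZZ']
print(loads)
# prints: ['C', 'M', 'S', 'Z']
```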
For the purposes of SPELLCHECK, a word is defined as any series of
alphabetic characters two or more in length. Whilst this is at
first sight a commonsense definition, and certainly one which is
easy to program for, it does lead to some outcomes which are not
immediately obvious. For example, the word -isn't- (is not) is
taken by SPELLCHECK to be two separate words, -isn- and -t- ,
separated by an apostrophe. The -t- is ignored as being shorter
than the minimum word length, and the program duly reports a word
-ISN- as having been found in the text. Until you get used to
this, you could be perplexed by some strange spelling mistakes!
At the time, I felt this to be preferable to allowing punctuation
marks within words, as that could lead to worse confusion.
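The word definition above can be written as a short Python generator. `read_words` is a hypothetical name, not FNreadword itself. The sketch assumes plain ASCII text and makes no attempt to handle any View-specific formatting in a document. It upper-cases each word because SPELLCHECK ignores case.

```python
def read_words(text, min_length=2):
    """Yield each 'word' in text: a maximal run of ASCII letters at least
    min_length long, upper-cased. Shorter runs are skipped."""
    current = []
    for ch in text + " ":  # trailing space flushes the last word
        if "A" <= ch <= "Z" or "a" <= ch <= "z":
            current.append(ch)
        else:
            if len(current) >= min_length:
                yield "".join(current).upper()
            current = []


print(list(read_words("Isn't it a test?")))
# prints: ['ISN', 'IT', 'TEST']
```

As the example shows, the apostrophe splits -isn't- into -ISN- and a one-letter -t-, which is then dropped.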
SPELLCHECK does not distinguish at all between upper and lower
case text. Also, SPELLCHECK does not check for plurals or other
variations on a word root. This decision was made purely to keep
the design simple and to get a working program!
The effect is that, for example,
WAIT WAITS WAITER WAITING AWAIT AWAITS (etc....)
are all treated as separate words, each of which would take up
space in the word list in memory and in the dictionary file and
buffer. A future version of SPELLCHECK may improve on the present
approach, with the advantages of more compact storage on disc and
in memory (and perhaps some speed improvement).
The dictionary file maintained by SPELLCHECK is designed so that
the program can load all words beginning with a particular letter
into a memory buffer at once. This was found to reduce greatly the
number of disc accesses made by the program, and so to speed it
up, as accessing RAM is far faster than accessing a disc. In order
to make the dictionary file as compact as
possible, I did some research on the relative frequencies of words
beginning with each letter; there are obviously more words
beginning with the letter 'S' than with 'Q' (for example). As I
couldn't find any easy reference to help with this, I reproduce
the results of my own analysis below, in case anyone else should
find use for them:
LS0
Relative frequencies of words in English beginning with:
========================================================
A  14     H   8     O   4     V   4
B  12     I   8     P  22     W   6
C  22     J   2     Q   2     X   1
D  16     K   2     R  12     Y   1
E  10     L   8     S  28     Z   1
F  12     M   4     T   8
G   8     N   4     U   2
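The article does not say how these figures relate to the page counts in MAKEDIC's DATA statements. The Python sketch below shows one hypothetical way to turn them into an allocation for a chosen total size. Every letter first gets one page. The remaining pages are then shared out in proportion to the frequencies, with largest-remainder rounding so that the counts add up exactly. `allocate_pages` and `FREQUENCIES` are illustrative names; the figures themselves are copied from the table.

```python
FREQUENCIES = {
    "A": 14, "B": 12, "C": 22, "D": 16, "E": 10, "F": 12, "G": 8,
    "H": 8, "I": 8, "J": 2, "K": 2, "L": 8, "M": 4, "N": 4,
    "O": 4, "P": 22, "Q": 2, "R": 12, "S": 28, "T": 8, "U": 2,
    "V": 4, "W": 6, "X": 1, "Y": 1, "Z": 1,
}


def allocate_pages(freqs, total_pages):
    """Share total_pages among the letters: one page each, then the rest
    in proportion to freqs, rounded by the largest-remainder method."""
    letters = sorted(freqs)
    if total_pages < len(letters):
        raise ValueError("need at least one page per letter")
    spare = total_pages - len(letters)
    weight = sum(freqs.values())
    share = {k: spare * freqs[k] / weight for k in letters}
    pages = {k: 1 + int(share[k]) for k in letters}
    leftover = total_pages - sum(pages.values())
    # Give any pages still unallocated to the largest fractional parts.
    by_remainder = sorted(letters, key=lambda k: share[k] - int(share[k]),
                          reverse=True)
    for k in by_remainder[:leftover]:
        pages[k] += 1
    return pages


pages = allocate_pages(FREQUENCIES, 240)  # 240 pages of 256 bytes = 60K
print(sum(pages.values()), pages["S"], pages["Q"])
```

Taken in alphabetical order, the resulting 26 counts are the kind of figures the DATA statements hold.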
PE
Programming Considerations
__________________________
LS1
The early versions of SPELLCHECK which I concocted during
development were written solely in Basic. It soon became apparent
that the speed of interpreted Basic was inadequate for any except
the shortest text files. One version took nearly 40 minutes to
check one page of text! Rather than go the whole hog and convert
everything to assembler, I identified the speed-critical parts of
the program. These all concerned the storage of word lists in
memory: most of the program's working time was spent inserting
words into the list in alphabetical order, searching the list for
matching words, and removing correct words from the list once they
had been identified. The Basic
PROCedures which originally carried out these tasks were rewritten
in assembler. There were immediate and dramatic speed
improvements - of the order of 100 times. Whilst the speed of the
finished program may not compare with commercial products, it is
not dramatically worse and I have found it to be perfectly
acceptable in use.
Screen mode 7 was chosen for the program from the start because it
uses only 1K of screen memory, leaving the maximum amount of RAM
free for use by the program (important if the machine to be used
has no shadow RAM!).
It would be quite possible to combine all of the separate programs
which make up SPELLCHECK into one, with the obvious advantages of
fewer files on disc, no repetition of common code and less disc
access. However, I decided to split the utilities away from the
main program, as RAM space is at a premium on the BBC Micro and
about 9K is used for dictionary, word list and other buffers. The
main program, SPELL, carries out the main functions, whereas
functions such as rebuilding dictionaries and dictionary
maintenance are called as separate programs. This makes sense, as
once the dictionary has been set up, these functions are only
infrequently used. It is also why the utilities are reached by
entering no filename to check (just pressing RETURN): the
alternative would have been to choose the main spellcheck activity
from a menu each time the program was run. By assuming at the
start that a file is to be checked, less 'setting up' is needed
each time the program is used.
DF//End of text//