Edit. Search and Replace.
~~~~~~~~~~~~~~~~~~~~~~~~~
If you own a Master 128 then you have access to Edit, one of the most
underrated programs for your machine. The Master 128 is well provided with
built-in software, and you may already be familiar with View, the word
processor, and Viewsheet its spreadsheet complement. View is an excellent word
processor, and if you are dealing with letters, reports and the like then it is
ideally suited to this task. However, there are many situations in which View
is less helpful, and some where the power and flexibility of Edit has no equal.
Edit is ideal for entering and editing programs, even programs in Basic. The
EDIT keyword will quickly transform the current Basic program in memory into a
format suitable for Edit, while Shift-f4 followed by 'B.' (for Basic) will
leave you with the edited program ready to run. Extensive editing is then much
easier as you no longer have to copy in its entirety any line being modified.
There is one disadvantage - Edit has no knowledge of Basic, so if you make any
mistake Edit will be unaware - and so will you until you try to run the
program.
For example, Edit allows you to alter line numbers, but it won't automatically
alter any other reference to the same line number. Changing line numbers can be
useful. If you want to change the order of two lines in a Basic program just
edit their line numbers to reflect correctly their desired positions, and when
you return to Basic the two lines will be correctly ordered as specified.
Edit can also supplement View. There is nothing to stop you loading a View
format file directly into Edit using function key f2 to select the Load option.
Some of the file may look a little weird because of the ASCII codes (or
characters) used by View to represent things like highlight markers, rulers,
etc. But have you ever tried searching a View file for, say, two spaces to
replace them by a single space? It won't work. In Edit this is no problem: it
provides an easy way to search for all occurrences of two spaces, and to
replace each one found with a single space. There are other situations where Edit
supplements the facilities of View.
Another example of Edit's use is in converting a file from one format to
another, say Interword to View, or similar. For example, Wordwise represents
Tab by the code for the Tab character (ASCII 9) but with 128 added (i.e. ASCII
137). View uses the standard Tab character (Ctrl-I). If you use Wordwise's
spool option, then every Tab will be replaced by the corresponding number of
spaces, which is not really the best solution. Read the Wordwise file into
Edit, and you can easily replace Wordwise's Tab character with that used by
View.
Other similar problems arise - for example, View Professional terminates every
line with a space followed by a Carriage Return (ASCII 13). Leaving the space in
can cause havoc when such a file is edited and reformatted within View. The
solution is to use Edit to search for each occurrence of <space><Return> and
replace by just <Return>. All this is trivial stuff as far as Edit is
concerned, and it is capable of a whole lot more. This month I propose to
examine the main features and characteristics of Edit, and then follow this
next month with a look at some of its more advanced (and more powerful)
features.
UNDERSTANDING EDIT
Edit treats every file simply as a sequence of bytes (as an ASCII file). Each
byte is represented on screen by a single character, with the non-printing
characters (ASCII codes 0 to 31) represented as inverse-video control
characters. For example, the Return character (ASCII code 13) already referred
to can also be entered from the keyboard by pressing Ctrl-M (where the ASCII
code for 'M' is 77, i.e. 13 + 64). In Edit, the Return character appears as an
inverse-video 'M' (i.e. a black 'M' on a white background). Codes in the range
128 to 255 will be displayed as the characters of the Master's extended
character set, as listed in the Master's User Guide.
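The link between a control code and the letter shown for it is just a fixed offset of 64. As a minimal sketch of that arithmetic (written in Python purely for illustration; the function names are my own and have nothing to do with Edit itself):

```python
def control_code(letter):
    """ASCII code produced by Ctrl-<letter>: the letter's code minus 64."""
    return ord(letter.upper()) - 64


def inverse_video_letter(code):
    """Letter shown in inverse video for a control code in the range 1 to 26."""
    return chr(code + 64)


print(control_code("M"))        # the Return character
print(inverse_video_letter(9))  # the Tab character
```

So Ctrl-M gives code 13, and code 9 would be displayed as an inverse-video 'I'.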
The important thing to remember is that Edit works with ASCII files (or treats
all files as ASCII). That's why you can't edit a Basic program by simply
loading it directly into Edit. A Basic program is tokenised, with each keyword
being replaced by a single token (or coded value). If loaded into Edit in this
form, each token would appear as a corresponding character. To edit a Basic
program properly it must be converted first from its more normal tokenised form
into a simple ASCII format. That's what the EDIT keyword does. The alternative
(which would work just as well, but which is more long winded) would be to
spool the Basic program out to a file (this process saves the program as an
ASCII text file), and then load this into Edit with the f2 function key
command.
Let's suppose you have loaded a Basic program into memory. Type EDIT (followed
by Return) and notice what happens. The screen should clear and be replaced by
the 'long' form of the Edit display (with help information at the head of the
screen), and a window below that in which the start of your program will
appear.
The help display at the head of the screen should show the meanings of the
function keys as used by Edit, and this serves as a useful reminder if you have
lost or forgotten the supplied key-strip. If this is not visible on screen,
press Shift-f5 (for SET MODE), and in response to the prompt at the foot of
the screen enter 'D'. The screen display should then change to the format
described above. You can also use SET MODE to select the screen mode - I
usually use 131 (shadow mode 3). One useful feature of Edit is that it
remembers such settings from one time to the next, so decide what your
preferences are, and Edit will remember them.
USING EDIT
Once in Edit, with your program (or whatever) displayed, you can check out the
basic editing features. Many controls are very similar to View: Ctrl arrow up
and Ctrl arrow down to move to the start or end of the file, Shift arrow up and
Shift arrow down to move up or down one screenful, and so on. If you use the
cursor keys to scroll continuously up or down you will notice that you can
never reach the very top or very bottom of the Edit window (unless at the very
start or very end of the file respectively). Thus you can always see a few
lines both above and below the current cursor position.
I normally find that having the details of the function keys displayed at the
head of the screen more than compensates for the reduced size of the edit
window, but that is personal preference. Although cursor movement is very much
as for View there are some differences in the way in which things are
controlled. The Delete key operates as normal, deleting the character to the
left of the current cursor position, but in Edit it is the Copy key which
deletes the character at the cursor, not f9 as in View.
The other difference is that of dealing with blocks of characters. In View you
mark the start and end of a block. This is not always true in Edit. To delete a
block move to the start and press f6 (MARK PLACE), then move to the end of the
block and press Shift-f8 (without placing another marker). To copy or move
blocks of characters, you mark both start and finish with f6 before pressing f7
(MARKED COPY) or Shift-f7 (MARKED MOVE).
Another useful key is Shift-f1 (INSERT/OVER) which toggles between insert mode
and overwrite mode. Again, whatever setting is current at the end of a session
is remembered by Edit for the next one. There is, unfortunately, no case change
key, so if you do need to change from upper to lower case, or vice versa, the
best solution is to select overwrite mode and carefully type over the top of
what is already there.
As far as loading and saving of files is concerned, I have already described
how a Basic program in memory can be transferred into Edit, and a return to
Basic made at the end. If you are dealing with ASCII files to start with, then
f2 and f3 will prompt for the name of a file to load or to save respectively.
Note the use of Shift-f2 to load and insert a file at the current cursor
position (setting a marker will not work, and will prevent the Insert command
from having effect).
Finally, before getting to the heart of Edit, note too the use of f1. This
gives a star prompt ready for any star command to be typed in, useful for
cataloguing a disc or whatever without leaving Edit. Escape will always cancel
command mode. You can also use command mode to leave Edit by typing Basic (or
just B.) in response to the prompt, but do remember that any Basic program you
have been editing (or any other file) will be immediately lost unless saved in
ASCII format first. Incidentally, one of the few irritating traits I have found
in Edit, is that using this method to exit from Edit leaves a screen window set
unless you subsequently change mode.
SEARCH AND REPLACE
Amongst the most powerful features of any editor are those to search for, and
optionally replace, any specified character string. And it is in this area that
Edit is particularly versatile and powerful.
Just two function keys do all the work: f4, which performs a selective search
and replace, and f5, which performs a global search and replace. Using f4, Edit
will search from the current cursor position for the first occurrence of the
specified string and pause when this has been found. It will then prompt you to
continue (to search for the next occurrence) or replace (the target string with
a new string). The keys with which to respond are 'R' for replace and 'C' for
continue (not View's 'Y' or 'N'). The selective search may or may not specify a
replacement string at the outset.
However, the global search and replace is only valid if both target and
replacement are specified, and all occurrences will be replaced (very quickly)
without any further response from the user. This makes the use of f5 very
powerful, and you should be cautious about using it unless you are quite sure
what you are doing. It can be fatally easy to change one thing for another
throughout a file only to find the wrong string has been replaced. And trying
to change back to the original state may be impossible.
For example, if you decide, when editing a program, to change the name of a
variable, do a search first to establish that you haven't already used the new
name for something else. If you have, and you make all the changes you will
have used the same name for two different purposes, but at this stage you will
be quite unable to separate one from the other, except by individual
inspection.
It is often preferable, particularly with more complex search strings, to use
the selective search first to ensure that you have got it right, and only then
use the global search and replace.
In both cases, a replacement string is specified by following the search string
by a '/' and then the replacement string. Putting '/' followed by nothing
(except Return) will result in the target string being deleted (replaced by
'nothing').
Escape can be used to terminate a selective search prematurely, and once a
search has been made, a further press of f4 immediately followed by Return will
use the same search string as the last time.
I now intend to concentrate upon the search and replace facilities available.
These are initiated by f4 (for a selective search/replace) and f5 (for a
global version. Remember, if in doubt, test things first before trusting a
global search and replace with f5. That way you can abort an operation (just
press Escape) before much damage has been done.
Whichever out of f4 or f5 is used, in response to the ensuing prompt, the first
set of characters represents the search, followed by a slash character '/',
followed by the replacement string. So the format is:
<search string>/<replacement string>
One of the immense values of Edit is that you can search for (and replace) any
of the 256 characters in the range 0 to 255 (their ASCII codes). Given the
format above, that immediately poses two problems: how do you search for
non-printing characters or indeed any characters which cannot be found on the
standard keyboard, and if the '/' character is used to separate search and
replacement strings, can that character form part of either of these two
strings? There are several answers to these questions as we shall see.
SPECIAL CHARACTERS
In fact, a number of special characters, as well as '/' have particular
meanings in Edit, and as a result have to be represented differently as part of
a search or replacement string. There is also a complication, and that is that
the rules are somewhat different depending on whether we are searching for a
character, or using it as a replacement.
When searching, the following are all considered as special characters:
* ^ \ / @ # . | $ - ~
with particular meanings in a search context. If you want to include any of
these characters in a search string, then it must be preceded by a backslash
('\'). Thus to search, say, in a program for a variable title$, you would
enter as the search string:
title\$
Similarly, if you were searching for the expression 2.5*(a-b), this would have
to be specified as:
2\.5\*(a\-b)
Note that, because the backslash has this special meaning, the only way to
specify a genuine backslash is by preceding it by the relevant character, the
backslash itself. To search for a backslash you must therefore include:
\\
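If you ever need to prepare search strings away from the machine, the escaping rule is simple enough to automate. The following is only a sketch in Python, not anything Edit provides: the helper name is hypothetical, and it covers just the eleven characters listed above.

```python
# Characters listed above as special in an Edit search string.
SEARCH_SPECIALS = set("*^\\/@#.|$-~")


def escape_for_search(text):
    """Put a backslash before each special character so it is matched literally."""
    return "".join("\\" + ch if ch in SEARCH_SPECIALS else ch for ch in text)


print(escape_for_search("title$"))
print(escape_for_search("2.5*(a-b)"))
print(escape_for_search("\\"))
```

The three lines printed correspond to the three search strings given above. Note that the brackets in the second example pass through untouched, as they are not in the list.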
SPECIFYING CONTROL CHARACTERS
Like the backslash character, each of the other special characters has a
specific meaning within Edit. For example, the '$' character is used to
indicate Return (ASCII 13). Because it occurs quite commonly, the Return
character is treated differently from other control characters (characters in
the range 1 to 31). All other control characters are specified by a sequence:
|<letter>
equivalent to Ctrl-<letter>. Thus the Tab character which has ASCII code 9 is
therefore Ctrl-I (because 'I' is the 9th letter of the alphabet) which would be
represented in a search string as:
|I
In most modes, such control characters appear on screen in an inverse video
format (normally black on white in contrast to the standard white on black).
We already have the ammunition to perform some useful search and replace
operations. For example, suppose we have a text file in which each paragraph
ends with two Return characters, and each new paragraph starts with a single
space. We want to ensure each paragraph starts with Tab. We can perform the
change with:
$$ /$$|I
i.e. replace <Return><Return><Space> with <Return><Return><Tab>. We need to
search for a double Return to ensure that we don't pick out other situations
involving a single Return followed by a space.
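For readers more at home with a modern language, the same change can be sketched in Python as a plain byte replacement. This is an equivalent of my own devising, assuming the file has been read in as bytes with Return stored as ASCII 13; the function name is hypothetical.

```python
def indent_paragraphs(data: bytes) -> bytes:
    """Replace <Return><Return><Space> with <Return><Return><Tab>."""
    return data.replace(b"\r\r ", b"\r\r\t")


sample = b"First paragraph.\r\r Second paragraph.\r Not a paragraph start."
print(indent_paragraphs(sample))
```

Only the space after the double Return is changed; the space after the single Return is left alone, which is exactly why the search has to include both Returns.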
TOP BIT SET CHARACTERS
As well as Control characters in the range 1 to 31, the other characters which
cannot be entered directly from the keyboard are those with ASCII codes in
excess of 128 (known as the top bit set characters because in a binary format
the top bit, of eight, is set to '1' ensuring the code for each such character
is 128 + n where 'n' is in the range 1 to 127). In Edit, these are specified
by:
|!<char>
where '|!' represents ASCII code 128, and the one or two character string,
<char> which follows represents a value which is added on to 128 to form the
final result. Thus the character with ASCII code 129 ('A') would be represented
as:
|!|A
because, as we have already seen, |A represents Ctrl-A with an ASCII code of
1. Adding this to 128 gives the required value of 129. Another example would
be:
|!a
which represents the character with ASCII code 225 (128 + 97, the code for 'a'). In both examples, the text for
this article was read into Edit, and the sequences described above used to
insert the correct characters into this text (in place of dummy characters) -
View which I normally use cannot really cope with such requirements.
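The arithmetic behind these sequences can be checked with a few lines of Python. Treat this as a rough sketch under my own assumptions rather than a description of how Edit works internally: the function name is hypothetical, and it handles only plain characters, '|<letter>' and '|!<char>' (not '$' or the other special characters).

```python
def edit_code(seq):
    """Return the ASCII code denoted by a plain character or a '|' sequence."""
    if seq.startswith("|!"):
        # Top bit set: 128 plus whatever the rest of the sequence denotes.
        return 128 + edit_code(seq[2:])
    if seq.startswith("|") and len(seq) == 2:
        # Control character: the letter's code minus 64.
        return ord(seq[1].upper()) - 64
    if len(seq) == 1:
        return ord(seq)
    raise ValueError("unrecognised sequence: " + seq)


print(edit_code("|A"))    # Ctrl-A
print(edit_code("|!|A"))  # 128 + 1
print(edit_code("|!a"))   # 128 + 97
```

The three values printed are 1, 129 and 225, matching the examples above.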
However, do not fall into the trap of assuming that the foregoing is only of
interest if a text file contains characters in the ASCII range 129 to 255. The
ability to specify these characters has other uses. For example, in Wordwise
(or Wordwise Plus) Tab is represented by adding 128 to the normal ASCII code
for Tab (9). Thus all Tabs are represented by ASCII 137. If you want to
transfer such a file for use in View or other word processors where Tab is
represented by ASCII 9 (Ctrl-I) then Edit can be used to make a global change.
The search and replace specification would be:
|!|I/|I
where '|!|I' represents ASCII 137 (128 + 9), and |I of course represents Tab
(Ctrl-I).
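Should the file already have been transferred to another machine, the same conversion can be sketched in Python. This is merely an equivalent under the assumption that the file is handled as raw bytes; the names are my own.

```python
WORDWISE_TAB = bytes([128 + 9])  # ASCII 137, as used by Wordwise
VIEW_TAB = bytes([9])            # ASCII 9, i.e. Ctrl-I


def wordwise_tabs_to_view(data: bytes) -> bytes:
    """Replace every top-bit-set Tab with the standard Tab character."""
    return data.replace(WORDWISE_TAB, VIEW_TAB)


sample = b"Name" + WORDWISE_TAB + b"Address"
print(wordwise_tabs_to_view(sample))
```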
SPECIAL CHARACTERS IN REPLACEMENT STRINGS
When it comes to specifying a replacement string, there are fewer options and
hence fewer special characters used up by this process. Thus only the
characters:
% & | $
have to be represented by:
\% \& \| \$
This inconsistency between the use of special characters in search strings and
their use in replacement strings can readily cause confusion. For example, the
minus (or hyphen) is another special character, but only in search strings.
Suppose you want to replace all occurrences of '-1' by '-2'. The format would
be:
\-1/-2
The function of '-' as a special character and others will be dealt with later.
Once the '/' character has been encountered as the separator between the search
string and the replacement string, any further occurrences of '/' are treated
as part of the replacement string. Thus:
|I\\/|I/
would replace <Tab>\ by <Tab>/, in other words <Tab><backslash> by
<Tab><slash>.
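One way to picture the rule is as a scan from left to right which stops at the first '/' not protected by a backslash. The Python below is a sketch of that reading only; I cannot claim that Edit parses its input in exactly this way, and the function name is hypothetical.

```python
def split_spec(spec):
    """Split '<search>/<replacement>' at the first '/' not protected by a backslash.

    Returns (search, replacement); replacement is None if there is no separator.
    """
    i = 0
    while i < len(spec):
        if spec[i] == "\\":
            i += 2  # skip the backslash and the character it protects
            continue
        if spec[i] == "/":
            return spec[:i], spec[i + 1:]
        i += 1
    return spec, None


print(split_spec("|I\\\\/|I/"))
print(split_spec("\\-1/-2"))
print(split_spec("fred/"))
```

In the first case the search part is |I\\ and the replacement is |I/, with the trailing slash kept as part of the replacement. The last case gives an empty replacement, which is the 'replace by nothing' form described earlier.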
USING SPECIAL CHARACTERS
The full list of special characters is shown in Table 1. Many of these allow
us to take a more flexible, and thus more powerful approach to the
specification of search strings.
Normally any search is not case-sensitive, but if a letter in a search string
is preceded by '\' then a match will be made only with a letter of the
specified case. For example, you might have a program which uses both the
pseudo Basic variable TIME and a variable of your own Time (not perhaps a good
idea anyway).
Suppose you decide to change your variable Time to Time% throughout the
program. If you specify (in response to f4 or f5):
Time/Time\%
then not only will Time be changed, but so will TIME (and any other combination
of upper and lower case characters). Note the backslash preceding '%' in the
replacement string because in this context '%' on its own is a special
character with another meaning (see next month). The backslash overrules this.
To achieve the correct result you will need:
\T\i\m\e/Time\%
Edit never attempts to preserve case, so the replacement string will always be
used as specified.
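The difference between the two forms can be imitated with Python's re module, where case-insensitivity is switched on by a flag rather than switched off letter by letter. This is an analogy only, using a made-up program line:

```python
import re

line = "Time = TIME + 100"

# Like 'Time/Time\%': letters match in either case, so TIME is changed as well.
loose = re.sub("Time", "Time%", line, flags=re.IGNORECASE)

# Like '\T\i\m\e/Time\%': every letter must match in the case given.
exact = re.sub("Time", "Time%", line)

print(loose)
print(exact)
```

The first result has both names changed to Time%, while the second leaves TIME as it was.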
Other specifications which can be useful include '@' which will match any
single letter of the alphabet, and '#' which will match any single digit (range
0 to 9). For example, a search string of '####' would match all four digit
numbers in a program listing (including line numbers and other values). One
further specification for search strings which can be useful is to give a range.
Thus:
A-F
would match any single character within the range 'A' to 'F' inclusive,
while:
A-FA-F
would match two characters in the same range.
Table 1. Special characters used in Edit in search strings
Char.  Meaning
*      as few as possible of the following character
^      as many as possible of the following character
\      treat next character as is
/      search/replacement separator
@      any letter
#      any digit
.      any character
|      next character is control character
$      Return character
-      range of characters
~      negates a specification
SINGLE CHARACTER SPECIFIERS
In dealing with Edit's search and replace commands, we have concentrated on
single characters, or on sequences of single characters. For example:
fred/john
will replace all occurrences of fred by john (remember that a search is
initiated by pressing f4 for a selective search (and optionally replace), or
f5 for a global search and replace).
It is when we start to investigate the use of ambiguous character specifiers
and variable length strings that the true power and flexibility of Edit is
revealed. We have already seen that '@' can be used to match any alphabetic
character, and '#' any digit. In addition, '.' will match any character in the
range 0-255.
Ranges of characters can also be specified. For example:
A-Z
would search for any character in the range given, i.e. upper case characters.
Searching for:
A-Z%
would search for any such letter followed by a '%' symbol, useful to locate all
occurrences of the built-in variables A% to Z% in a Basic program. Note the
format of the search syntax. You might have thought it should be:
A%-Z%
but this doesn't make sense (Edit would look for a string starting with a
letter 'A' or 'a' - as Edit is not case specific on single letters - followed
by a character in the range '%-Z' - ASCII codes 37 to 90 - followed by a '%').
To use Edit to the full, you need to develop the ability to analyse correctly
the syntax of a search string.
It is also possible to specify alternatives to be searched for, any one of
which is to be matched. Alternatives MUST be enclosed in square brackets. To
match any one of '1', '3', or 'c' the search string must be given as:
[13c]
and not as:
13c
Thus to search a program for A%, B% or C% we could specify:
[ABC]%
as the search string.
It is important to realise that when specifying a range of characters, or
alternative characters, each match will be with just a single character. In
effect we can term these single character elements in that they function like
any specific single character in a search string. Thus:
[A-Za-z]%
would match any two-character string consisting of an upper case or a lower
case character followed by a '%', i.e. it would match A% to Z% or a% to z%.
One further useful feature with single character elements is the negation
symbol '~'. Thus to search for a character in the range A to Z you would
specify:
A-Z
but to specify any character not in that range you would write:
~A-Z
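If you already know the regular expressions found in other tools, it may help to see rough equivalents of these single character elements. The mapping below, written in Python, is my own approximation and not an exact translation: Edit's handling of letter case in particular differs from Python's.

```python
import re

# Edit element -> roughly equivalent Python character class (an analogy only).
EQUIVALENTS = {
    "@": r"[A-Za-z]",     # any letter
    "#": r"[0-9]",        # any digit
    ".": r"[\x00-\xff]",  # any character with a code from 0 to 255
    "A-Z": r"[A-Z]",      # a range
    "~A-Z": r"[^A-Z]",    # a negated range
    "[13c]": r"[13c]",    # alternatives
}

sample = "x9 Q"
for edit_form, pattern in EQUIVALENTS.items():
    print(edit_form, "->", re.findall(pattern, sample))

# The Edit search string [A-Za-z]% : one letter of either case followed by '%'.
print(re.findall(r"[A-Za-z]%", "A%=1:b%=2:C=3"))
```

In every case a single element matches exactly one character, which is the point to hold on to before moving on to strings of variable length.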
MULTIPLE CHARACTER SPECIFIERS
Sometimes the string for which you are searching will vary in length, for
example the line numbers in a Basic program. Some will be two digits, some
three, some four or more. Suppose you want to search for all line numbers with
a space separating the line number from the following instruction (in order to
remove the space as I suggested last time). If we take two digit line numbers
only then we might specify:
##<space>
But then we have instructed Edit to search for all occurrences of two digits
followed by a space. This will also pick up the last two digits of all line
numbers of three digits or more, and probably many other numbers besides. First
of all let's see how we can construct the search to ensure we find all numbers
regardless of purpose or length, and then refine this to isolate just line
numbers and then just line numbers followed by a space.
To do this we need to introduce the function of two more special characters.
Preceding a character with a '^' produces a search for as many of that
character as possible. Thus specifying:
^#
will find strings of digits of any length. The other special character is '*'
which matches as few of the following character as possible (more of this in a
moment). To try to distinguish line numbers we can note that these appear
always at the start of a line, and are thus preceded by a carriage return
(represented by a '$'). So we can refine our search string to:
$^#
In fact in Edit shorter line numbers are preceded by one or more spaces so we
need a further refinement of the search string:
$*<space>^#
This can be interpreted as "Return, followed by as few spaces as possible,
followed by as many digits as possible". It is interesting to consider too the
difference between that and the search specification:
$^<space>^#
This would read as "Return, followed by as many spaces as possible, followed by
as many digits as possible", superficially perhaps not very different. Although
in a sense this is quite true, the difference can certainly be significant.
When searching for "as few as possible" of some character, the minimal match is
"nothing". Thus searching for as few spaces as possible will match no spaces,
one space, two spaces and so on. On the other hand, searching for "as many as
possible" needs to find at least one of the target characters to form a match.
Thus the first search specification given above would match line numbers such
as:
   10
  100
 1000
10000
but the second would not match the last of these. In the same context, it is
important to specify "as many digits as possible" so that at least one digit is
required for a match.
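The 'none or more' against 'one or more' distinction can be tried out with Python's re module, where ' *' and ' +' behave in a comparable way. This is only an approximation of Edit's behaviour, and the sample listing assumes line numbers are shown right-aligned in five columns after each Return.

```python
import re

# Assumed layout: Return, then the line number right-aligned in five columns.
listing = "\r   10 PRINT\r  100 PRINT\r 1000 PRINT\r10000 PRINT"

few_spaces = re.compile(r"\r *[0-9]+")   # like $*<space>^# : spaces optional
many_spaces = re.compile(r"\r +[0-9]+")  # like $^<space>^# : at least one space

first = [m.strip() for m in few_spaces.findall(listing)]
second = [m.strip() for m in many_spaces.findall(listing)]

print(first)
print(second)
```

The first list contains all four line numbers; the second stops at 1000, because 10000 has no space in front of it.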
This all takes longer to explain than it does to carry out, but even so the
distinction between the meanings of the two special characters '^' and '*' may
still confuse you as it still does me from time to time. If in doubt, as with
all searches, try just the search only (no replace), and make it a selective
rather than a global search. Once you are confident you have constructed the
correct syntax, then trust yourself to a global search and replace.
Despite all this discussion we have still not completed the task we set
ourselves. To search reasonably for all line numbers followed by a space you
will need to specify:
$*<space>^#<space>
That's fine, but how do we now set about removing that last space? In effect
what we need to do is to replace the string found by the same string less its
final character, the space. This can be achieved by looking at two further
special characters, '&' and '%', this time in the replacement specification. The
'&' character represents the complete target string. Thus:
$*<space>^#<space>/&
would certainly find line numbers followed by spaces, but would replace the
target by itself, not very useful, though in other contexts the '&' character
certainly has a part to play. For example, if we are trying to achieve the
opposite of our original task, to insert a space after every line number we
could write:
$*<space>^#/&<space>
where the target string is replaced by the same string followed by a space (but
note that this specification would apply that to all line numbers including
those already followed by a space - for a refinement on this see later).
Now to return to our original task we need to use the other special character
'%' to select just part of the target string. The '%' character, in this
context, is followed by a single digit. This is used to identify the different
ambiguous elements of the search string - after all, unambiguous elements are
known and can simply be repeated. In our example the first ambiguous part is:
*<space>
and this is represented by %0 in the replacement string. The second (in our
case) ambiguous element is:
^#
and this would be represented by %1. Any further ambiguous elements would be
represented by %2, %3 and so on. Thus the replace string will consist of Return
(indicated by a '$'), followed by %0 followed by %1 (but not including the
space), thus:
$*<space>^#<space>/$%0%1
At this stage, I strongly advise you to try this out for yourself by
constructing a short if artificial example. By the way, if you want to insert a
space following all line numbers where such a space is missing use:
$*<space>^#~<space>/$%0%1<space>%2
Here, we search for line numbers followed by any character other than a 'space'
(i.e. not a space), but the replacement, as well as including the extra space
needed must also put back the last character of the target string (using %2) as
this will have been the first character after the line number (in other words,
the start of an instruction).
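Those who know regular expressions will recognise %0, %1 and %2 as close relatives of numbered groups. The Python sketch below imitates the two operations; it is an analogy of my own and not Edit's mechanism. Python numbers its groups from 1 rather than 0, and the function names and sample listing are made up for the purpose.

```python
import re


def remove_space_after_line_number(text):
    # Like $*<space>^#<space>/$%0%1 : \1 plays the part of %0 and \2 that of %1.
    return re.sub(r"\r( *)([0-9]+) ", r"\r\1\2", text)


def add_space_after_line_number(text):
    # Like $*<space>^#~<space>/$%0%1<space>%2 : \3 puts back the character
    # which followed the line number. Digits are excluded from that character
    # as well, so that Python's backtracking cannot split a number in two.
    return re.sub(r"\r( *)([0-9]+)([^ 0-9])", r"\r\1\2 \3", text)


listing = "\r   10 PRINT\r  100PRINT"
print(repr(remove_space_after_line_number(listing)))
print(repr(add_space_after_line_number(listing)))
```

The first call closes up line 10 and leaves line 100 alone; the second does the reverse, opening up line 100 and leaving line 10 alone.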
The final examples which we have constructed have become quite complex, but are
in fact quite useful in practice for tidying up programs. They will either
insert a space, following a line number, where none existed before, or remove a
space following a line number where one existed previously. Think how you might
modify that one to remove not just a single following space, but as many spaces
as there might happen to be, very useful for a program saved in a LISTO format.