EDIT Search and Replace

If you own a Master 128 then you have access to Edit, one of the most underrated programs for your machine. The Master 128 is well provided with built-in software, and you may already be familiar with View, the word processor, and Viewsheet, its spreadsheet complement. View is an excellent word processor, and if you are dealing with letters, reports and the like then it is ideally suited to the task. However, there are many situations in which View is less helpful, and some where the power and flexibility of Edit have no equal.

Edit is ideal for entering and editing programs, even programs in Basic. The EDIT keyword will quickly transform the current Basic program in memory into a format suitable for Edit, while Shift-f4 followed by 'B.' (for Basic) will leave you with the edited program ready to run. Extensive editing is then much easier as you no longer have to copy in its entirety any line being modified. There is one disadvantage - Edit has no knowledge of Basic, so if you make a mistake Edit will be unaware - and so will you until you try to run the program. For example, Edit allows you to alter line numbers, but it won't automatically alter any other reference to the same line number.

Changing line numbers can be useful. If you want to change the order of two lines in a Basic program just edit their line numbers to reflect their desired positions, and when you return to Basic the two lines will be correctly ordered as specified.

Edit can also supplement View. There is nothing to stop you loading a View format file directly into Edit using function key f2 to select the Load option. Some of the file may look a little weird because of the ASCII codes (or characters) used by View to represent things like highlight markers, rulers, etc. But have you ever tried searching a View file for, say, two spaces to replace them by a single space? It won't work.
In Edit this is no problem: it is easy to search for all occurrences of two spaces, and to replace each one found with a single space. There are other situations where Edit supplements the facilities of View.

Another example of Edit's use is in converting a file from one format to another, say Interword to View, or similar. For example, Wordwise represents Tab by the code for the Tab character (ASCII 9) but with 128 added (i.e. ASCII 137). View uses the standard Tab character (Ctrl-I). If you use Wordwise's spool option, then every Tab will be replaced by the corresponding number of spaces, which is not really the best solution. Read the Wordwise file into Edit, and you can easily replace Wordwise's Tab character with that used by View.

Other similar problems arise - for example, View Professional terminates every line with a space followed by a Carriage Return (ASCII 13). Leaving the space in can cause havoc when such a file is edited and reformatted within View. The solution is to use Edit to search for each occurrence of <space><Return> and replace it by just <Return>.

All this is trivial stuff as far as Edit is concerned, and it is capable of a whole lot more. This month I propose to examine the main features and characteristics of Edit, and then follow this next month with a look at some of its more advanced (and more powerful) features.

UNDERSTANDING EDIT

Edit treats every file simply as a sequence of bytes (as an ASCII file). Each byte is represented on screen by a single character, with the non-printing characters (ASCII codes 0 to 31) represented as inverse-video control characters. For example, the Return character (ASCII code 13) already referred to can also be entered from the keyboard by pressing Ctrl-M (where the ASCII code for 'M' is 77, i.e. 13 + 64). In Edit, the Return character appears as an inverse-video 'M' (i.e. a black 'M' on a white background).
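
The arithmetic linking a control code to the letter shown in inverse video can be sketched in a few lines of Python. This is purely an illustration and nothing to do with Edit's own code; the function name control_letter is invented for the example.

```python
def control_letter(code):
    """Return the letter displayed in inverse video for a control code (0 to 31)."""
    if not 0 <= code <= 31:
        raise ValueError("not a control code")
    # Adding 64 moves the code into the '@' to '_' range of the ASCII table.
    return chr(code + 64)


print(control_letter(13))  # Return (Ctrl-M)
print(control_letter(9))   # Tab (Ctrl-I)
```

The same rule run backwards (subtract 64 from the letter's code) gives the control code produced by holding Ctrl and pressing that letter.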
Codes in the range 128 to 255 will be displayed as the characters of the Master's extended character set, as listed in the Master's User Guide.

The important thing to remember is that Edit works with ASCII files (or treats all files as ASCII). That's why you can't edit a Basic program by simply loading it directly into Edit. A Basic program is tokenised, with each keyword being replaced by a single token (or coded value). If loaded into Edit in this form, each token would appear as a corresponding character. To edit a Basic program properly it must first be converted from its normal tokenised form into a simple ASCII format. That's what the EDIT keyword does. The alternative (which would work just as well, but which is more long-winded) would be to spool the Basic program out to a file (this process saves the program as an ASCII text file), and then load this into Edit with the f2 function key command.

Let's suppose you have loaded a Basic program into memory. Type EDIT (followed by Return) and notice what happens. The screen should clear and be replaced by the 'long' form of the Edit display (with help information at the head of the screen), and a window below that in which the start of your program will appear. The help display at the head of the screen should show the meanings of the function keys as used by Edit, and this serves as a useful reminder if you have lost or forgotten the supplied key-strip. If this is not visible on screen, press Shift-f5 (for SET MODE), and in response to the prompt at the foot of the screen enter 'D'. The screen display should then change to the format described above. You can also use SET MODE to select the screen mode - I usually use 131 (shadow mode 3). One useful feature of Edit is that it remembers such settings from one time to the next, so decide what your preferences are, and Edit will remember them.

USING EDIT

Once in Edit, with your program (or whatever) displayed, you can check out the basic editing features.
Many controls are very similar to View: Ctrl arrow up and Ctrl arrow down to move to the start or end of the file, Shift arrow up and Shift arrow down to move up or down one screenful, and so on. If you use the cursor keys to scroll continuously up or down you will notice that you can never reach the very top or very bottom of the Edit window (unless at the very start or very end of the file respectively). Thus you can always see a few lines both above and below the current cursor position. I normally find that having the details of the function keys displayed at the head of the screen more than compensates for the reduced size of the edit window, but that is personal preference.

Although cursor movement is very much as for View there are some differences in the way in which things are controlled. The Delete key operates as normal, deleting the character to the left of the current cursor position, but in Edit it is the Copy key which deletes the character at the cursor, not f9 as in View.

The other difference is that of dealing with blocks of characters. In View you mark the start and end of a block. This is not always true in Edit. To delete a block move to the start and press f6 (MARK PLACE), then move to the end of the block and press Shift-f8 (without placing another marker). To copy or move blocks of characters, you mark both start and finish with f6 before pressing f7 (MARKED COPY) or Shift-f7 (MARKED MOVE).

Another useful key is Shift-f1 (INSERT/OVER) which toggles between insert mode and overwrite mode. Again, whatever setting is current at the end of a session is remembered by Edit for the next one. There is, unfortunately, no case change key, so if you do need to change from upper to lower case, or vice versa, the best solution is to select overwrite mode and carefully type over the top of what is already there.
As far as loading and saving of files is concerned, I have already described how a Basic program in memory can be transferred into Edit, and a return to Basic made at the end. If you are dealing with ASCII files to start with, then f2 and f3 will prompt for the name of a file to load or to save respectively. Note the use of Shift-f2 to load and insert a file at the current cursor position (setting a marker will not work, and will prevent the Insert command from having effect).

Finally, before getting to the heart of Edit, note too the use of f1. This gives a star prompt ready for any star command to be typed in, useful for cataloguing a disc or whatever without leaving Edit. Escape will always cancel command mode. You can also use command mode to leave Edit by typing Basic (or just B.) in response to the prompt, but do remember that any Basic program you have been editing (or any other file) will be immediately lost unless saved in ASCII format first. Incidentally, one of the few irritating traits I have found in Edit is that using this method to exit leaves a screen window set unless you subsequently change mode.

SEARCH AND REPLACE

Amongst the most powerful features of any editor are those to search for, and optionally replace, any specified character string. And it is in this area that Edit is particularly versatile and powerful. Just two function keys do all the work: f4, which performs a selective search and replace, and f5, which performs a global search and replace.

Using f4, Edit will search from the current cursor position for the first occurrence of the specified string and pause when this has been found. It will then prompt you to continue (to search for the next occurrence) or replace (the target string with a new string). The keys with which to respond are 'R' for replace and 'C' for continue (not View's 'Y' or 'N'). The selective search may or may not specify a replacement string at the outset.
However, the global search and replace is only valid if both target and replacement are specified, and all occurrences will be replaced (very quickly) without any further response from the user. This makes f5 very powerful, and you should be cautious about using it unless you are quite sure what you are doing. It can be fatally easy to change one thing for another throughout a file only to find the wrong string has been replaced. And trying to change back to the original state may be impossible.

For example, if you decide, when editing a program, to change the name of a variable, do a search first to establish that you haven't already used the new name for something else. If you have, and you make all the changes, you will have used the same name for two different purposes, and at this stage you will be quite unable to separate one from the other, except by individual inspection. It is often preferable, particularly with more complex search strings, to use the selective search first to ensure that you have got it right, and only then use the global search and replace.

In both cases, a replacement string is specified by following the search string by a '/' and then the replacement string. Putting '/' followed by nothing (except Return) will result in the target string being deleted (replaced by 'nothing'). Escape can be used to terminate a selective search prematurely, and once a search has been made, a further press of f4 immediately followed by Return will use the same search string as the last time.

I now intend to concentrate upon the search and replace facilities available. These are initiated by f4 (for a selective search and replace) and f5 (for a global version). Remember, if in doubt, test things first before trusting a global search and replace with f5. That way you can abort an operation (just press Escape) before much damage has been done.
Whichever of f4 or f5 is used, in response to the ensuing prompt, the first set of characters represents the search string, followed by a slash character '/', followed by the replacement string. So the format is:

    <search string>/<replacement string>

One of the immense values of Edit is that you can search for (and replace) any of the 256 characters in the range 0 to 255 (their ASCII codes). Given the format above, that immediately poses two problems: how do you search for non-printing characters, or indeed any characters which cannot be found on the standard keyboard, and if the '/' character is used to separate search and replacement strings, can that character form part of either of these two strings? There are several answers to these questions as we shall see.

SPECIAL CHARACTERS

In fact, a number of special characters, as well as '/', have particular meanings in Edit, and as a result have to be represented differently as part of a search or replacement string. There is also a complication, and that is that the rules are somewhat different depending on whether we are searching for a character, or using it as a replacement.

When searching, the following are all considered as special characters:

    * ^ \ / @ # . | $ - ~

with particular meanings in a search context. If you want to include any of these characters in a search string, then it must be preceded by a backslash ('\'). Thus to search, say, in a program for a variable title$, you would enter as the search string:

    title\$

Similarly, if you were searching for the expression 2.5*(a-b), this would have to be specified as:

    2\.5\*(a\-b)

Note that, because the backslash has this special meaning, the only way to specify a genuine backslash is by preceding it by the relevant character, the backslash itself. To search for a backslash you must therefore include:

    \\

SPECIFYING CONTROL CHARACTERS

Like the backslash character, each of the other special characters has a specific meaning within Edit.
For example, the '$' character is used to indicate Return (ASCII 13). Because it occurs quite commonly, the Return character is treated differently from other control characters (characters in the range 1 to 31). All other control characters are specified by a sequence:

    |<letter>

equivalent to Ctrl-<letter>. Thus the Tab character, which has ASCII code 9, is Ctrl-I (because 'I' is the 9th letter of the alphabet), and would be represented in a search string as:

    |I

In most modes, such control characters appear on screen in an inverse video format (normally black on white in contrast to the standard white on black).

We already have the ammunition to perform some useful search and replace operations. For example, suppose we have a text file in which each paragraph ends with two Return characters, and each new paragraph starts with a single space. We want to ensure each paragraph starts with Tab. We can perform the change with:

    $$<space>/$$|I

i.e. replace <Return><Return><Space> with <Return><Return><Tab>. We need to search for a double Return to ensure that we don't pick out other situations involving a single Return followed by a space.

TOP BIT SET CHARACTERS

As well as Control characters in the range 1 to 31, the other characters which cannot be entered directly from the keyboard are those with ASCII codes in excess of 128 (known as the top bit set characters because in a binary format the top bit, of eight, is set to '1', ensuring the code for each such character is 128 + n where 'n' is in the range 1 to 127). In Edit, these are specified by:

    |!<char>

where '|!' represents ASCII code 128, and the one or two character string <char> which follows represents a value which is added on to 128 to form the final result. Thus the character with ASCII code 129 would be represented as:

    |!|A

because, as we have already seen, |A represents Ctrl-A with an ASCII code of 1. Adding this to 128 gives the required value of 129.
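
The arithmetic behind the |<letter> and |! notation can be checked with a small Python sketch. It models only the two rules just described (it is not a full parser for Edit's search syntax), and the function name decode is made up for the illustration.

```python
def decode(spec):
    """Turn Edit's |<letter> and |! notation into a list of byte values.

    Only these two rules are modelled; every other character is
    treated as an ordinary character standing for itself.
    """
    values = []
    top_bit = 0  # becomes 128 once '|!' has been seen
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "|" and i + 1 < len(spec):
            nxt = spec[i + 1]
            i += 2
            if nxt == "!":
                top_bit = 128  # add 128 to whatever comes next
                continue
            code = ord(nxt.upper()) - 64  # Ctrl-<letter>
        else:
            code = ord(ch)
            i += 1
        values.append(code + top_bit)
        top_bit = 0
    return values


print(decode("|I"))    # Tab
print(decode("|!|A"))  # 128 + 1
```

Feeding in a plain letter after '|!' simply adds 128 to that letter's own ASCII code.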
Another example would be:

    |!a

which represents the character with ASCII code 225 (128 + 97, the code for 'a'). In both examples, the text for this article was read into Edit, and the sequences described above used to insert the correct characters into this text (in place of dummy characters) - View, which I normally use, cannot really cope with such requirements.

However, do not fall into the trap of assuming that the foregoing is only of interest if a text file contains characters in the ASCII range 129 to 255. The ability to specify these characters has other uses. For example, in Wordwise (or Wordwise Plus) Tab is represented by adding 128 to the normal ASCII code for Tab (9). Thus all Tabs are represented by ASCII 137. If you want to transfer such a file for use in View or other word processors where Tab is represented by ASCII 9 (Ctrl-I) then Edit can be used to make a global change. The search and replace specification would be:

    |!|I/|I

where '|!|I' represents ASCII 137 (128 + 9), and |I of course represents Tab (Ctrl-I).

SPECIAL CHARACTERS IN REPLACEMENT STRINGS

When it comes to specifying a replacement string, there are fewer options and hence fewer special characters used up by this process. Thus only the characters:

    % & | $

have to be represented by:

    \% \& \| \$

This inconsistency between the use of special characters in search strings and their use in replacement strings can readily cause confusion. For example, the minus (or hyphen) is another special character, but only in search strings. Suppose you want to replace all occurrences of '-1' by '-2'; the format would be:

    \-1/-2

The function of '-' as a special character, and of the others, will be dealt with later.

Once the '/' character has been encountered as the separator between the search string and the replacement string, any further occurrences of '/' are treated as part of the replacement string. Thus:

    |I\\/|I/

would replace <Tab>\ by <Tab>/, in other words <Tab><backslash> by <Tab><slash>.
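
For comparison, the Wordwise to View Tab conversion can be expressed outside Edit as a plain byte replacement. The Python sketch below is only an analogy to the |!|I/|I specification; the function names are invented, and the second function mirrors the <space><Return> clean-up mentioned earlier for View Professional files.

```python
def wordwise_tabs_to_view(data: bytes) -> bytes:
    """Replace Wordwise's Tab (ASCII 137) with the standard Tab (ASCII 9)."""
    return data.replace(bytes([128 + 9]), bytes([9]))


def strip_space_before_return(data: bytes) -> bytes:
    """Replace each <space><Return> pair with a bare <Return> (ASCII 13)."""
    return data.replace(b" \r", b"\r")


sample = b"Name\x89Age \rAnn\x8931 \r"
print(strip_space_before_return(wordwise_tabs_to_view(sample)))
```

Like a single global search and replace, each function makes one pass, so two spaces before a Return would still leave one behind.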
USING SPECIAL CHARACTERS

The full list of special characters is shown in table 1. Many of these allow us to take a more flexible, and thus more powerful, approach to the specification of search strings. Normally a search is not case-sensitive, but if a letter in a search string is preceded by '\' then a match will be made only with a letter of the specified case.

For example, you might have a program which uses both the pseudo Basic variable TIME and a variable of your own, Time (not perhaps a good idea anyway). Suppose you decide to change your variable Time to Time% throughout the program. If you specify (in response to f4 or f5):

    Time/Time\%

then not only will Time be changed, but so will TIME (and any other combination of upper and lower case characters). Note the backslash preceding '%' in the replacement string, because in this context '%' on its own is a special character with another meaning (see next month). The backslash overrules this. To achieve the correct result you will need:

    \T\i\m\e/Time\%

Edit never attempts to preserve case, so the replacement string will always be used as specified.

Other specifications which can be useful include '@', which will match any single letter of the alphabet, and '#', which will match any single digit (range 0 to 9). For example, a search string of '####' would match all four digit numbers in a program listing (including line numbers and other values).

One further specification for search strings which can be useful is to give a range. Thus:

    A-F

would match any single character within the range 'A' to 'F' inclusive, while:

    A-FA-F

would match two characters in the same range.

Table 1. Special characters used in Edit in search strings

    Char.  Meaning
    *      as few of as possible
    ^      as many of as possible
    \      treat next character as is
    /      search/replacement separator
    @      any letter
    #      any digit
    .      any character
    |      next character is control character
    $      Return character
    -      range of characters
    ~      negates a specification

SINGLE CHARACTER SPECIFIERS

In dealing with Edit's search and replace commands, we have concentrated on single characters, or on sequences of single characters. For example:

    fred/john

will replace all occurrences of fred by john (remember that a search is initiated by pressing f4 for a selective search (and optional replace), or f5 for a global search and replace). It is when we start to investigate the use of ambiguous character specifiers and variable length strings that the true power and flexibility of Edit is revealed.

We have already seen that '@' can be used to match any alphabetic character, and '#' any digit. In addition, '.' will match any character in the range 0-255. Ranges of characters can also be specified. For example:

    A-Z

would search for any character in the range given, i.e. upper case letters. Searching for:

    A-Z%

would search for any such letter followed by a '%' symbol, useful to locate all occurrences of the built-in variables A% to Z% in a Basic program. Note the format of the search syntax. You might have thought it should be:

    A%-Z%

but this doesn't make sense (Edit would look for a string starting with a letter 'A' or 'a' - as Edit is not case specific on single letters - followed by a character in the range '%-Z' - ASCII codes 37 to 90 - followed by a '%'). To use Edit to the full, you need to develop the ability to analyse correctly the syntax of a search string.

It is also possible to specify alternatives to be searched for, any one of which is to be matched. Alternatives MUST be enclosed in square brackets. To match any one of '1', '3', or 'c' the search string must be given as:

    [13c]

and not as:

    13c

Thus to search a program for A%, B% or C% we could specify:

    [ABC]%

as the search string.
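
Readers who have met regular expressions elsewhere may find it helpful to see rough counterparts of these single character specifiers in Python's re module. The mapping is an analogy chosen for illustration, not a description of how Edit works internally, and the program text is made up. Note one difference: Python matches case exactly unless told otherwise, whereas Edit ignores case on single letters.

```python
import re

# A made-up fragment of a Basic listing, with Return (ASCII 13) ending each line.
program = "10 A%=1:b%=2\r20 C%=A%+1234\r"

# Edit's  A-Z%   roughly corresponds to  [A-Z]%
upper_vars = re.findall(r"[A-Z]%", program)

# Edit's  [ABC]%  roughly corresponds to  [ABC]%
abc_vars = re.findall(r"[ABC]%", program)

# Edit's  ####   roughly corresponds to four digit classes in a row
four_digits = re.findall(r"[0-9][0-9][0-9][0-9]", program)

print(upper_vars, abc_vars, four_digits)
```

In each case a bracketed class or range consumes exactly one character of the text, which is the point to hold on to when reading Edit's own syntax.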
It is important to realise that when specifying a range of characters, or alternative characters, each match will be with just a single character. In effect we can term these single character elements, in that they function like any specific single character in a search string. Thus:

    [A-Za-z]%

would match any two-character string consisting of an upper case or a lower case letter followed by a '%', i.e. it would match A% to Z% or a% to z%.

One further useful feature with single character elements is the negation symbol '~'. Thus to search for a character in the range A to Z you would specify:

    A-Z

but to specify any character not in that range you would write:

    ~A-Z

MULTIPLE CHARACTER SPECIFIERS

Sometimes the string for which you are searching will vary in length, for example the line numbers in a Basic program. Some will be two digits, some three, some four or more. Suppose you want to search for all line numbers with a space separating the line number from the following instruction (in order to remove the space as I suggested last time). If we take two digit line numbers only then we might specify:

    ##<space>

But then we have instructed Edit to search for all occurrences of two digits followed by a space. This will also pick up the last two digits of all line numbers of three digits or more, and probably many other numbers besides. First of all let's see how we can construct the search to ensure we find all numbers regardless of purpose or length, and then refine this to isolate just line numbers, and then just line numbers followed by a space.

To do this we need to introduce the function of two more special characters. Preceding a character with a '^' produces a search for as many of that character as possible. Thus specifying:

    ^#

will find strings of digits of any length. The other special character is '*', which matches as few of the following character as possible (more of this in a moment).
To try to distinguish line numbers we can note that these always appear at the start of a line, and are thus preceded by a carriage return (represented by a '$'). So we can refine our search string to:

    $^#

In fact in Edit shorter line numbers are preceded by one or more spaces, so we need a further refinement of the search string:

    $*<space>^#

This can be interpreted as "Return, followed by as few spaces as possible, followed by as many digits as possible". It is interesting to consider too the difference between that and the search specification:

    $^<space>^#

This would read as "Return, followed by as many spaces as possible, followed by as many digits as possible", superficially perhaps not very different. Although in a sense this is quite true, the difference can certainly be significant. When searching for "as few as possible" of some character, the minimal match is "nothing". Thus searching for as few spaces as possible will match no spaces, one space, two spaces and so on. On the other hand, searching for "as many as possible" needs to find at least one of the target characters to form a match. Thus the first search specification given above would match line numbers such as:

    10
    100
    1000
    10000

but the second would not match the last of these (a five digit line number has no spaces in front of it). In the same context, it is important to specify "as many digits as possible" so that at least one digit is required for a match.

This all takes longer to explain than it does to carry out, but even so the distinction between the meanings of the two special characters '^' and '*' may still confuse you, as it still does me from time to time. If in doubt, as with all searches, try just the search only (no replace), and make it a selective rather than a global search. Once you are confident you have constructed the correct syntax, then trust yourself to a global search and replace.

Despite all this discussion we have still not completed the task we set ourselves.
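
The difference between "as few as possible" (which may be none at all) and "as many as possible" (which needs at least one) can be tried out with Python regular expressions standing in for Edit's syntax. This is a sketch by analogy only: ' *?' plays the part of *<space>, ' +' the part of ^<space>, and the listing is invented, with line numbers padded by spaces to five characters.

```python
import re

# Invented listing: each line starts after a Return (ASCII 13).
listing = "\r   10 PRINT\r  100 PRINT\r 1000 PRINT\r10000 PRINT\r"

# Counterpart of  $*<space>^#  : zero or more spaces, then one or more digits.
few_spaces = re.findall(r"\r *?[0-9]+", listing)

# Counterpart of  $^<space>^#  : one or more spaces, then one or more digits.
many_spaces = re.findall(r"\r +[0-9]+", listing)

print(len(few_spaces), len(many_spaces))
```

The first pattern finds all four line numbers; the second misses 10000 because there is no space for it to match.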
To search reasonably for all line numbers followed by a space you will need to specify:

    $*<space>^#<space>

That's fine, but how do we now set about removing that last space? In effect what we need to do is to replace the string found by the same string less its final character, the space. This can be achieved by looking at two further special characters, '&' and '%', this time in the replacement specification. The '&' character represents the complete target string. Thus:

    $*<space>^#<space>/&

would certainly find line numbers followed by spaces, but would replace the target by itself, which is not very useful, though in other contexts the '&' character certainly has a part to play. For example, if we are trying to achieve the opposite of our original task, to insert a space after every line number, we could write:

    $*<space>^#/&<space>

where the target string is replaced by the same string followed by a space (but note that this specification would apply to all line numbers, including those already followed by a space - for a refinement on this see later).

Now to return to our original task, we need to use the other special character '%' to select just part of the target string. The '%' character, in this context, is followed by a single digit. This is used to identify the different ambiguous elements of the search string - after all, unambiguous elements are known and can simply be repeated. In our example the first ambiguous part is:

    *<space>

and this is represented by %0 in the replacement string. The second (in our case) ambiguous element is:

    ^#

and this would be represented by %1. Any further ambiguous elements would be represented by %2, %3 and so on. Thus the replacement string will consist of Return (indicated by a '$'), followed by %0, followed by %1 (but not including the space), thus:

    $*<space>^#<space>/$%0%1

At this stage, I strongly advise you to try this out for yourself by constructing a short, if artificial, example.
By the way, if you want to insert a space following all line numbers where such a space is missing use:

    $*<space>^#~<space>/$%0%1<space>%2

Here, we search for line numbers followed by any character other than a space, but the replacement, as well as including the extra space needed, must also put back the last character of the target string (using %2), as this will have been the first character after the line number (in other words, the start of an instruction).

The final examples which we have constructed have become quite complex, but are in fact quite useful in practice for tidying up programs. They will either insert a space following a line number where none existed before, or remove a space following a line number where one existed previously. Think how you might modify the latter to remove not just a single following space, but as many spaces as there might happen to be, very useful for a program saved in a LISTO format.
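
If you would like to experiment away from the Master, the two tidying operations can be imitated with Python regular expressions, where the groups \1, \2 and \3 play the part of %0, %1 and %2. This is a sketch by analogy, not Edit's own behaviour; the function names and the sample listing are invented. One assumption is flagged in the code: the "not a space" test also excludes digits, because Python would otherwise back up and split a number in two.

```python
import re


def remove_space(listing):
    """Counterpart of  $*<space>^#<space>/$%0%1  (drop the space after a line number)."""
    return re.sub(r"\r( *?)([0-9]+) ", r"\r\1\2", listing)


def insert_space(listing):
    """Counterpart of  $*<space>^#~<space>/$%0%1<space>%2  (add a missing space)."""
    # [^ 0-9] rather than [^ ]: an adjustment for Python's backtracking,
    # so that 100<space> is not matched as 10 followed by the character 0.
    return re.sub(r"\r( *?)([0-9]+)([^ 0-9])", r"\r\1\2 \3", listing)


print(repr(remove_space("\r   10 PRINT\r  100REM\r")))
print(repr(insert_space("\r   10PRINT\r  100 REM\r")))
```

As with Edit itself, it is worth trying each pattern on a short artificial listing before letting it loose on a real program.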