Allsorts of sorts
An Article
By H.S.Williams
Sept. 1991
Welcome to this article on sorting and the routines employed. If you'll excuse
the corny title, I'll explain the format of this article. The program that is no
w running is a caretaker program. It will load and display all the text files th
at form this article. The entire article comprises of a series of basic programs
, and accompanying text files. To get the best of the article, I suggest that yo
u read through the text files as presented by this program ( The title of each f
ile is displayed at the top of the screen as each file is displayed for your ref
erence. ) and then to go through the basic programs with reference to the text f
iles.
The text files were written using the Wordwise word processor and should easily
load into any other processor for printing. If you do not have a word processor
and you want to print out the articles then type in the following, ending each
line by pressing the return key.
MODE0
VDU2
*TYPE <filename>
The purpose of this article, or so Duncan tells me is to show the technical si
de of sorting. Also to provide the fastest sorting routines that I can, supposed
ly in machine code. The reality may be somewhat different. I'll start with an ou
tline of what I intend to cover.
An introduction is always a good start. The first thing to do will be to expla
in the basic methods behind some of the commonly used sorting routines; bubble,
index, shell, and quick. Taking the simplest and most inefficient of these ( the
bubble sort ), I'll show how by a sucession of simple modifications it can be m
ade more and more effective. The modification of the bubble sort will lead neatl
y onto the methods of comparison of sort routines. Also under which conditions d
ifferent routines are the most effective.
After this introduction I will present a sort routine of my own devising. This
will demonstrate how memory and speed are often traded off against one another,
incredible speed with many limitations.
If I get this far I'll then translate the more suitable of the routines so far
examined into machine code, this should provide you with some of the fastest so
rting routines available.
You may include any of the routines presented in this article in your own progr
ams. Examples in the form of basic programs and accompanying text files will be
presented along the way.
Program Format.
Rather than cloud the programs up with endless REM statements which only confu
se things, each demonstration program ( B.filename ) will also have a text file
associated with it ( T.filename ). The text file will explain all the modificati
ons. Also all the programs will use the same variable names to represent the sam
e functions. For example every program will initially fill an array ' A%(size%)
' with random numbers between 0 and max% . Here size% represents the size of th
e array minus one ( don't forget element A%(0) ), and max% represents the maximu
m value that an element within that array can take. All the variable names commo
n to two or more of the programs are listed below, along with an explanation of
their function.
In the following list n represents any integer ( whole number ) between two set
numbers ( usually 0 and maximum value. )
seed% This variable is used to set the random number generator to give the sam
e results every time. This is not to fiddle the results , but so that if I try t
o explain the results, you have to have the same results that I am explaining.
A%(n) Array containing random numbers between 0 and max%. size% is the size of
the array.
B%(n) Same size array as A%(n). Contains the position in the final sorted list
that an element in the array A%(n) occupies. E.G. If B%(3)=0, then A%(3) is at
the top of the sorted list.
O.K. so why is sorting so important, and why do people make so much fuss about
the subject ? The idea of sorting is simple; you start with a group of objects i
n a random order, and by applying a set of rules to the objects, the objects acq
uire predictable relationships with respect to each other. Long winded or what !
Put simply, you arrange the objects into a logical order. Since all computers d
eal exclusivly with numbers ( Even the words on this screen are just numbers in
the computer's memory. ) we will only consider numbers. So consider a group of n
umbers in any order, wouldn't it be usefull to know the largest, and smallest nu
mbers, and also how large any number is in relation to the others.
If we take our group of numbers then there are two obvious ways of organising
them; in ascending, and descending order of magnitude. Any other more complicate
d relationship is pointless.
Well I'm onto the fourth file and I haven't started sorting yet. If you're stil
l here, then you've got more staying power than I have! Now onto the first sorti
ng routine. ( Yaaaaaaayy! )
The simplest, most intuitive and in some ways the most important sorting routi
ne is called the BUBBLE sort. ( For reasons that will become obvious later. ) It
is important that you understand this routine because it is the most obvious, a
nd logical of all sorting routines.
Let's start with a small group of numbers in a random order. E.G.
3 9 6 7 1
Put simply the routine works by repeately comparing adjacent numbers, and excha
nging them if they are in the wrong order.
Firstly the top two numbers are compared, and if they are in the wrong order th
en they are swapped around. Then routine moves down the list by one number and t
he process is repeated. The process continues until the end of the list. The who
le process is then repeated over and over until the routine has been through the
entire list comparing adjacent numbers, and hasn't had to swap any pair aroud.
Say I want to arrange the above numbers into descending order using the bubble
sort. First thing to do is compare the top two numbers, the first is a 3 and the
second, a 9. Since I want the list in descending order, these two are in the wr
ong order and are swapped. The list now reads :
9 3 6 7 1
O.K. moving one down the list to the second number I have a 3 ( The one that ha
s just been moved down. ), and a 7. They're in the wrong order so are swapped. A
nd we have :
9 6 3 7 1
Moving down one again I have 3 and 7, again in the wrong order, so they are swa
pped to give :
9 6 7 3 1
Comparing the 3 and the 1, we find that for the first time that they are alread
y in the right order. Also we have just compared the last two numbers in the lis
t. Since we have swapped at least one pair of numbers during the routine, we hav
e to go through the same process again.
Starting with the first two numbers we have a 9 and a 6 in the right order, so
we move on. Next there are a 6 and a 7 in reverse order, so they are swapped, gi
ving :
9 7 6 3 1
There will be no more swaps during this cycle since all the numbers are in the
right order. But since SOME numbers were swapped around the routine is called ag
ain. On the next pass however, since the list is in the correct order none of th
e numbers are swapped around. The routine then recognises that the list isin or
der since it dosen't have to swap any numbers.
You may ask why the routine was called once more when we could see that the lis
t was obviously sorted by the end of the second pass. The reason for this is tha
t sorting routines usually work with much larger lists of numbers, and it is not
so obvious when the list becomes finally sorted during a pass. But if a pass of
the routine is completed without a single pair of numbers being swapped then it
MUST be in order.
The file B.BUBBLE on the disk is a basic program that sets up an array of rando
m numbers ( A%(size%) ) and then sorts them using a bubbe sort routine. The prog
ram prints out the numbers in their current order after each pass of the routine
, so that you can see how each pass affects the order of the numbers.
If you've reached this stage, and you don't understand the bubble sort routine,
or why it's called the bubble sort, then please go back and work it out. It may
not seem important to you, but if this is so then you don't understand it!
Still with me ? How about a natural sort routine. If I gave you a set of number
s, you'd look through the list until you found the smallest number, then you'd p
ut this number at the bottom of the list. Then you would look for the next smal
lest number, and place it just above the first smallest number. Makes sense dose
n't it ( I hope ) ? As far as computer talk goes, this as called a selection ( o
r exchange ) sort.
A selection sort looks through the group of numbers, finds the smallest, and th
en swaps it with the last number in the list. The size of the list is then reduc
ed by one to exclude the last number, and the routine moves onto the next smalle
st.
The BASIC program ( B.SELECT ) may not be as easy to understand as the B.BUBBLE
program, but it follows the way that people sort more accurately. As before the
arrangement of the numbers is shown after each pass of the sorting routine.
The SHELL sort.
The shell sort is the hardest of the the three basic sorting routines to under
stand. Don't worry if you have problems trying to make sense of it, just take yo
ur time.
The shell sort consists of a series of bubble sorts, performed on bigger and b
igger groups of numbers. To start, the entire list in divided into two halves, a
nd the first item in each half are compared and sorted. On the next pass of the
routine, the entire list is divided into four quaters, and the first item of eve
ry quarter is sorted using a bubble sort. Then the list is divided into eighths,
and the first item of every eighth bubble sorted. And so on until, on the final
pass the entire list is bubble sorted.
The file B.SHELL on the disk is, unsurprisingly a shell sort program. Like the
previous two example programs, the state of the partially sorted data is shown
after each pass of the shell sort routine. The state of the data is not however
shown during the 'sub' bubble sorts.
So far I've presented you with three different sorting methods, and also demon
stration programs in basic. Now it's time to see how the different routines comp
are against one another. The most obvious way of comparing performances is by me
asuring the time each routine takes to sort a group of numbers. As the size of t
he group of numbers to be sorted varies, different sorting methods become more e
ffective. The basic program ( B.COMPARE ) tests all three of the sorts so far en
countered against each other for different group sizes. To run it PAGE should be
set to &1200, if it is not the program automatically reloads to the correct add
ress. Once running, you will be prompted for firstly the maximun group size to b
e sorted between two and ten. This represents the maximum power of 2^n that is s
orted. If you don't understand this at first, just enter 6 or 7 or play with it
and it'll become obvious, I hope.
Next you must enter how many times that the program is sort groups of data. Th
e more times that the program is run, the more statistically accurate the result
s will be, just enter two or three if unsure. After that the program will switch
to mode 0 and the results will be displayed and updated as they are calculated.
Minimum, average, and maximum time taken to sort are given for each of the thre
e sort routines, and for each size of group sorted up to the one selected.
It takes about 72 minutes to bubble sort 1024 numbers so be prepared to wait i
f you choose a large sort size!
Well because I'm already behind in getting this to Duncan, I've decided to spi
lt this article into two parts. You've just reached the end of the first half. T
he next half, covering all the rest that I've promised will probably appear next
month if there's room. It'll give you a chance to work out and understand the
first half. The second half gets complicated, believe me, I'm the one who has to
write it.