Data Structures and Algorithms

**Please contact
me to suggest any changes to this page. Most of the code on this page is in
Pascal. There are quite a few freeware Pascal compilers available, try softseek.com to find
one. I have had some trouble with some of them, Free Pascal and
Bloodshed Pascal in particular. I use Borland's Turbo Pascal 7 which I think is
fantastic even if it looks like it was made in 1963. It is also easy to implement the code in C. I recommend
the LCC-Win32
compiler,
a windows version can be downloaded here
(2.29 MB). Also remember that most C++ compilers compile C code. **

The following are the major data structures and algorithms used in computer Programming. Although the examples are given in Pascal, the same fundamental data structures are found in most procedural and Object Orientated languages such as JAVA, C and C++. It is very easy to convert the code below to C and C++ in particular.

This is not the most interesting area of Computer Science, but unfortunately it is necessary (or so they tell me). I have found that the best way to learn this stuff is to write programs. It is often a hit and miss process, but just keep adapting your code until the compiler accepts it. It also isn't a bad idea to go through the code line by line and write down or draw a diagram of what is happening at each line.

This page focuses more on the implementation of data structures and their algorithms than the theory behind them. For information on theory you had better find a big book and an even bigger pot of coffee.

The linked list section provides a lot of the basic knowledge necessary to understand many of the others. It presupposes a knowledge of the basics of Pascal including arrays, records and pointers.

Some of this code could have small errors in it since I can't cut and paste between windows programs and my Pascal environment. Any errors should be easy to find - they probably involve semi colons.

__Linked
Lists__

This section presents a general introduction to linked lists. It demonstrates how to construct a list, how to add new nodes to the front, back or middle, how to print the list, how to search the list, and how to delete nodes. This section provides the basic knowledge you need to understand most other sections on this page. More advanced topics are covered in the following sections: Queues, Stacks, Trees, Graphs etc.

Linked lists are like arrays in many ways, except they have the added advantage of being able to grow and shrink. With arrays you need to declare how many elements it has at compile time (some languages do allow arrays to grow, notably VB). A linked list is dynamic, it grows as each node is added, and shrinks as each node is deleted. It is also much easier to add a node to the middle of a linked list than the middle of an array.

The following is the
declarations needed to create a linked
list where each node in the list holds an element called **number**, and a
pointer to the next node called **next**.

**type**

**dataPtrType = ^dataRecType;
**{Note that we can declare a type of dataRecType before declaring dataRecType
: the declaration is on the lines below.}

**dataRecType =
record **{This is the record that determines the structure of the nodes}

**number : integer;**

**next : dataPtrType; **{Next
is of dataPtrType ie. it is a pointer to dataRecType - the record type it is
declared in.}

**end;**

**var**

**head, newPtr :
dataPtrType; **{We now declare two pointers to dataRecType.)

**response : integer;**

This is the code that
creates the list. Note that it has an if-then-else clause that tests to see if
the list has already been started. The top half is only executed the first time
through the loop (**if head = nil**).

**begin**

**head := nil; **{It is
always a good idea to initialize pointers to equal nil before assigning them a
memory address.}

**writeln('Enter a
number');**

**readln(response);**

**while response <>
0 do**

**
begin**

** ****
new(newptr); **{We create an area in memory for the new node - newPtr points
to this area.}

**newPtr^.number := response;** {The thing newPtr points to is the record
dataRecType - the value number in this record takes the value of response.}

**
if head = nil then**

** ****
newPtr^.next := nil **{This happens the first time through the loop. This node
will always be the last in the list (other nodes are created before it rather
than after it) so it is very important that the next points to nil, this ensures
we know when we have come to the end of the list.}

**
else**

**
newPtr^.next := head; **{New
nodes are put at the front - then next in them points to the node that
previously was the head.}

** ****
head := newPtr; **{This new element now becomes the head of the list}

**
writeln('Enter a number'); **{We
then begin the loop again}

**
readln(response);**

** end;**

**end.**

We can now write a procedure to print out this list.

**procedure print(head :
dataPtrType); ** {It receives the pointer to the head of the list}

**var**

**tempPtr : dataPtrType;** {It
creates its own pointer to dataPtrType so that the value in head can be stored
in it.}

**begin**

**if head = nil then**

**
writeln('Nothing in the List') ** {This is in case the list is empty.}

**else**

** begin**

**
tempPtr := head;**

**
while tempPtr <> nil do **{We
move along until we get to the last node which was set to point to nil}

**
begin**

**
writeln(tempPtr^.number);**

**
tempPtr := tempPtr^.next; **{This
is how you move to the next element in the list.}

**
end**

** end;**

**end;**

The preceding code has created a linked list and printed it out. There are a number of improvements that can be made to it however. It is more efficient to create a function that initializes the list by creating a dummy head node. This removes the need for the if-then-else clause when we add nodes.

The function looks like this:

**function initializelist
: dataPtrType;**

**var**

** temp
: dataPtrType;**

**begin**

**
new(temp);**

**temp^.number := -10000;
**{This is a junk value, but it is best to initialize it nonetheless. Make it
a value that you would never put into the list normally.}

**temp^.next := nil;**

**initializelist := temp;**

**end;**

At the start of the mainline call the function initializelist like this:

**begin**

**head: = initializelist;**

**tail := head;**

Now nodes can be added to
the list without checking is **head = nil**.

Note that there is also
going to be a tail as well as a head. The ** tail** always points to the last node in
the list just as the **head** always points to the first. This will help us
add nodes to the end of the list as well as the beginning.

The following procedure adds a node to the end of the list:

**procedure addToEnd (resp
: integer; var tailPtr : dataPtrType); **{The argument list accepts the number
the user wants to put into the list, plus a pointer to **tail**. Note that this is a **var**
parameter, the value of** tail **will change to the new node we create.}

**var **

** temp
: dataPtrType;**

**begin**

**
new(temp);**

**temp^.number := resp;**

**temp^.next := nil; **{**temp**
will be the last element in the list, so it must point to nil.}

**tailPtr^.next := temp; **{The
previous **tail** of the list now points to **temp** : the new last
element in the list.}

**tailPtr := temp; **{**temp**
becomes the new **tail** of the list}

**end;**

We now can add nodes with the following code:

**writeln('Enter a
number');**

**readln(response);**

**while (response > 0)
do**

**
begin**

**
addToEnd(response, tail);**

**
write('Enter a number');**

**
readln(response);**

** end;**

**end.**

Another important utility
for linked lists is a search procedure. This asks the user for something to search
for (a number in this case), and begins a search from the head node. If it finds the
node it stops searching and returns a Boolean variable with the value true along
with a pointer to the node. If it gets to the end of the list without finding
the node it returns false. Naturally, it accepts a pointer to **head**, but
this is under the guise of a pointer named **location**, because the
procedure is going to change the value of this pointer, and we don't want to
change the value of **head**.

**procedure search(target :
integer; var ptr : dataPtrType; var found : boolean);**

**begin**

**
found := false;**

** ****
ptr := ptr^.next; **{Remember their is a dummy head node that we just skip.}

** ****
while (not found) and (ptr <> nil) do **{i.e. while the node hasn't been
found, and we haven't reached the end of the list, keep searching.}

**
begin**

**
if ptr^.number = target then**

**
found := true ** {It breaks out of the loop at this time, returning the value
of true for found, and also a pointer to the node it was found in (remember, **ptr**
was a **var **parameter.)}

**
else**

**
ptr := ptr^.next;**

**
end;**

**end;**

This procedure can be called as follows:

**var **{These
are the variables the mainline needs to call the procedure.}

**num : integer;**

**location : dataPtrType;**

**found : boolean;**

**begin**

**write('Enter a number
to search for');**

**readln(num);**

**location := head; **{We
want to start searching from the top of the list. Remember we don't send **head**
directly, because we don't want the value of **head** altered.}

**search(num, location,
found);**

**if found then**

**
writeln(location^.number); **{**location**
was accepted as a **var** parameter, so if it was found, location is set to
point to the node in which it was found.}

**else**

**
writeln('Number not found');**

**end.**

Another important
procedure is a delete procedure for removing nodes from linked lists. For this
we need to search for the node we want to delete. This is not enough however, we
also need a pointer to the node before the node we want to delete because we
need to change its next value to point to the pointer after the one to be
deleted. Once we got to the pointer to be deleted there would be no way to move
back one place, so instead we move through the list with two pointers. Not only
do we have a pointer looking for the node to be deleted (**deletePtr**), we
have a pointer always one place behind called **trailing**. Once the pointer
to be deleted is found we set trailing to point to **trailing^.next**.

**procedure delete (headPtr
: dataPtrType); **{This procedure only accepts a pointer to **head** in its
argument list.}

**var**

**
unwantedNum : integer;**

**
trailingPtr, deletePtr : dataPtrType;**

**
found : boolean;**

**begin**

**
writeln('Enter the unwanted number');**

**
readln(unwantedNum);**

**
found := false;**

**
trailingPtr := headPtr;**

** ****
deletePtr := headPtr^.next; **{It doesn't matter that we won't search the head
node for the number, because it was a dummy head node.}

**
while (not found) and (deletePtr := nil)**

**
begin**

**
if (deletePtr^.number = unwantedNum) then**

**
begin**

** ****
trailPtr^.next := deletePtr^.next; **{Nothing now points to **deletePtr**,
and the list is still joined.}

**
dispose(deletePtr);**

**
found := true;**

**
end**

**
else**

**
begin**

**
trailPtr := deletePtr; **{Move
both pointers one step along the list}

**
actualPtr := deletePtr^.next;**

**
end;**

**
end;**

**end.**

Moving through lists with
leading and trailing pointers is also useful if you want to add a node to the
middle of a list. This is how you might go about it to build a list where the **key**
value in each node of the list goes from smallest to biggest.

**procedure insert (headPtr
: dataPtrType);**

**var**

**
number : integer;**

**
leadPtr, trailPtr, newPtr : dataPtrType;**

**
found, done : boolean;**

**begin**

**
write ('Enter a number : ');**

**
readln(number);**

**
trailPtr := headPtr;**

**
leadPtr := headPtr^.next;**

**
found := false;**

** done
:= false;**

**
while (not done) and (not found) and (leadPtr <> nil) do**

**
begin**

**
if number > leadPtr^.key then**

**
begin**

**
leadPtr := leadPtr^.next;**

** ****
trailPtr := trailPtr^.next; **{Move one place further on.}

**
end**

**
else if number = leadPtr^.key then**

** ****
found := true; **{The element already exists.}

**
else**

** ****
done := true; **{We have found the place to insert the element.}

** end;**

**if found then**

**
writeln ('The element already exists');**

**else**

**
begin**

**
new(newPtr); **{The code for
inserting the new node.}

**
newPtr^.key := number;**

**
newPtr^.next := leadPtr;**

**
trailPtr^.next := newPtr;**

** end;**

**end;**

This section has now explained all the basics you need to know to begin working on more advanced data structures and algorithms. These are set out below, though not in as much detail as the information in this section.

Like Stacks, Queues limit the way nodes can be entered into and removed from lists. Queues take a first in, last out approach. There are only two basic operations, one inserts a new element into the beginning of the queue, the other removes an item from the end of the queue. Basically, an element isn't allowed out until all those inserted before it have been removed.

The following are the
declarations necessary for a linked list based queue. It is a circular linked
list with a single external pointer, **Q**, that points to the end of the
list:

**type nodePtr = ^ node;**

**node = record**

**item : integer;**

**next : nodePtr;**

**end;**

**var**

**queue = nodePtr;**

And these are the procedures and functions it uses:

**procedure queueinit (var
Q : queue; **{For
initializing a previously unused queue.}

**begin**

** Q := nil;**

**end;**

**function IsEmpty (Q :
queue) : boolean;**

**begin**

**
IsEmpty := (Q = nil);**

**end;**

**procedure insert(Q :
queue; newitem : integer); **{For adding a new node with the value **k** in its **key**
value.}

**var**

**newptr :nodeptr;**

**begin**

**
new(newptr);**

**
newptr^.key := newitem);**

** if
IsEmpty(Q) then **{Uses the procedure above to test if the list is empty.}

**
newptr^.next := newptr**

** else**

**
begin**

**
newptr^.next := Q^.next;**

**
Q^.next := newptr**

** end;**

** Q :=
newptr;**

**end;**

This procedure removes an item and prints out it's value:

**Procedure Remove (var Q
: queue; var success : boolean);**

**var**

**front : nodeptr;**

**begin**

** if
IsEmpty(Q) then**

**
success := false**

** else**

**
begin**

**
front := Q^.next;**

**
if front = Q then **{If there is one node in the queue.}

**
begin**

**
writeln(front^.item);**

**
Q := nil;**

**
end**

**
else**

**
Q^.next := front^.next;**

**
writeln(front^.item);**

**
dispose(front);**

**
success := true;**

**
end**

**end;**

Queues can also be implemented with arrays providing you know roughly how big the list will be at its greatest.

The declaration may look something like this:

**const MAX = 100;**

**var**

**
queue : array (0 .. MAX] of integer;**

**
head, tail : integer;**

The procedures and functions it uses will then look like this:

**procedure queueinit; **{Initializes
the queue}

**begin**

** head
:= 0;**

** tail
:= 0;**

**end;**

**procedure put(k :
integer);**

**begin**

**
queue[tail] := k;**

** tail
:= tail + 1;**

**if tail > MAX then**

** ****
tail := 0; **{This allows it to keep going in a loop - just make sure there
aren't more than 100 elements in the array at any one time.}

**end;**

**function get : integer;**

**begin**

** get
:= queue[head];**

** head
:= head + 1;**

**if head > MAX then**

** head
:= 0;**

**end;**

Like Queues, Stacks restrict the way data is accessed. Stacks tend to be more widely used that Queues. Only two operations are involved: one inserts a new element at the top, the other pops an element off the top. There is no way to insert an element at any place but the top, and no way to remove any element other than the top one. It works like a stack of plates, plates are added to the top of the pile and removed from the top of the pile. Consequently, it isn't until all the plates are removed that the oldest plate is removed. This approach is also called last in, first out.

The following are the declarations necessary for all the procedures:

**type dataType = ^
dataRecType;**

**dataRecType = record**

**key : integer;**

**next : dataType;**

**end;**

**var**

**head, newptr : dataType;**

These are the procedures stacks can use:

**procedure stackinit; **{For
initializing a previously unused stack.}

**begin**

**
new(head);**

**
new(newptr);**

**
head^.next := newptr;**

**
newptr^.next := newptr;**

**end;**

**procedure push(k :
integer); **{For adding a new node with the value **k** in its **key**
value.}

**var**

**temp : dataPtrType;**

**begin**

**
new(temp);**

**
temp^.key := k;**

** ****
temp^.next := head^.next; **{There is a dummy head node, so effectively this
new element is at the start of the list. **Temp** points to the node that
previously was the effective head.}

**
head^.next := temp;**

**end;**

**function pop : integer;
**{Returns the number in the first node (after the dummy node) and deletes
that node.}

**var**

** temp
: dataPtrType;**

**begin**

** temp
:= head^.next;**

** pop
:= temp^.key;**

** ****
head^.next := temp^.next; **{The dummy head node now points to the node one
further down the list.}

**
dispose(temp);**

**end;**

Stacks can also be implemented as arrays. This is one way you can declare a stack as a record:

**const MAX = 20;**

**stack = record**

** ****
top : 0 .. MAX; **{we could make this an integer, but using a sub range allows
us to check when we are at the end of the array.}

**
items : array [1 .. MAX] of integer;**

**end;**

**var**

**S : stack;**

The following are the basic functions and procedures they use:

**procedure create (var S
: stack);**

**begin**

**S.top := 0**

**end;**

**function IsEmpty (S :
stack) : boolean;**

**begin**

**IsEmpty := (S.top = 0);**

**end; **

We use **Push** to add
a new element. It will not let us do this if the array is full:

**procedure Push (var S : stack; newitem : integer; var success
: boolean);**

**begin**

** if S.top = MAX then**

** success := false**

** else**

** begin**

** S.top := S.top +
1;**

** S.Items[S.top] :=
newitem;**

** success := true;**

** end**

**end;**

This procedure removes an element and prints it out. It uses IsEmpty (above) to see if the array is empty:

**procedure Pop (var S : stack; var success : boolean);**

**begin**

** if IsEmpty(S) then**

** success := false;**

** else**

** begin**

**
writeln(S.items[S.top]);**

**
S.top := S.top - 1;**

**
success := true**

** end**

**end;**

__Binary Trees__

Binary trees are simple to describe. They consist of nodes, like
linked lists, but instead of each node pointing to one other node, each node can
point to two other nodes. There is no obligation on a node to point to two other
nodes, it can point to **nil** on either/both sides.

This is an example of how the nodes might be declared:

**type**

**nodeptr = ^node;**

**node = record**

**key : integer;**

**left : nodeptr;**

**right : nodeptr;**

**end;**

**var**

**T : nodeptr;**

Note how similar it is to the declaration of a linked list, the only difference is that it potentially points to two other nodes rather than one.

The following are the main procedures used with trees:

**procedure create (var T : nodeptr);**

**begin**

** T := nil**

**end;**

Here is a procedure for inserting a new node into the tree
according to its key value. It uses recursion to find the right place to insert
the node. This only works if the values of **key** are unique of course:

**procedure insert (var T : nodeptr; num : integer);**

**begin**

**if T = nil then**

**begin**

** new(T);**

** T^.key := num;**

** T^.left := nil;**

** T^.right := nil;**

**end**

**else if num < T^.key then**

** insert(T^.left, num)**

**else**

** insert(T^.right, num);**

**end;**

This procedure searches through the tree for a **key** value
specified in the argument list as **num**. If it finds it it prints it out
and returns **success** as true, otherwise it returns **success** as
false:

**procedure retrieve (T : nodeptr; num : integer; var success
boolean);**

**begin**

**if T = nil then**

**success := false**

**else if num = T^.key then**

**begin**

** writeln('Number found: ', T^.key);**

** success := true;**

**end**

**else if num < T^.key then**

**retrieve(T^.left, num, success)**

**else**

**retrieve(T^.right, num, success)**

**end;**

__Tree
Traversal__

Trees lend themselves to recursion. Nowhere is this more true than when it comes to tree traversal. There are three ways to traverse a tree, these procedures are universally referred to as preorder, postorder and inorder.

Inorder can be used to print the elements out in ascending order:

**procedure inorder (T : nodeptr);**

**begin**

**if T <> nil then**

**begin**

**inorder(T^.left);**

**writeln(T^.key);**

**inorder(T^.right);**

**end**

**end;**

The simplicity of this is great, there is nothing to it. The same is true for the other tree traversing procedures.

Preorder will keep going to the left as far as it can as it descends down a tree, then goes right. To see how it works in practice enter it into your compiler.

**procedure preorder (T : nodeptr);**

**begin**

**if T <> nil then**

**begin**

**writeln(T^.key);**

**preorder(T^.left);**

**preorder(T^.right);**

**end**

**end;**

It doesn't take a genius to guess what postorder will look like:

**procedure postorder (T : nodeptr);**

**begin**

**if T <> nil then**

**begin**

**postorder(T^.left);**

**postorder(T^.right);**

**writeln(T^.key);**

**end**

**end;**

If you don't see what this one is doing, enter a series of elements, draw the tree you have entered down on paper, and compare it to the results postorder gives.

__Graphs
with Arrays__

Graphs consist of a whole
lot of elements and the connections between them (actually it is a collection of
*vertices* and *edges*). Unlike trees and linked lists, graphs allow
any element to be linked to any other element. Consider a bus map of America :
we have a whole lot of cities and a whole lot or routes connecting them.
Basically any city can be connected to any other either directly, or through
intermediaries. There is absolutely no restriction on whether some city can be
connected to some other city.

The easiest way to create graphs is with two dimensional arrays, Consider an array

**cities : array[1 .. 6,
1 .. 6] of integer;**

Which might be mapped something like this: the top row are departures, the vertical row are destinations.

1: Chicago | 2: New York | 3: LA | 4: Miami | 5: Roswell | 6: Boston | |

1: Chicago | 1 | 1 | 0 | 0 | 1 | 1 |

2: New York | 0 | 1 | 0 | 1 | 1 | 0 |

3: LA | 1 | 0 | 1 | 0 | 1 | 0 |

4: Miami | 0 | 1 | 1 | 1 | 1 | 1 |

5: Roswell | 1 | 1 | 1 | 0 | 1 | 1 |

6: Boston | 0 | 0 | 0 | 1 | 1 | 1 |

The number 1 represents that there is a bus from one city to the other, 0 means there is no bus. There is a bus from Miami to New York but no bus from Miami to Chicago. Also note that the connections go both ways. There is a bus from New York to Chicago, but none from Chicago to New York.

You could also use boolean variables rather than integers of course, but there is no real difference.

Once you can picture graphs as two dimensional arrays the rest is easy. It is simple to create one and two way connections, to delete connections etc.

There are some other very difficult issues however. Given that there is no direct route from New York to LA, and given that we know the distances from each city to each other city, what is the quickest route to LA using intermediaries?

__Graphs
with Linked Lists__

Using arrays to represent graphs is fine if you know roughly how many elements the graph contains. Since this is often not the case two way linked lists often need to be used.

__Simple Sorting
Algorithms__

I will begin by describing some of the most simple sorting algorithms. These are quite slow, especially as the size of the data to be sorted increases, but they can be useful if you have a small amount of data, and you only want to run the algorithm a few times. As a rule these algorithms take N squared steps to sort N. For all the sorts in this section we are putting an array of integers into order.

The first method is called
Selection Sort. This is the sorting algorithm most people would come up with on
their own if they were asked to sort some data. Basically you run through the
whole array, find the smallest element, and swap it with the element in the
first position. Then, starting at the second position, you run through the whole
array, find the smallest element, and swap it with the element in the second
position. This continues until the whole array is sorted. If **N** is the
number of elements in the array, this method takes** (N - 1)** passes (You
don't have to run through the array when only one element is left).

This is the code (**N**
represents the size of the array, **A** is the name of the array).

**procedure selection;**

**var**

**i, j, min, temp :
integer;**

**begin**

**for i := 1 to N - 1 do **{The
total number of times we need to go through the array.}

**
begin**

**
min := i; **{**min** holds the
position of the lowest element in the array - it starts each loop at the element
we begin the search from.}

**
for j := i + 1 to N do **

**if A[j] < A[min] then **

**min := j;**

**
temp := a[min]; **{This starts the three step process to swap the smallest
element found in this pass with the element we are up to in the array.}

**a[min] := a[i];**

**
a[i] := temp;**

** end;**

**end;**

This is really a brute-force approach, but it gets the job done, and is actually reasonably efficient for sorting small files.

The next method to look at is insertion sort. It involves considering the elements one at a time, and inserting each element in its proper place amongst those already sorted. You insert an element by moving larger elements one position to the right.

**procedure insertion;**

** var **

** i,
j, value : integer;**

**begin**

** for
i := 2 to N do **{This is the
number of times the loop below is performed.}

**
begin**

**
value := A[i]; **{This is the
element we are working with this time through.}

**
j := i; **{j holds the array position
of this element.}

**
while A[j - 1] > value do; **{We
start with the element before value, and head down to the start of the array. If
an element is bigger than value, it is moved to the right.}

**
begin**

**
A[j] := A[j - 1];**

**
j := j - 1;**

**
end;**

** ****
A[j] := value **{This value is inserted in the space cleared.}

**
end**

**end;**

The Bubble sort is another simple sort. It isn't very good but its got a nice name, so this is what it is:

Keep passing through the file, exchanging adjacent elements if necessary. When no exchanges are needed on an entire pass through the file it is fully sorted.

** procedure bubble;**

** var**

**
i, j, temp : integer;**

**
begin**

** ****
for i := N downto 1 do** {The biggest element left in each pass gets put in
the right most unsorted position each time, so we can go one position less each
time.}

**
for j := 2 to i do**

**
if A[j - 1] > A[j] then**

**
begin**

**
temp := a[j - 1];**

**
A[j - 1] := A[j];**

**
A[j] := temp**

**
end**

**end;**

If the biggest element was in possition 1 on the first pass, by the end of the pass it would be in the last position. It would be progressively swapped with every element in the array. As you can guess, this takes a lot of resources, and is very inefficient. Bubble sorts are much better when the file is almost sorted. In fact only one pass is required if the file is already sorted.

__Shellsort__

Shellsort is like Insertion sort, except it allows exchange of elements that are far apart.

**procedure shellsort;**

**label 0;** {goto
statements are still useful for breaking out of loops despite what you have been
told about them}

**var**

**i, j, h, value :
integer;**

**begin**

**h := 1;**

**repeat**

** h :=
(3 * h) + 1 **{This can be
changed, but this is a good formula.}

**until h > N;**

**repeat**

** h :=
h div 3;**

** for
i := h + 1 to N do**

**
begin**

**
value := A[i];**

**
j := i;**

**
while A[j - h] > value do**

**
begin**

**
A[j] := A[j - h];**

**
j := j - h;**

**
if j <= h then**

**
goto 0**

**
end;**

**0: A[j] := vale**

** end**

**until h = 1**

**end;**

__Quick
Sort__

Quicksort is the most widely used sorting algorithm. Unlike the others listed above, Quicksort is recursive. It works by partitioning a file into two parts, and sorting them independently. This is the basic version:

**procedure quicksort (l, r : integer);**

**var **

**i, j, v, t : integer;**

**begin**

**if r > l then**

**begin**

** v := A[r];**

** i := l - 1;**

** j := r;**

** repeat**

** repeat **

** i := i + 1**

** until A[i] >=
v;**

** repeat**

** j := j - 1**

** until A[j] <=
v;**

** t := A[i];**

** A[i] := A[r];**

** A[r] := t;**

** quicksort (l , i - 1);**

** quicksort (i + 1, r)**

** end**

**end;**

I really hate algorithms like these don't you. I suggest the best way to learn how it works is to sort a file you have written down on paper by going through the loop step by step.

__Searching
Algorithms__

Hashing represents a completely different approach to storing data than trees and linked lists. It removes the need to search for a record through using an 'address calculator' that calculates where in an array records with certain key values should be placed. To put it another way, it allows us to directly reference records in a table by doing arithmetic transformations on keys in those records.

Hashing is really pretty simple. The first step is to create the 'address calculator' that will transforms the search key into a table address. In a perfect world all different keys would map to a different address, but this rarely happens, so the second step is working out what to do in these collision situations. This resolution can be achieved with linked lists or arrays.

This is the pseudo code for inserting an element into the array:

**Insert (arrayname, newitem)**

**Tell the address calculator the search key of newitem**

**Let i be the location the calculator tells you to use**

**array[i] := newitem**

The address calculator spoken of is usually called a hash function. There are several well known hash functions that have been chosen because of their ability to evenly distribute elements amongst the available memory addresses.

I will first describe some hash functions, and then look at the issue of collision resolution.

Hash functions often depend on what the key values look like. One approach is selecting digits:

Suppose we have an array of 100 spaces we need to map keys to. If you have a long employee number like 875 383 288 you may decide to use a combination of the third and last numbers, to give you the value of 58.

A problem with this approach is that it isn't very good at evenly distributing the key values across the array.

Another variation on this approach is to add all the numbers together, this is called folding:

8 + 7 + 5 + 3 + 8 + 3 + 2 + 8 + 8 = 52

I don't like this approach, the numbers will seldom come out very low, i.e. there are only 9 ways to get the value 1, but there are scores of ways to get a number like 52.

Another good approach is modular aritmetic:

table possition **:=** keynumber **mod** size of array

**875383288 mod 100** naturally equals 88 - the last two
digits. 100 is a very bad size, a much better number is a prime number like 101
- this gives 27.

Modular functions where the table size is a prime number probably represent the best approach. The calculation is quick to perform, and gives a good spread over the array.

We now come to the problem of collisions. 875383288 isn't the only number that will map to position 27.

One approach to a collision is to remap the number. If its first choice position is taken look for another, and another until one is empty. One approach is called linear probing : if the position you want is taken try the next one along, and then the one after that. Under this approach, if you get to the end of the array you should go back to the beginning.

The biggest problem with this approach is that as the array fills up, it begins to take longer and longer to find an empty position. Before long you have lost the benefits of hashing and are, in fact, implementing a search - the very thing hashing is meant to avoid.

Wednesday, February 23, 2000