Data Structures and Algorithms
Please contact me to suggest any changes to this page. Most of the code on this page is in Pascal. There are quite a few freeware Pascal compilers available, try softseek.com to find one. I have had some trouble with some of them, Free Pascal and Bloodshed Pascal in particular. I use Borland's Turbo Pascal 7 which I think is fantastic even if it looks like it was made in 1963. It is also easy to implement the code in C. I recommend the LCC-Win32 compiler, a windows version can be downloaded here (2.29 MB). Also remember that most C++ compilers compile C code.
The following are the major data structures and algorithms used in computer Programming. Although the examples are given in Pascal, the same fundamental data structures are found in most procedural and Object Orientated languages such as JAVA, C and C++. It is very easy to convert the code below to C and C++ in particular.
This is not the most interesting area of Computer Science, but unfortunately it is necessary (or so they tell me). I have found that the best way to learn this stuff is to write programs. It is often a hit and miss process, but just keep adapting your code until the compiler accepts it. It also isn't a bad idea to go through the code line by line and write down or draw a diagram of what is happening at each line.
This page focuses more on the implementation of data structures and their algorithms than the theory behind them. For information on theory you had better find a big book and an even bigger pot of coffee.
The linked list section provides a lot of the basic knowledge necessary to understand many of the others. It presupposes a knowledge of the basics of Pascal including arrays, records and pointers.
Some of this code could have small errors in it since I can't cut and paste between windows programs and my Pascal environment. Any errors should be easy to find - they probably involve semi colons.
Linked Lists
This section presents a general introduction to linked lists. It demonstrates how to construct a list, how to add new nodes to the front, back or middle, how to print the list, how to search the list, and how to delete nodes. This section provides the basic knowledge you need to understand most other sections on this page. More advanced topics are covered in the following sections: Queues, Stacks, Trees, Graphs etc.
Linked lists are like arrays in many ways, except they have the added advantage of being able to grow and shrink. With arrays you need to declare how many elements it has at compile time (some languages do allow arrays to grow, notably VB). A linked list is dynamic, it grows as each node is added, and shrinks as each node is deleted. It is also much easier to add a node to the middle of a linked list than the middle of an array.
The following is the declarations needed to create a linked list where each node in the list holds an element called number, and a pointer to the next node called next.
type
dataPtrType = ^dataRecType; {Note that we can declare a type of dataRecType before declaring dataRecType : the declaration is on the lines below.}
dataRecType = record {This is the record that determines the structure of the nodes}
number : integer;
next : dataPtrType; {Next is of dataPtrType ie. it is a pointer to dataRecType - the record type it is declared in.}
end;
var
head, newPtr : dataPtrType; {We now declare two pointers to dataRecType.)
response : integer;
This is the code that creates the list. Note that it has an if-then-else clause that tests to see if the list has already been started. The top half is only executed the first time through the loop (if head = nil).
begin
head := nil; {It is always a good idea to initialize pointers to equal nil before assigning them a memory address.}
writeln('Enter a number');
readln(response);
while response <> 0 do
begin
new(newptr); {We create an area in memory for the new node - newPtr points to this area.}
newPtr^.number := response; {The thing newPtr points to is the record dataRecType - the value number in this record takes the value of response.}
if head = nil then
newPtr^.next := nil {This happens the first time through the loop. This node will always be the last in the list (other nodes are created before it rather than after it) so it is very important that the next points to nil, this ensures we know when we have come to the end of the list.}
else
newPtr^.next := head; {New nodes are put at the front - then next in them points to the node that previously was the head.}
head := newPtr; {This new element now becomes the head of the list}
writeln('Enter a number'); {We then begin the loop again}
readln(response);
end;
end.
We can now write a procedure to print out this list.
procedure print(head : dataPtrType); {It receives the pointer to the head of the list}
var
tempPtr : dataPtrType; {It creates its own pointer to dataPtrType so that the value in head can be stored in it.}
begin
if head = nil then
writeln('Nothing in the List') {This is in case the list is empty.}
else
begin
tempPtr := head;
while tempPtr <> nil do {We move along until we get to the last node which was set to point to nil}
begin
writeln(tempPtr^.number);
tempPtr := tempPtr^.next; {This is how you move to the next element in the list.}
end
end;
end;
The preceding code has created a linked list and printed it out. There are a number of improvements that can be made to it however. It is more efficient to create a function that initializes the list by creating a dummy head node. This removes the need for the if-then-else clause when we add nodes.
The function looks like this:
function initializelist : dataPtrType;
var
temp : dataPtrType;
begin
new(temp);
temp^.number := -10000; {This is a junk value, but it is best to initialize it nonetheless. Make it a value that you would never put into the list normally.}
temp^.next := nil;
initializelist := temp;
end;
At the start of the mainline call the function initializelist like this:
begin
head: = initializelist;
tail := head;
Now nodes can be added to the list without checking is head = nil.
Note that there is also going to be a tail as well as a head. The tail always points to the last node in the list just as the head always points to the first. This will help us add nodes to the end of the list as well as the beginning.
The following procedure adds a node to the end of the list:
procedure addToEnd (resp : integer; var tailPtr : dataPtrType); {The argument list accepts the number the user wants to put into the list, plus a pointer to tail. Note that this is a var parameter, the value of tail will change to the new node we create.}
var
temp : dataPtrType;
begin
new(temp);
temp^.number := resp;
temp^.next := nil; {temp will be the last element in the list, so it must point to nil.}
tailPtr^.next := temp; {The previous tail of the list now points to temp : the new last element in the list.}
tailPtr := temp; {temp becomes the new tail of the list}
end;
We now can add nodes with the following code:
writeln('Enter a number');
readln(response);
while (response > 0) do
begin
addToEnd(response, tail);
write('Enter a number');
readln(response);
end;
end.
Another important utility for linked lists is a search procedure. This asks the user for something to search for (a number in this case), and begins a search from the head node. If it finds the node it stops searching and returns a Boolean variable with the value true along with a pointer to the node. If it gets to the end of the list without finding the node it returns false. Naturally, it accepts a pointer to head, but this is under the guise of a pointer named location, because the procedure is going to change the value of this pointer, and we don't want to change the value of head.
procedure search(target : integer; var ptr : dataPtrType; var found : boolean);
begin
found := false;
ptr := ptr^.next; {Remember their is a dummy head node that we just skip.}
while (not found) and (ptr <> nil) do {i.e. while the node hasn't been found, and we haven't reached the end of the list, keep searching.}
begin
if ptr^.number = target then
found := true {It breaks out of the loop at this time, returning the value of true for found, and also a pointer to the node it was found in (remember, ptr was a var parameter.)}
else
ptr := ptr^.next;
end;
end;
This procedure can be called as follows:
var {These are the variables the mainline needs to call the procedure.}
num : integer;
location : dataPtrType;
found : boolean;
begin
write('Enter a number to search for');
readln(num);
location := head; {We want to start searching from the top of the list. Remember we don't send head directly, because we don't want the value of head altered.}
search(num, location, found);
if found then
writeln(location^.number); {location was accepted as a var parameter, so if it was found, location is set to point to the node in which it was found.}
else
writeln('Number not found');
end.
Another important procedure is a delete procedure for removing nodes from linked lists. For this we need to search for the node we want to delete. This is not enough however, we also need a pointer to the node before the node we want to delete because we need to change its next value to point to the pointer after the one to be deleted. Once we got to the pointer to be deleted there would be no way to move back one place, so instead we move through the list with two pointers. Not only do we have a pointer looking for the node to be deleted (deletePtr), we have a pointer always one place behind called trailing. Once the pointer to be deleted is found we set trailing to point to trailing^.next.
procedure delete (headPtr : dataPtrType); {This procedure only accepts a pointer to head in its argument list.}
var
unwantedNum : integer;
trailingPtr, deletePtr : dataPtrType;
found : boolean;
begin
writeln('Enter the unwanted number');
readln(unwantedNum);
found := false;
trailingPtr := headPtr;
deletePtr := headPtr^.next; {It doesn't matter that we won't search the head node for the number, because it was a dummy head node.}
while (not found) and (deletePtr := nil)
begin
if (deletePtr^.number = unwantedNum) then
begin
trailPtr^.next := deletePtr^.next; {Nothing now points to deletePtr, and the list is still joined.}
dispose(deletePtr);
found := true;
end
else
begin
trailPtr := deletePtr; {Move both pointers one step along the list}
actualPtr := deletePtr^.next;
end;
end;
end.
Moving through lists with leading and trailing pointers is also useful if you want to add a node to the middle of a list. This is how you might go about it to build a list where the key value in each node of the list goes from smallest to biggest.
procedure insert (headPtr : dataPtrType);
var
number : integer;
leadPtr, trailPtr, newPtr : dataPtrType;
found, done : boolean;
begin
write ('Enter a number : ');
readln(number);
trailPtr := headPtr;
leadPtr := headPtr^.next;
found := false;
done := false;
while (not done) and (not found) and (leadPtr <> nil) do
begin
if number > leadPtr^.key then
begin
leadPtr := leadPtr^.next;
trailPtr := trailPtr^.next; {Move one place further on.}
end
else if number = leadPtr^.key then
found := true; {The element already exists.}
else
done := true; {We have found the place to insert the element.}
end;
if found then
writeln ('The element already exists');
else
begin
new(newPtr); {The code for inserting the new node.}
newPtr^.key := number;
newPtr^.next := leadPtr;
trailPtr^.next := newPtr;
end;
end;
This section has now explained all the basics you need to know to begin working on more advanced data structures and algorithms. These are set out below, though not in as much detail as the information in this section.
Like Stacks, Queues limit the way nodes can be entered into and removed from lists. Queues take a first in, last out approach. There are only two basic operations, one inserts a new element into the beginning of the queue, the other removes an item from the end of the queue. Basically, an element isn't allowed out until all those inserted before it have been removed.
The following are the declarations necessary for a linked list based queue. It is a circular linked list with a single external pointer, Q, that points to the end of the list:
type nodePtr = ^ node;
node = record
item : integer;
next : nodePtr;
end;
var
queue = nodePtr;
And these are the procedures and functions it uses:
procedure queueinit (var Q : queue; {For initializing a previously unused queue.}
begin
Q := nil;
end;
function IsEmpty (Q : queue) : boolean;
begin
IsEmpty := (Q = nil);
end;
procedure insert(Q : queue; newitem : integer); {For adding a new node with the value k in its key value.}
var
newptr :nodeptr;
begin
new(newptr);
newptr^.key := newitem);
if IsEmpty(Q) then {Uses the procedure above to test if the list is empty.}
newptr^.next := newptr
else
begin
newptr^.next := Q^.next;
Q^.next := newptr
end;
Q := newptr;
end;
This procedure removes an item and prints out it's value:
Procedure Remove (var Q : queue; var success : boolean);
var
front : nodeptr;
begin
if IsEmpty(Q) then
success := false
else
begin
front := Q^.next;
if front = Q then {If there is one node in the queue.}
begin
writeln(front^.item);
Q := nil;
end
else
Q^.next := front^.next;
writeln(front^.item);
dispose(front);
success := true;
end
end;
Queues can also be implemented with arrays providing you know roughly how big the list will be at its greatest.
The declaration may look something like this:
const MAX = 100;
var
queue : array (0 .. MAX] of integer;
head, tail : integer;
The procedures and functions it uses will then look like this:
procedure queueinit; {Initializes the queue}
begin
head := 0;
tail := 0;
end;
procedure put(k : integer);
begin
queue[tail] := k;
tail := tail + 1;
if tail > MAX then
tail := 0; {This allows it to keep going in a loop - just make sure there aren't more than 100 elements in the array at any one time.}
end;
function get : integer;
begin
get := queue[head];
head := head + 1;
if head > MAX then
head := 0;
end;
Like Queues, Stacks restrict the way data is accessed. Stacks tend to be more widely used that Queues. Only two operations are involved: one inserts a new element at the top, the other pops an element off the top. There is no way to insert an element at any place but the top, and no way to remove any element other than the top one. It works like a stack of plates, plates are added to the top of the pile and removed from the top of the pile. Consequently, it isn't until all the plates are removed that the oldest plate is removed. This approach is also called last in, first out.
The following are the declarations necessary for all the procedures:
type dataType = ^ dataRecType;
dataRecType = record
key : integer;
next : dataType;
end;
var
head, newptr : dataType;
These are the procedures stacks can use:
procedure stackinit; {For initializing a previously unused stack.}
begin
new(head);
new(newptr);
head^.next := newptr;
newptr^.next := newptr;
end;
procedure push(k : integer); {For adding a new node with the value k in its key value.}
var
temp : dataPtrType;
begin
new(temp);
temp^.key := k;
temp^.next := head^.next; {There is a dummy head node, so effectively this new element is at the start of the list. Temp points to the node that previously was the effective head.}
head^.next := temp;
end;
function pop : integer; {Returns the number in the first node (after the dummy node) and deletes that node.}
var
temp : dataPtrType;
begin
temp := head^.next;
pop := temp^.key;
head^.next := temp^.next; {The dummy head node now points to the node one further down the list.}
dispose(temp);
end;
Stacks can also be implemented as arrays. This is one way you can declare a stack as a record:
const MAX = 20;
stack = record
top : 0 .. MAX; {we could make this an integer, but using a sub range allows us to check when we are at the end of the array.}
items : array [1 .. MAX] of integer;
end;
var
S : stack;
The following are the basic functions and procedures they use:
procedure create (var S : stack);
begin
S.top := 0
end;
function IsEmpty (S : stack) : boolean;
begin
IsEmpty := (S.top = 0);
end;
We use Push to add a new element. It will not let us do this if the array is full:
procedure Push (var S : stack; newitem : integer; var success : boolean);
begin
if S.top = MAX then
success := false
else
begin
S.top := S.top + 1;
S.Items[S.top] := newitem;
success := true;
end
end;
This procedure removes an element and prints it out. It uses IsEmpty (above) to see if the array is empty:
procedure Pop (var S : stack; var success : boolean);
begin
if IsEmpty(S) then
success := false;
else
begin
writeln(S.items[S.top]);
S.top := S.top - 1;
success := true
end
end;
Binary Trees
Binary trees are simple to describe. They consist of nodes, like linked lists, but instead of each node pointing to one other node, each node can point to two other nodes. There is no obligation on a node to point to two other nodes, it can point to nil on either/both sides.
This is an example of how the nodes might be declared:
type
nodeptr = ^node;
node = record
key : integer;
left : nodeptr;
right : nodeptr;
end;
var
T : nodeptr;
Note how similar it is to the declaration of a linked list, the only difference is that it potentially points to two other nodes rather than one.
The following are the main procedures used with trees:
procedure create (var T : nodeptr);
begin
T := nil
end;
Here is a procedure for inserting a new node into the tree according to its key value. It uses recursion to find the right place to insert the node. This only works if the values of key are unique of course:
procedure insert (var T : nodeptr; num : integer);
begin
if T = nil then
begin
new(T);
T^.key := num;
T^.left := nil;
T^.right := nil;
end
else if num < T^.key then
insert(T^.left, num)
else
insert(T^.right, num);
end;
This procedure searches through the tree for a key value specified in the argument list as num. If it finds it it prints it out and returns success as true, otherwise it returns success as false:
procedure retrieve (T : nodeptr; num : integer; var success boolean);
begin
if T = nil then
success := false
else if num = T^.key then
begin
writeln('Number found: ', T^.key);
success := true;
end
else if num < T^.key then
retrieve(T^.left, num, success)
else
retrieve(T^.right, num, success)
end;
Tree Traversal
Trees lend themselves to recursion. Nowhere is this more true than when it comes to tree traversal. There are three ways to traverse a tree, these procedures are universally referred to as preorder, postorder and inorder.
Inorder can be used to print the elements out in ascending order:
procedure inorder (T : nodeptr);
begin
if T <> nil then
begin
inorder(T^.left);
writeln(T^.key);
inorder(T^.right);
end
end;
The simplicity of this is great, there is nothing to it. The same is true for the other tree traversing procedures.
Preorder will keep going to the left as far as it can as it descends down a tree, then goes right. To see how it works in practice enter it into your compiler.
procedure preorder (T : nodeptr);
begin
if T <> nil then
begin
writeln(T^.key);
preorder(T^.left);
preorder(T^.right);
end
end;
It doesn't take a genius to guess what postorder will look like:
procedure postorder (T : nodeptr);
begin
if T <> nil then
begin
postorder(T^.left);
postorder(T^.right);
writeln(T^.key);
end
end;
If you don't see what this one is doing, enter a series of elements, draw the tree you have entered down on paper, and compare it to the results postorder gives.
Graphs with Arrays
Graphs consist of a whole lot of elements and the connections between them (actually it is a collection of vertices and edges). Unlike trees and linked lists, graphs allow any element to be linked to any other element. Consider a bus map of America : we have a whole lot of cities and a whole lot or routes connecting them. Basically any city can be connected to any other either directly, or through intermediaries. There is absolutely no restriction on whether some city can be connected to some other city.
The easiest way to create graphs is with two dimensional arrays, Consider an array
cities : array[1 .. 6, 1 .. 6] of integer;
Which might be mapped something like this: the top row are departures, the vertical row are destinations.
| 1: Chicago | 2: New York | 3: LA | 4: Miami | 5: Roswell | 6: Boston | |
| 1: Chicago | 1 | 1 | 0 | 0 | 1 | 1 |
| 2: New York | 0 | 1 | 0 | 1 | 1 | 0 |
| 3: LA | 1 | 0 | 1 | 0 | 1 | 0 |
| 4: Miami | 0 | 1 | 1 | 1 | 1 | 1 |
| 5: Roswell | 1 | 1 | 1 | 0 | 1 | 1 |
| 6: Boston | 0 | 0 | 0 | 1 | 1 | 1 |
The number 1 represents that there is a bus from one city to the other, 0 means there is no bus. There is a bus from Miami to New York but no bus from Miami to Chicago. Also note that the connections go both ways. There is a bus from New York to Chicago, but none from Chicago to New York.
You could also use boolean variables rather than integers of course, but there is no real difference.
Once you can picture graphs as two dimensional arrays the rest is easy. It is simple to create one and two way connections, to delete connections etc.
There are some other very difficult issues however. Given that there is no direct route from New York to LA, and given that we know the distances from each city to each other city, what is the quickest route to LA using intermediaries?
Graphs with Linked Lists
Using arrays to represent graphs is fine if you know roughly how many elements the graph contains. Since this is often not the case two way linked lists often need to be used.
Simple Sorting Algorithms
I will begin by describing some of the most simple sorting algorithms. These are quite slow, especially as the size of the data to be sorted increases, but they can be useful if you have a small amount of data, and you only want to run the algorithm a few times. As a rule these algorithms take N squared steps to sort N. For all the sorts in this section we are putting an array of integers into order.
The first method is called Selection Sort. This is the sorting algorithm most people would come up with on their own if they were asked to sort some data. Basically you run through the whole array, find the smallest element, and swap it with the element in the first position. Then, starting at the second position, you run through the whole array, find the smallest element, and swap it with the element in the second position. This continues until the whole array is sorted. If N is the number of elements in the array, this method takes (N - 1) passes (You don't have to run through the array when only one element is left).
This is the code (N represents the size of the array, A is the name of the array).
procedure selection;
var
i, j, min, temp : integer;
begin
for i := 1 to N - 1 do {The total number of times we need to go through the array.}
begin
min := i; {min holds the position of the lowest element in the array - it starts each loop at the element we begin the search from.}
for j := i + 1 to N do
if A[j] < A[min] then
min := j;
temp := a[min]; {This starts the three step process to swap the smallest element found in this pass with the element we are up to in the array.}
a[min] := a[i];
a[i] := temp;
end;
end;
This is really a brute-force approach, but it gets the job done, and is actually reasonably efficient for sorting small files.
The next method to look at is insertion sort. It involves considering the elements one at a time, and inserting each element in its proper place amongst those already sorted. You insert an element by moving larger elements one position to the right.
procedure insertion;
var
i, j, value : integer;
begin
for i := 2 to N do {This is the number of times the loop below is performed.}
begin
value := A[i]; {This is the element we are working with this time through.}
j := i; {j holds the array position of this element.}
while A[j - 1] > value do; {We start with the element before value, and head down to the start of the array. If an element is bigger than value, it is moved to the right.}
begin
A[j] := A[j - 1];
j := j - 1;
end;
A[j] := value {This value is inserted in the space cleared.}
end
end;
The Bubble sort is another simple sort. It isn't very good but its got a nice name, so this is what it is:
Keep passing through the file, exchanging adjacent elements if necessary. When no exchanges are needed on an entire pass through the file it is fully sorted.
procedure bubble;
var
i, j, temp : integer;
begin
for i := N downto 1 do {The biggest element left in each pass gets put in the right most unsorted position each time, so we can go one position less each time.}
for j := 2 to i do
if A[j - 1] > A[j] then
begin
temp := a[j - 1];
A[j - 1] := A[j];
A[j] := temp
end
end;
If the biggest element was in possition 1 on the first pass, by the end of the pass it would be in the last position. It would be progressively swapped with every element in the array. As you can guess, this takes a lot of resources, and is very inefficient. Bubble sorts are much better when the file is almost sorted. In fact only one pass is required if the file is already sorted.
Shellsort
Shellsort is like Insertion sort, except it allows exchange of elements that are far apart.
procedure shellsort;
label 0; {goto statements are still useful for breaking out of loops despite what you have been told about them}
var
i, j, h, value : integer;
begin
h := 1;
repeat
h := (3 * h) + 1 {This can be changed, but this is a good formula.}
until h > N;
repeat
h := h div 3;
for i := h + 1 to N do
begin
value := A[i];
j := i;
while A[j - h] > value do
begin
A[j] := A[j - h];
j := j - h;
if j <= h then
goto 0
end;
0: A[j] := vale
end
until h = 1
end;
Quick Sort
Quicksort is the most widely used sorting algorithm. Unlike the others listed above, Quicksort is recursive. It works by partitioning a file into two parts, and sorting them independently. This is the basic version:
procedure quicksort (l, r : integer);
var
i, j, v, t : integer;
begin
if r > l then
begin
v := A[r];
i := l - 1;
j := r;
repeat
repeat
i := i + 1
until A[i] >= v;
repeat
j := j - 1
until A[j] <= v;
t := A[i];
A[i] := A[r];
A[r] := t;
quicksort (l , i - 1);
quicksort (i + 1, r)
end
end;
I really hate algorithms like these don't you. I suggest the best way to learn how it works is to sort a file you have written down on paper by going through the loop step by step.
Searching Algorithms
Hashing represents a completely different approach to storing data than trees and linked lists. It removes the need to search for a record through using an 'address calculator' that calculates where in an array records with certain key values should be placed. To put it another way, it allows us to directly reference records in a table by doing arithmetic transformations on keys in those records.
Hashing is really pretty simple. The first step is to create the 'address calculator' that will transforms the search key into a table address. In a perfect world all different keys would map to a different address, but this rarely happens, so the second step is working out what to do in these collision situations. This resolution can be achieved with linked lists or arrays.
This is the pseudo code for inserting an element into the array:
Insert (arrayname, newitem)
Tell the address calculator the search key of newitem
Let i be the location the calculator tells you to use
array[i] := newitem
The address calculator spoken of is usually called a hash function. There are several well known hash functions that have been chosen because of their ability to evenly distribute elements amongst the available memory addresses.
I will first describe some hash functions, and then look at the issue of collision resolution.
Hash functions often depend on what the key values look like. One approach is selecting digits:
Suppose we have an array of 100 spaces we need to map keys to. If you have a long employee number like 875 383 288 you may decide to use a combination of the third and last numbers, to give you the value of 58.
A problem with this approach is that it isn't very good at evenly distributing the key values across the array.
Another variation on this approach is to add all the numbers together, this is called folding:
8 + 7 + 5 + 3 + 8 + 3 + 2 + 8 + 8 = 52
I don't like this approach, the numbers will seldom come out very low, i.e. there are only 9 ways to get the value 1, but there are scores of ways to get a number like 52.
Another good approach is modular aritmetic:
table possition := keynumber mod size of array
875383288 mod 100 naturally equals 88 - the last two digits. 100 is a very bad size, a much better number is a prime number like 101 - this gives 27.
Modular functions where the table size is a prime number probably represent the best approach. The calculation is quick to perform, and gives a good spread over the array.
We now come to the problem of collisions. 875383288 isn't the only number that will map to position 27.
One approach to a collision is to remap the number. If its first choice position is taken look for another, and another until one is empty. One approach is called linear probing : if the position you want is taken try the next one along, and then the one after that. Under this approach, if you get to the end of the array you should go back to the beginning.
The biggest problem with this approach is that as the array fills up, it begins to take longer and longer to find an empty position. Before long you have lost the benefits of hashing and are, in fact, implementing a search - the very thing hashing is meant to avoid.
Wednesday, February 23, 2000