Data Structures and Algorithms

Please contact me to suggest any changes to this page. Most of the code on this page is in Pascal. There are quite a few freeware Pascal compilers available, try softseek.com to find one. I have had some trouble with some of them, Free Pascal and Bloodshed Pascal in particular. I use Borland's Turbo Pascal 7 which I think is fantastic even if it looks like it was made in 1963. It is also easy to implement the code in C. I recommend the LCC-Win32 compiler, a windows version can be downloaded here (2.29 MB). Also remember that most C++ compilers compile C code.  

Please note : this page is not complete, it still needs a lot of work done on it. I apologize if you can't find the information you need. More information is being added all the time, so return soon.


Introduction

Linked Lists

Queues

Stacks

Trees

Tree Traversal

Graphs with Arrays

Graphs with Linked Lists

Simple Sorting Algorithms

Shell Sort

Quick Sort

Searching Algorithms

Hashing

 

Introduction

The following are the major data structures and algorithms used in computer Programming. Although the examples are given in Pascal, the same fundamental data structures are found in most procedural and Object Orientated languages such as JAVA, C and C++. It is very easy to convert the code below to C and C++ in particular.

This is not the most interesting area of Computer Science, but unfortunately it is necessary (or so they tell me). I have found that the best way to learn this stuff is to write programs. It is often a hit and miss process, but just keep adapting your code until the compiler accepts it. It also isn't a bad idea to go through the code line by line and write down or draw a diagram of what is happening at each line.

This page focuses more on the implementation of data structures and their algorithms than the theory behind them. For information on theory you had better find a big book and an even bigger pot of coffee.

The linked list section provides a lot of the basic knowledge necessary to understand many of the others. It  presupposes a knowledge of the basics of Pascal including arrays, records and pointers.

Some of this code could have small errors in it since I can't cut and paste between windows programs and my Pascal environment. Any errors should be easy to find - they probably involve semi colons.

Linked Lists

This section presents a general introduction to linked lists. It demonstrates how to construct a list, how to add new nodes to the front, back or middle, how to print the list, how to search the list, and how to delete nodes. This section provides the basic knowledge you need to understand most other sections on this page. More advanced topics are covered in the following sections: Queues, Stacks, Trees, Graphs etc.

Linked lists are like arrays in many ways, except they have the added advantage of being able to grow and shrink. With arrays you need to declare how many elements it has at compile time (some languages do allow arrays to grow, notably VB). A linked list is dynamic, it grows as each node is added, and shrinks as each node is deleted. It is also much easier to add a node to the middle of a linked list than the middle of an array. 

The following is the declarations needed to create a linked list where each node in the list holds an element called number, and a pointer to the next node called next.

type

dataPtrType = ^dataRecType; {Note that we can declare a type of dataRecType before declaring dataRecType : the declaration is on the lines below.}

dataRecType = record {This is the record that determines the structure of the nodes}

number : integer;

next : dataPtrType; {Next is of dataPtrType ie. it is a pointer to dataRecType - the record type it is declared in.}

end;

var

head, newPtr : dataPtrType; {We now declare two pointers to dataRecType.)

response : integer;

This is the code that creates the list. Note that it has an if-then-else clause that tests to see if the list has already been started. The top half is only executed the first time through the loop (if head = nil).

begin

head := nil; {It is always a good idea to initialize pointers to equal nil before assigning them a memory address.}

writeln('Enter a number');

readln(response);

while response <> 0 do

    begin

        new(newptr); {We create an area in memory for the new node - newPtr points to this area.}

        newPtr^.number := response; {The thing newPtr points to is the record dataRecType - the value number in this record takes the value of response.}

        if head = nil then

            newPtr^.next := nil {This happens the first time through the loop. This node will always be the last in the list (other nodes are created before it rather than after it) so it is very important that the next points to nil, this ensures we know when we have come to the end of the list.}

        else

            newPtr^.next := head; {New nodes are put at the front - then next in them points to the node that previously was the head.}

        head := newPtr; {This new element now becomes the head of the list}

        writeln('Enter a number'); {We then begin the loop again}

        readln(response);

    end;

end.

We can now write a procedure to print out this list.

procedure print(head : dataPtrType); {It receives the pointer to the head of the list}

var

tempPtr : dataPtrType; {It creates its own pointer to dataPtrType so that the value in head can be stored in it.}

begin

if head = nil then

    writeln('Nothing in the List') {This is in case the list is empty.}

else

    begin

        tempPtr := head;

        while tempPtr <> nil do {We move along until we get to the last node which was set to point to nil}

               begin

                    writeln(tempPtr^.number);

                    tempPtr := tempPtr^.next; {This is how you move to the next element in the list.}

              end

    end;

end;

The preceding code has created a linked list and printed it out. There are a number of improvements that can be made to it however. It is more efficient to create a function that initializes the list by creating a dummy head node. This removes the need for the if-then-else clause when we add nodes.

The function looks like this:

function initializelist : dataPtrType;

var

    temp : dataPtrType;

begin

    new(temp);

temp^.number := -10000; {This is a junk value, but it is best to initialize it nonetheless. Make it a value that you would never put into the list normally.}

temp^.next := nil;

initializelist := temp;

end;

At the start of the mainline call the function initializelist like this:

begin

head: = initializelist;

tail := head;

Now nodes can be added to the list without checking is head = nil.

Note that there is also going to be a tail as well as a head. The tail always points to the last node in the list just as the head always points to the first. This will help us add nodes to the end of the list as well as the beginning.

The following procedure adds a node to the end of the list:

procedure addToEnd (resp : integer; var tailPtr : dataPtrType); {The argument list accepts the number the user wants to put into the list, plus a pointer to tail. Note that this is a var parameter, the value of tail will change to the new node we create.}

var 

    temp : dataPtrType;

begin

    new(temp);

temp^.number := resp;

temp^.next := nil; {temp will be the last element in the list, so it must point to nil.}

tailPtr^.next := temp; {The previous tail of the list now points to temp : the new last element in the list.}

tailPtr := temp; {temp becomes the new tail of the list}

end;

We now can add nodes with the following code:

writeln('Enter a number');

readln(response);

while (response > 0) do

    begin

        addToEnd(response, tail);

        write('Enter a number');

        readln(response);

    end;

end.

Another important utility for linked lists is a search procedure. This asks the user for something to search for (a number in this case), and begins a search from the head node. If it finds the node it stops searching and returns a Boolean variable with the value true along with a pointer to the node. If it gets to the end of the list without finding the node it returns false. Naturally, it accepts a pointer to head, but this is under the guise of a pointer named location, because the procedure is going to change the value of this pointer, and we don't want to change the value of head.

procedure search(target : integer; var ptr : dataPtrType; var found : boolean);

begin

    found := false;

    ptr := ptr^.next; {Remember their is a dummy head node that we just skip.}

    while (not found) and (ptr <> nil) do {i.e. while the node hasn't been found, and we haven't reached the end of the list, keep searching.}

        begin

            if ptr^.number = target then

             found := true {It breaks out of the loop at this time, returning the value of true for found, and also a pointer to the node it was found in (remember, ptr was a var parameter.)}

            else

            ptr := ptr^.next;

        end;

end;

This procedure can be called as follows:

var    {These are the variables the mainline needs to call the procedure.}

num : integer;

location : dataPtrType;

found : boolean;

begin

write('Enter a number to search for');

readln(num);

location := head; {We want to start searching from the top of the list. Remember we don't send head directly, because we don't want the value of head altered.}

search(num, location, found);

if found then

    writeln(location^.number); {location was accepted as a var parameter, so if it was found, location is set to point to the node in which it was found.}

else

    writeln('Number not found');

end.

Another important procedure is a delete procedure for removing nodes from linked lists. For this we need to search for the node we want to delete. This is not enough however, we also need a pointer to the node before the node we want to delete because we need to change its next value to point to the pointer after the one to be deleted. Once we got to the pointer to be deleted there would be no way to move back one place, so instead we move through the list with two pointers. Not only do we have a pointer looking for the node to be deleted (deletePtr), we have a pointer always one place behind called trailing. Once the pointer to be deleted is found we set trailing to point to trailing^.next.

procedure delete (headPtr : dataPtrType); {This procedure only accepts a pointer to head in its argument list.}

var

    unwantedNum : integer;

    trailingPtr, deletePtr : dataPtrType;

    found : boolean;

begin

    writeln('Enter the unwanted number');

    readln(unwantedNum);

    found := false;

    trailingPtr := headPtr;

    deletePtr := headPtr^.next; {It doesn't matter that we won't search the head node for the number, because it was a dummy head node.}

    while (not found) and (deletePtr := nil)

        begin

            if (deletePtr^.number = unwantedNum) then

                begin

                    trailPtr^.next := deletePtr^.next; {Nothing now points to deletePtr, and the list is still joined.}

                    dispose(deletePtr);

                    found := true;

                end

            else

                begin

                    trailPtr := deletePtr; {Move both pointers one step along the list}

                    actualPtr := deletePtr^.next;

                end;

            end;

end.

Moving through lists with leading and trailing pointers is also useful if you want to add a node to the middle of a list. This is how you might go about it to build a list where the key value in each node of the list goes from smallest to biggest.

procedure insert (headPtr : dataPtrType);

var

    number : integer;

    leadPtr, trailPtr, newPtr : dataPtrType;

    found, done : boolean;

begin

    write ('Enter a number : ');

    readln(number);

    trailPtr := headPtr;

    leadPtr := headPtr^.next;

    found := false;

    done := false;

    while (not done) and (not found) and (leadPtr <> nil) do

    begin

        if number > leadPtr^.key  then

            begin

                leadPtr := leadPtr^.next;

                trailPtr := trailPtr^.next; {Move one place further on.}

            end

        else if number = leadPtr^.key then

            found := true; {The element already exists.}

        else

            done := true; {We have found the place to insert the element.}

    end;

if found then

    writeln ('The element already exists');

else

    begin

        new(newPtr); {The code for inserting the new node.}

        newPtr^.key := number;

        newPtr^.next := leadPtr;

        trailPtr^.next := newPtr;

    end;

end;

This section has now explained all the basics you need to know to begin working on more advanced data structures and algorithms. These are set out below, though not in as much detail as the information in this section. 

Queues

Like Stacks, Queues limit the way nodes can be entered into and removed from lists. Queues take a first in, last out approach. There are only two basic operations, one inserts a new element into the beginning of the queue, the other removes an item from the end of the queue. Basically, an element isn't allowed out until all those inserted before it have been removed.

The following are the declarations necessary for a linked list based queue. It is a circular linked list with a single external pointer, Q, that points to the end of the list:

type nodePtr = ^ node;

node = record

item : integer;

next : nodePtr;

end;

var

queue = nodePtr;

And these are the procedures and functions it uses:

procedure queueinit (var Q : queue; {For initializing a previously unused queue.}

begin

    Q := nil;

end;

function IsEmpty (Q : queue) : boolean;

begin

    IsEmpty := (Q = nil);

end;

procedure insert(Q : queue; newitem : integer); {For adding a new node with the value k in its key value.}

var

newptr :nodeptr;

begin

    new(newptr);

    newptr^.key := newitem);

    if IsEmpty(Q) then {Uses the procedure above to test if the list is empty.}

        newptr^.next := newptr

    else

    begin

        newptr^.next := Q^.next;

        Q^.next := newptr

    end;

    Q := newptr;

end;

This procedure removes an item and prints out it's value:

Procedure Remove (var Q : queue; var success : boolean);

var

front : nodeptr;

begin

    if IsEmpty(Q) then

        success := false

    else

    begin

        front := Q^.next;

        if front = Q then {If there is one node in the queue.}

            begin

            writeln(front^.item);

            Q := nil;

            end

        else

            Q^.next := front^.next;

            writeln(front^.item);

            dispose(front);

            success := true;

        end

end;

Queues can also be implemented with arrays providing you know roughly how big the list will be at its greatest.

The declaration may look something like this:

const MAX = 100;

var

    queue : array (0 .. MAX] of integer;

    head, tail : integer;

The procedures and functions it uses will then look like this:

procedure queueinit; {Initializes the queue}

begin

    head := 0;

    tail := 0;

end;

procedure put(k : integer);

begin

    queue[tail] := k;

    tail := tail + 1;

if tail > MAX then

    tail := 0; {This allows it to keep going in a loop - just make sure there aren't more than 100 elements in the array at any one time.}

end;

function get : integer;

begin

    get := queue[head];

    head := head + 1;

if head > MAX then

    head := 0;

end;

Stacks

Like Queues, Stacks restrict the way data is accessed. Stacks tend to be more widely used that Queues. Only two operations are involved: one inserts a new element at the top, the other pops an element off the top. There is no way to insert an element at any place but the top, and no way to remove any element other than the top one. It works like a stack of plates, plates are added to the top of the pile and removed from the top of the pile. Consequently, it isn't until all the plates are removed that the oldest plate is removed. This approach is also called last in, first out.

The following are the declarations necessary for all the procedures:

type dataType = ^ dataRecType;

dataRecType = record

key : integer;

next : dataType;

end;

var

head, newptr : dataType;

These are the procedures stacks can use:

procedure stackinit; {For initializing a previously unused stack.}

begin

    new(head);

    new(newptr);

    head^.next := newptr;

    newptr^.next := newptr;

end;

procedure push(k : integer); {For adding a new node with the value k in its key value.}

var

temp : dataPtrType;

begin

    new(temp);

    temp^.key := k;

    temp^.next := head^.next; {There is a dummy head node, so effectively this new element is at the start of the list. Temp points to the node that previously was the effective head.}

    head^.next := temp;

end;

function pop : integer; {Returns the number in the first node (after the dummy node) and deletes that node.}

var

    temp : dataPtrType;

begin

    temp := head^.next;

    pop := temp^.key;

    head^.next := temp^.next; {The dummy head node now points to the node one further down the list.}

    dispose(temp);

end;

Stacks can also be implemented as arrays. This is one way you can declare a stack as a record:

const MAX = 20;

stack = record

    top : 0 .. MAX; {we could make this an integer, but using a sub range allows us to check when we are at the end of the array.}

    items : array [1 .. MAX] of integer;

end;

var

S : stack;

The following are the basic functions and procedures they use:

procedure create (var S : stack);

begin

S.top := 0

end;

function IsEmpty (S : stack) : boolean;

begin

IsEmpty := (S.top = 0);

end;    

We use Push to add a new element. It will not let us do this if the array is full:

procedure Push (var S : stack; newitem : integer; var success : boolean);

begin

    if S.top = MAX then

        success := false

    else

    begin

        S.top := S.top + 1;

        S.Items[S.top] := newitem;

        success := true;

    end

end;

This procedure removes an element and prints it out. It uses IsEmpty (above) to see if the array is empty:

procedure Pop (var S : stack; var success : boolean);

begin

    if IsEmpty(S) then

        success := false;

    else

        begin

            writeln(S.items[S.top]);

            S.top := S.top - 1;

            success := true

        end

end;           

Binary Trees

Binary trees are simple to describe. They consist of nodes, like linked lists, but instead of each node pointing to one other node, each node can point to two other nodes. There is no obligation on a node to point to two other nodes, it can point to nil on either/both sides.

This is an example of how the nodes might be declared:

type

nodeptr = ^node;

node = record

key : integer;

left : nodeptr;

right : nodeptr;

end;

var

T : nodeptr;

Note how similar it is to the declaration of a linked list, the only difference is that it potentially points to two other nodes rather than one.

The following are the main procedures used with trees:

procedure create (var T : nodeptr);

begin

    T := nil

end;

Here is a procedure for inserting a new node into the tree according to its key value. It uses recursion to find the right place to insert the node. This only works if the values of key are unique of course:

procedure insert (var T : nodeptr; num : integer);

begin

if T = nil then

begin

    new(T);

    T^.key := num;

    T^.left := nil;

    T^.right := nil;

end

else if num < T^.key then

    insert(T^.left, num)

else

    insert(T^.right, num);

end;

This procedure searches through the tree for a key value specified in the argument list as num. If it finds it it prints it out and returns success as true, otherwise it returns success as false:

procedure retrieve (T : nodeptr; num : integer; var success boolean);

begin

if T = nil then

success := false

else if num = T^.key then

begin

     writeln('Number found: ', T^.key);

     success := true;

end

else if num < T^.key then

retrieve(T^.left, num, success)

else

retrieve(T^.right, num, success)

end;

 

Tree Traversal

Trees lend themselves to recursion. Nowhere is this more true than when it comes to tree traversal. There are three ways to traverse a tree, these procedures are universally referred to as preorder, postorder and inorder.

Inorder can be used to print the elements out in ascending order:

procedure inorder (T : nodeptr);

begin

if T <> nil then

begin

inorder(T^.left);

writeln(T^.key);

inorder(T^.right);

end

end;

The simplicity of this is great, there is nothing to it. The same is true for the other tree traversing procedures.

Preorder will keep going to the left as far as it can as it descends down a tree, then goes right. To see how it works in practice enter it into your compiler.

procedure preorder (T : nodeptr);

begin

if T <> nil then

begin

writeln(T^.key);

preorder(T^.left);

preorder(T^.right);

end

end;

It doesn't take a genius to guess what postorder will look like:

procedure postorder (T : nodeptr);

begin

if T <> nil then

begin

postorder(T^.left);

postorder(T^.right);

writeln(T^.key);

end

end;

If you don't see what this one is doing, enter a series of elements, draw the tree you have entered down on paper, and compare it to the results postorder gives.

Graphs with Arrays

Graphs consist of a whole lot of elements and the connections between them (actually it is a collection of vertices and edges). Unlike trees and linked lists, graphs allow any element to be linked to any other element. Consider a bus map of America : we have a whole lot of cities and a whole lot or routes connecting them. Basically any city can be connected to any other either directly, or through intermediaries. There is absolutely no restriction on whether some city can be connected to some other city.

The easiest way to create graphs is with two dimensional arrays, Consider an array

cities : array[1 .. 6, 1 .. 6] of integer;

Which might be mapped something like this: the top row are departures, the vertical row are destinations.

  1: Chicago 2: New York 3: LA 4: Miami 5: Roswell 6: Boston
1: Chicago 1 1 0 0 1 1
2: New York 0 1 0 1 1 0
3: LA 1 0 1 0 1 0
4: Miami 0 1 1 1 1 1
5: Roswell 1 1 1 0 1 1
6: Boston 0 0 0 1 1 1

The number 1 represents that there is a bus from one city to the other, 0 means there is no bus. There is a bus from Miami to New York but no bus from Miami to Chicago. Also note that the connections go both ways. There is a bus from New York to Chicago, but none from Chicago to New York.

You could also use boolean variables rather than integers of course, but there is no real difference.

Once you can picture graphs as two dimensional arrays the rest is easy. It is simple to create one and two way connections, to delete connections etc.

There are some other very difficult issues however. Given that there is no direct route from New York to LA, and given that we know the distances from each city to each other city, what is the quickest route to LA using intermediaries? 

Graphs with Linked Lists

Using arrays to represent graphs is fine if you know roughly how many elements the graph contains. Since this is often not the case two way linked lists often need to be used.

Simple Sorting Algorithms

I will begin by describing some of the most simple sorting algorithms. These are quite slow, especially as the size of the data to be sorted increases, but they can be useful if you have a small amount of data, and you only want to run the algorithm a few times. As a rule these algorithms take N squared steps to sort N.  For all the sorts in this section we are putting an array of integers into order.

The first method is called Selection Sort. This is the sorting algorithm most people would come up with on their own if they were asked to sort some data. Basically you run through the whole array, find the smallest element, and swap it with the element in the first position. Then, starting at the second position, you run through the whole array, find the smallest element, and swap it with the element in the second position. This continues until the whole array is sorted. If N is the number of elements in the array, this method takes (N - 1) passes (You don't have to run through the array when only one element is left).

This is the code (N represents the size of the array, A is the name of the array).

procedure selection;

var

i, j, min, temp : integer;

begin

for i := 1 to N - 1 do {The total number of times we need to go through the array.}

    begin

        min := i; {min holds the position of the lowest element in the array - it starts each loop at the element we begin the search from.}

        for j := i + 1 to N do 

            if A[j] < A[min] then 

                min := j;

        temp := a[min]; {This starts the three step process to swap the smallest element found in this pass with the element we are up to in the array.}

        a[min] := a[i];

        a[i] := temp;

    end;

end;

This is really a brute-force approach, but it gets the job done, and is actually reasonably efficient for sorting small files.

The next method to look at is insertion sort. It involves considering the elements one at a time, and inserting each element in its proper place amongst those already sorted. You insert an element by moving larger elements one position to the right.

procedure insertion;

    var 

    i, j, value : integer;

begin

    for i := 2 to N do {This is the number of times the loop below is performed.}

        begin

        value := A[i]; {This is the element we are working with this time through.}

        j := i; {j holds the array position of this element.}

        while A[j - 1] > value do; {We start with the element before value, and head down to the start of the array. If an element is bigger than value, it is moved to the right.}

        begin

            A[j] := A[j - 1];

            j := j - 1;

        end;

        A[j] := value {This value is inserted in the space cleared.}

        end

end;

The Bubble sort is another simple sort. It isn't very good but its got a nice name, so this is what it is:

Keep passing through the file, exchanging adjacent elements if necessary. When no exchanges are needed on an entire pass through the file it is fully sorted.

 procedure bubble;

    var

     i, j, temp : integer;

    begin

    for i := N downto 1 do {The biggest element left in each pass gets put in the right most unsorted position each time, so we can go one position less each time.}

        for j := 2 to i do

            if A[j - 1] > A[j] then

                begin

                temp := a[j - 1];

                A[j - 1] := A[j];

                A[j] := temp

                end

end;

If the biggest element was in possition 1 on the first pass, by the end of the pass it would be in the last position. It would be progressively swapped with every element in the array. As you can guess, this takes a lot of resources, and is very inefficient. Bubble sorts are much better when the file is almost sorted. In fact only one pass is required if the file is already sorted.

Shellsort

Shellsort is like Insertion sort, except it allows exchange of elements that are far apart.

procedure shellsort;

label 0; {goto statements are still useful for breaking out of loops despite what you have been told about them}

var

i, j, h, value : integer;

begin

h := 1;

repeat

    h := (3 * h) + 1 {This can be changed, but this is a good formula.}

until h > N;

repeat

    h := h div 3;

    for i := h + 1 to N do

    begin

        value := A[i];

        j := i;

        while A[j - h] > value do

            begin

            A[j] := A[j - h];

            j := j - h;

            if j <= h then

            goto 0

            end;

0: A[j] := vale

    end

until h = 1

end;

Quick Sort

Quicksort is the most widely used sorting algorithm. Unlike the others listed above, Quicksort is recursive. It works by partitioning a file into two parts, and sorting them independently. This is the basic version:

procedure quicksort (l, r : integer);

var 

i, j, v, t : integer;

begin

if r > l then

begin

    v := A[r];

    i := l - 1;

    j := r;

    repeat

        repeat 

        i := i + 1

        until A[i] >= v;

        repeat

        j := j - 1

        until A[j] <= v;

        t := A[i];

        A[i] := A[r];

        A[r] := t;

    quicksort (l , i - 1);

    quicksort (i + 1, r)

    end

end;

I really hate algorithms like these don't you. I suggest the best way to learn how it works is to sort a file you have written down on paper by going through the loop step by step.

Searching Algorithms

Hashing

Hashing represents a completely different approach to storing data than trees and linked lists. It removes the need to search for a record through using an 'address calculator' that calculates where in an array records with certain key values should be placed. To put it another way, it allows us to directly reference records in a table by doing arithmetic transformations on keys in those records.

Hashing is really pretty simple. The first step is to create the 'address calculator' that will transforms the search key into a table address. In a perfect world all different keys would map to a different address, but this rarely happens, so the second step is working out what to do in these collision situations. This resolution can be achieved with linked lists or arrays. 

This is the pseudo code for inserting an element into the array:

Insert (arrayname, newitem)

Tell the address calculator the search key of newitem

Let i be the location the calculator tells you to use

array[i] := newitem

The address calculator spoken of is usually called a hash function. There are several well known hash functions that have been chosen because of their ability to evenly distribute elements amongst the available memory addresses.

I will first describe some hash functions, and then look at the issue of collision resolution.

Hash functions often depend on what the key values look like. One approach is selecting digits:

Suppose we have an array of 100 spaces we need to map keys to. If you have a long employee number like 875 383 288 you may decide to use a combination of the third and last numbers, to give you the value of 58. 

A problem with this approach is that it isn't very good at evenly distributing the key values across the array.

Another variation on this approach is to add all the numbers together, this is called folding:

8 + 7 + 5 + 3 + 8 + 3 + 2 + 8 + 8 = 52

I don't like this approach, the numbers will seldom come out very low, i.e. there are only 9 ways to get the value 1, but there are scores of ways to get a number like 52.

Another good approach is modular aritmetic:

table possition := keynumber mod size of array

875383288 mod 100 naturally equals 88 - the last two digits. 100 is a very bad size, a much better number is a prime number like 101 - this gives 27.

Modular functions where the table size is a prime number probably represent the best approach. The calculation is quick to perform, and gives a good spread over the array.

We now come to the problem of collisions. 875383288 isn't the only number that will map to position 27. 

One approach to a collision is to remap the number. If its first choice position is taken look for another, and another until one is empty. One approach is called linear probing : if the position you want is taken try the next one along, and then the one after that. Under this approach, if you get to the end of the array you should go back to the beginning.

The biggest problem with this approach is that as the array fills up, it begins to take longer and longer to find an empty position. Before long you have lost the benefits of hashing and are, in fact, implementing a search - the very thing hashing is meant to avoid.

 


Wednesday, February 23, 2000