PERL


Introduction

Getting Started

The Basics

Arrays

Associative Arrays

File Handling

Control Structures - Iteration

Control Structures - Selection

String Matching

String Substitution

Splitting a String

Functions

Regular Expressions

Introduction

Perl is a UNIX-based language that has only recently become available to Win32 users, and has in fact been ported to almost every OS. It was originally developed for manipulating files and strings in UNIX, but is now widely associated with the internet, especially CGI scripting. Perl is probably the most valuable scripting language you can know. There are more good jobs out there requiring a knowledge of Perl than any other scripting language (although there are still far more jobs for C++ and Java programmers than for Perl programmers).

Getting Started

I use Perl on both Linux and Windows. Perl comes with most versions of Unix (including Linux) and is usually located at /usr/bin/perl.

With Linux you can use emacs or vi to write Perl scripts. They can be executed by typing :

perl scriptname

Alternatively, if you make

#!/usr/bin/perl 

the first line in your script, and change the permissions so you can execute the script, you can just type the full path of the script and it will run.

Alternatively you can enter lines directly into Perl from the shell:

perl -e 'print "Hello World!\n";'

Perl has only become available on Windows relatively recently.

You will first need to download Perl from ActiveState (5MB). Also note that if you are running Win95 you will need to go to Microsoft and download DCOM (1.3MB) (there is a link on the ActiveState site).

ActivePerl is easy to install; I would suggest using the default directory (C:\Perl). A lot of documentation comes with the application.

You will also need a text editor. I am just using Notepad.

To execute a program, enter the code into the editor, e.g.

print "Hello World";

Save this to disk with a .pl extension, e.g.

a:hello.pl

There are now two ways to run this application :

1. Open up a DOS shell. If the current directory is not C:\, change it with cd. At the prompt type perl a:hello.pl. (If Perl is not located in the top level of C:, go to the appropriate directory.) This should print out Hello World.

2. The other way to run the program is to double click on the file. This should open the Perl window and run the program.

The best site on the web for Perl-related material is Perl.com.

The Basics

We have already seen the Hello World program :

print "Hello World";

so we can begin with variables.

Like most scripting languages, Perl is very loosely typed. There are basically three data types - scalar variables (which hold one thing - covered in this section), arrays and associative arrays (or hashes). 

Scalar Variables are non-typed. You can store numbers and strings in the same variable. 

$var1;

$var1 = 100;

print $var1;

$var1 = 'Dane';

print $var1;

The $ is not optional.

You could also use an assignment like :

$var1 = '100';

and then still use the number in arithmetic.
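As a rough sketch of what this means in practice, the lines below store a number as a string and then use it in arithmetic. The variable names are only for illustration, and the use strict, use warnings and my lines are an assumption about current Perl style rather than something this tutorial relies on.

```perl
use strict;
use warnings;

my $var1    = '100';        # a string that happens to look like a number
my $sum     = $var1 + 5;    # Perl converts the string to a number first
my $product = $var1 * 2;    # the same conversion happens here

print "$sum\n";             # prints 105
print "$product\n";         # prints 200
```

The conversion is automatic, so no casting function is needed.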

You can also set the value of a variable to the value of an expression :

$var1 = 9 * 9;

$var1 = 8/3;

The operators are fairly standard - % is used for modulus and ** is used to raise to a power (e.g. 2 ** 3).

You can also use increment and decrement operators like in C++ :

$var1++;

$var1--;

++$var1;

 The same rules apply as in C++.

Perl also allows you to use some simple operators on strings. This performs concatenation :

$str1 = 'Hello';

$str2 = ' World';

$str3 = $str1 . $str2;

print $str3;

You also could have used a line like this :

$str3 = $str1. ' and ' .$str2;

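to build Hello and World (with an extra space, since $str2 already starts with one)

but this can be done more easily by using double quotes, because variables inside a double-quoted string are replaced by their values :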

print "$str1 and $str2";

Like C++, you can use \t and \n for tabs and new lines inside double-quoted strings.

You can use {} around the name of a scalar if the text that follows would otherwise be read as part of the name :

$number = 5;

print "It is the ${number}th number.";

Arrays

Here is a basic example of an array :

@cars = ("Honda", "Mazda", "Ford", "BMW");

The elements are indexed starting at 0, so to print out Ford we enter

print $cars[2];

The value we want to print out is scalar so we use $ instead of @.

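An array can also be used in the list that builds another array. Its elements are copied in one by one, so the result is a single flat array rather than an array inside an array. Consider this :

@morecars = (@cars, "Audi", "Opel", "Aston Martin", "Nissan");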

print $morecars[2]; will still print Ford, while 

print $morecars[5]; will print Opel.

Arrays in most other languages can't do this.

We can also add items to the end of an existing array like this :

push(@morecars, "Toyota", "Mitsubishi");

To remove the last item from an array use pop.

pop(@morecars); 

or if you want to store the popped item in a variable :

$var2 = pop(@morecars); 

If you want to find out the length of an array, assign it to a scalar :

$length = @morecars;

Or you can create a string containing all the elements :

$string = "@morecars";

To find the index value of the last item use :

$#morecars;
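The array operations above can be tried together in one short script. This is only a sketch: the list of cars is shortened, and use strict, use warnings and my are assumptions about current Perl style that the rest of this tutorial does not use.

```perl
use strict;
use warnings;

my @cars = ("Honda", "Mazda", "Ford");

push(@cars, "BMW", "Audi");     # @cars now holds five elements
my $last   = pop(@cars);        # removes and returns the last element
my $length = @cars;             # an array assigned to a scalar gives its length
my $top    = $#cars;            # index of the last element
my $string = "@cars";           # elements joined with single spaces

print "$last $length $top\n";   # prints Audi 4 3
print "$string\n";              # prints Honda Mazda Ford BMW
```

Note that $#cars is always one less than the length, because indexing starts at 0.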

Associative Arrays

Most languages don't have associative arrays built in, while a few (Tcl/Tk for instance) only have associative arrays. They don't work by index. Instead, you store each element under a key, and you use that key to get the element back. The great advantage of these arrays is that they are expandable. They are essentially the same as hash tables.

This array stores the highest monthly temperatures of a few months :

%hightemp = ("January", 56,
             "February", 67,
             "May", 89,
             "August", 98);

Now if we want to print out the value for "May" we enter :

print $hightemp{"May"};

Note that the key is in curly brackets. This is stupid, but what can you do? Also note that we access the element with a $ rather than a %, because a single element is a scalar.

An associative array can be converted to an index array like this :

@indexarray = %assocarray;         

This can also be done the other way.

Associative arrays aren't quite as easy to print out - this is how you print out the keys :

foreach $temp (keys %hightemp)

{

print "The key was $temp\n";

}

and here is how you print out the values of those keys :

foreach $temp (values %hightemp)

{

print "The temp was $temp\n";

}

The elements are not returned in the order they were entered. The order comes from Perl's internal hash table and should not be relied on.

If you want to print out the keys and their values use this loop :

while (($temp, $hightemp) = each(%hightemp))

{

print "$temp had $hightemp as highest\n";

}
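If a predictable order matters, one option is to sort the keys before looping over them. The sketch below assumes alphabetical order is what you want; sort is a standard Perl function, while use strict, use warnings, my and the @report array are additions made for this example.

```perl
use strict;
use warnings;

my %hightemp = ("January", 56,
                "February", 67,
                "May", 89,
                "August", 98);

# sort puts the keys into alphabetical order before the loop sees them
my @report;
foreach my $month (sort keys %hightemp)
{
    push(@report, "$month had $hightemp{$month} as highest");
}

foreach my $line (@report)
{
    print "$line\n";
}
```

This prints the lines for August, February, January and May, in that order, on every run.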

File Handling

Working with files is easy in Perl. I have created a text file called first.txt on my a: drive to demonstrate this :

$file = 'a:first.txt'; 
open(INFO, $file);
@lines = <INFO>;
close(INFO);
print @lines; 

In the first line I place the path of the file into a string. The second line then opens this file - INFO is the file handle that Perl uses to refer to the file from here on. In the third line the contents of the file are read into an array called @lines. The file is then closed, and the array is printed out.

If we wanted to open the file for output we would use :

open(INFO, ">$file");  

or we could open it for appending :

open(INFO, ">>$file"); 

A file can also be opened explicitly for input (this is what happens when no prefix is given) :

open(INFO, "<$file");  

To print something to a file you have opened use print with an extra parameter :

print INFO "This line goes to the file.\n";

If you are opening a file, you should also provide an error message if the filename isn't correct. This example stops execution and prints an error message if the file doesn't exist :

open(INFO, "file1") or die "Can't open the file: $!\n";

The $! variable contains the error message returned by the operating system.
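Putting the three modes together, here is a minimal sketch that writes a file, appends to it and reads it back, checking every open with die. It assumes a scratch file created with tempfile from the standard File::Temp module, so it does not touch any real file; that module, use strict, use warnings and my are not used elsewhere in this tutorial.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() creates a scratch file and returns a handle and its name
my ($tmp, $file) = tempfile();
close($tmp);

open(INFO, ">$file") or die "Can't open $file for output: $!\n";
print INFO "first line\n";
close(INFO);

open(INFO, ">>$file") or die "Can't open $file for appending: $!\n";
print INFO "second line\n";
close(INFO);

open(INFO, "<$file") or die "Can't open $file for input: $!\n";
my @lines = <INFO>;      # one array element per line, newlines included
close(INFO);

print @lines;            # prints the two lines written above
unlink($file);           # remove the scratch file
```

Opening with > a second time instead of >> would have replaced the first line rather than adding to it.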

Control Structures - Iteration

Most loops require a test to see whether a value is true to determine when to quit a loop. These are the comparison operators needed for that in Perl.

$a == $b      : $a is equal to $b (use for numbers).
$a != $b      : $a is not equal to $b (use for numbers).
$a eq $b      : $a is equal to $b (use for strings).
$a ne $b      : $a is not equal to $b (use for strings).

You can also use logical operators :

($a && $b)    : $a and $b are both true.
($a || $b)    : $a or $b (or both) is true.
!($a)         : $a is not true.

The basic format of the for structure is this:

for (initialise; test; inc)

{

         first_action;

         second_action;

         etc

}

For instance :

for ($i = 0; $i < 10; $i++)    

{

    print $i;

}

Here is an example of the while loop :

$i = 1;
while ($i < 10)
{
        print $i;
        $i++;
}

This will print out 123456789.

Here is a more complex example that asks you to enter a password until you get it right :

print "Password? ";             # Ask for input
$a = <STDIN>;                   # Get input
chomp $a;                       # Remove the newline at end
while ($a ne "jerry")           # While input is wrong...
{
    print "sorry. Again? ";     # Ask again
    $a = <STDIN>;               # Get input again
    chomp $a;                   # Remove the newline again
}

print "Correct";

This example also demonstrates how to obtain user input. Note that the input includes a trailing newline character, which would make the comparison fail, so we remove it with chomp. (The older chop removes the last character whatever it is; chomp removes it only if it is a newline.)

Like most languages there is also a do...while loop. This is different because the condition comes at the end, so the loop always executes at least once. Here is an example with the password program :

do
{
    print "Password? ";
    $a = <STDIN>;
    chomp $a;                   # Remove the trailing newline
}
while ($a ne "jerry");

print "That's right";

 

Control Structures - Selection

The other important way you control the flow of a program is selection - the if-then-else structure common to most languages.

Here is an if-else statement :

$a = 200;

if ($a > 100)

{

print "Greater than 100\n";

}

else

{
print "100 or less\n";

}

This will print out Greater than 100.

Here is a more complex option that allows us to choose from several options :

$a = 130;

if ($a > 200) 

{

print "Greater than 200\n";

}
elsif ($a > 120) 

{

print "Greater than 120\n";

}
elsif ($a > 80) 

{
print "Greater than 80\n";

}

else 

{

print "80 or less\n";
}

The first true condition will cause its block to be executed. Even if several conditions are true, only the first true one has any effect : this one prints out Greater than 120 - it doesn't also print out Greater than 80, even though that is true as well.

Remember not to put the e in elsif, and don't ask me why not.

String Matching

String manipulation was originally at the heart of Perl, and it is still one of its most useful features.

Given the string :

$sentence = "I went to the shop";

The following statement is true :

$sentence =~ /the/;

Anything in / / is a regular expression. The =~ operator is asking if the regular expression is inside $sentence. Note that this is case sensitive. /The/ is not in $sentence.

To see if something isn't in a sentence use !~ :

$sentence !~ /the/;

There are a lot of special characters that can be used here :

To test if a certain regular expression is at the start of a string use the following :

$sentence =~ /^I wen/

To test for the end of a string use this :

$sentence =~ /op$/

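To match any run of characters that contains no newline use /.*/ : the . matches any single character except a newline, and the * means zero or more of whatever precedes it.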

To find out if it is a string with nothing in it use /^$/.

You can also use [ ] to match a character in the line against several possibilities :

$sentence =~ /[gwv][sre][nmb]/

This would be true because the pattern matches the wen in went in $sentence.

You could also use something like [^jkl] if that letter wasn't allowed to be j, k or l. You can also use subranges like [a-z] or even [a-zA-Z]. This just follows ASCII ordering.

With the growing number of special characters appearing, it is a good thing to know that if you need to use any of these characters in their literal sense, put a backslash in front of them, for example \$.
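The tests in this section can be checked in one go. In this sketch each match is turned into 1 or 0 so that the result can be printed; the variable names and the "cost: $5" string are made up for the example, and use strict, use warnings and my are assumptions about current Perl style.

```perl
use strict;
use warnings;

my $sentence = "I went to the shop";

my $has_the = ($sentence =~ /the/)             ? 1 : 0;   # "the" is present
my $has_The = ($sentence =~ /The/)             ? 1 : 0;   # matching is case sensitive
my $starts  = ($sentence =~ /^I wen/)          ? 1 : 0;   # anchored at the start
my $ends    = ($sentence =~ /op$/)             ? 1 : 0;   # anchored at the end
my $class   = ($sentence =~ /[gwv][sre][nmb]/) ? 1 : 0;   # matches "wen"
my $missing = ($sentence !~ /cinema/)          ? 1 : 0;   # true when there is no match
my $dollar  = ("cost: \$5" =~ /\$5/)           ? 1 : 0;   # \$ is a literal dollar sign

print "$has_the $has_The $starts $ends $class $missing $dollar\n";
# prints 1 0 1 1 1 1 1
```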

String Substitution

Given our variable $sentence = "I went to the shop"; we can change shop to movies like this

$sentence =~ s/shop/movies/;

This only replaces the first occurrence - to make it global use :

$sentence =~ s/shop/movies/g;

You can also use the [] like above, so if you want to globally substitute Shop and shop use :

$sentence =~ s/[Ss]hop/movies/g;

Translation is also possible. For instance :

$sentence =~ tr/swe/edf/;

This will change every s to an e, every w to a d, and every e to an f.

It also returns the number of swaps made, so

$count = ($sentence =~ tr/d/d/);

will count the number of d's in $sentence.

An even cleverer use is this :

$sentence =~ tr/a-z/A-Z/;

which changes every lower-case letter in $sentence to upper case.
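As a small sketch of both uses of tr, the script below counts one letter and then makes an upper-case copy of the sentence. The choice of the letter t and the $upper variable are assumptions made for the example, as are use strict, use warnings and my.

```perl
use strict;
use warnings;

my $sentence = "I went to the shop";

# Replacing t with t changes nothing, but tr still returns how many it saw
my $count = ($sentence =~ tr/t/t/);

# Work on a copy so that the original sentence is left alone
my $upper = $sentence;
$upper =~ tr/a-z/A-Z/;

print "$count\n";     # prints 3
print "$upper\n";     # prints I WENT TO THE SHOP
```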

Splitting a String

The function split splits up a string and puts it into an array.

$info = "Cameron:Dane:Student:New Zealand";

@person = split(/:/, $info);

which has the same overall effect as

@person = ("Cameron", "Dane", "Student", "New Zealand");

Note that we specify the character that signifies the split. If the fields might be separated by more than one colon in a row we could use : split(/:+/, $info);

To split a word into characters use :

@chars = split(//, $word);

To split a sentence into words use :

@words = split(/ /, $sentence);

To split a paragraph into sentences use :

@sentences = split(/\./, $paragraph);
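To see the difference between /:/ and /:+/, here is a sketch that splits a string containing repeated colons both ways. The input string is a made-up variation of the one above, and use strict, use warnings and my are assumptions about current Perl style.

```perl
use strict;
use warnings;

my $info = "Cameron::Dane:Student:::New Zealand";

# Every single colon is a separator, so empty fields appear between them
my @single = split(/:/, $info);

# A run of one or more colons counts as one separator
my @multi = split(/:+/, $info);

my $single_count = @single;
my $multi_count  = @multi;

print "$single_count\n";     # prints 7
print "$multi_count\n";      # prints 4
print "$multi[3]\n";         # prints New Zealand
```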

Functions

Perl allows you to create and use your own functions. This is the basic form :

sub myfunction

{

        print "Not a very interesting function\n";

        print "This does the same thing every time\n";

}

You can then call this function by typing

&myfunction;

You can also pass parameters to a function - take a look at this example :

sub printargs

{

print "$_[0]";

}

printargs(90);

It accepts one argument (90) and prints it out. Inside the function it reads this as $_[0] - basically the first variable in a list.

You can also pass arrays :

sub printargs

{

        print "@_\n";

}

&printargs("perly", "king");    # Example prints "perly king"

&printargs("frog", "and", "toad"); # Prints "frog and toad"

It prints out the array it was called with.

Here is an example of passing two variables :

sub printfirsttwo

{

print "Your first argument was $_[0]\n";

print "and $_[1] was your second\n";

}

printfirsttwo(10, 20);

You can see where this is heading. The first argument becomes $_[0], the second one $_[1] and so on.

The manner of picking up variables in the functions is completely different from every other language I have used, and takes a bit of getting used to.

Functions can also return values, but this is done differently as well - this function accepts two numbers and returns the bigger one :

sub maximum
{
        if ($_[0] > $_[1])
        {
                $_[0];
        }
        else
        {
                $_[1];
        }
}

$biggest = &maximum(37, 24); # Now $biggest is 37

print $biggest;

The result of a function is always the last thing evaluated, unless an explicit return statement is used.
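For comparison, here is a sketch of the same function written with named arguments and an explicit return, which many people find easier to read. Both my and return are standard Perl but are not otherwise used in this tutorial, and the names $first and $second are made up for the example.

```perl
use strict;
use warnings;

sub maximum
{
    my ($first, $second) = @_;          # copy the arguments into named variables
    return $first if $first > $second;  # leave early with the bigger value
    return $second;
}

my $biggest = maximum(37, 24);
print "$biggest\n";                     # prints 37
```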

You can also create local variables :

sub inside
{
        local($a, $b);                  # Make local variables
        ($a, $b) = ($_[0], $_[1]);      # Assign values
        $a =~ s/ //g;                   # Strip spaces from
        $b =~ s/ //g;                   # local variables
        ($a =~ /$b/ || $b =~ /$a/);     # Is $b inside $a
                                        # or $a inside $b?
}

&inside("lemon", "dole money");         # true

Regular Expressions

Regular Expressions have been used in several string functions already. This is some additional information not fully covered there. They are basically a way of describing a set of strings without having to list all the strings in the set. RE's are used in many Unix programs like grep, awk and sed.

RE's are used in several ways in Perl. Primarily they are used in conditionals to determine whether a string matches a particular pattern - they look like /pattern/

They can also be used to swap one pattern for another s/originalpattern/changeto/.

Here is an example of how they can be used to search a file for a name :

while ($line = <FILE>) {
    if ($line =~ /Dane/) {
        print $line;
    }
}

Suppose we wanted to list all words followed by an exclamation mark. We could use :

/[a-zA-Z]+!/

There are also some shortcut characters for matching patterns :

Name          Definition                          Representation
Whitespace    [ \t\n\r\f]                         \s
Character     [a-zA-Z_0-9]                        \w
Number        [0-9]                               \d
Anything      any character except a newline      .

These all match single characters.

Suppose you are looking for phone numbers in a phone book. You know that all numbers in this area have between 7 and 9 digits, so you only want results that reflect that. You can specify a lower and upper limit like this :

/\d{7,9}/

Or, if all numbers had five digits you could use

/\d{5}/
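One thing to watch is that /\d{7,9}/ on its own also matches inside a longer run of digits. The sketch below adds ^ and $ so that the whole entry has to be 7 to 9 digits long. The list of entries is invented for the example, grep is a standard Perl function that keeps the elements for which the test is true, and use strict, use warnings and my are assumptions about current Perl style.

```perl
use strict;
use warnings;

my @entries = ("5551234", "555123456", "12345", "5551234567");

# Keep only the entries made up of 7 to 9 digits and nothing else
my @valid = grep { /^\d{7,9}$/ } @entries;

# Without the anchors, the ten-digit entry would match as well
my @loose = grep { /\d{7,9}/ } @entries;

print "@valid\n";     # prints 5551234 555123456
print "@loose\n";     # prints 5551234 555123456 5551234567
```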