#!/usr/bin/perl ########################################################################### ################ Professor Liang's Perl Tutorial In Perl ################# ########################################################################### # This document is meant for relatively advanced computer science students # to quickly begin using Perl. Our purpose is to study the characteristics # of the Perl language in comparison with other languages. The reader is # urged to consult what I consider to be the standard reference, # "Programming Perl" by Larry Wall, et. al., as well as other online # tutorials and references to learn all the capabilities and uses of the # language. Perl is available from www.perl.org. # You should run the code in this tutorial with "perl perltutorial.txt" # so you can see what exactly each code fragment is doing. Go ahead and # experiment by making changes. ######## I. Strings and More Strings print "2"*3; # In any other language, you would think me crazy: how can you multiply a # string by an integer? Surely this is a mistake only a beginner will make. # However, in Perl, you will in fact see 6! How? Because strings are the # basic data type in Perl. Most everything is treated as strings, including # the unquoted 3. Perl is best in practice for text processing, such as # with the html source of web pages. It is actually quite awful for # manipulating binary data (C would be best for that purpose). Strings # therefore have a special status in Perl. ######## II. Scalar Variables and Static Scope # Perhaps one reason for the popularity of Perl is all those dollar signs! # Most variables (except for file handles) in Perl require a special symbol # in front to designate its type. The most common symbol is "$". # The $ symbol in Perl signifies the presence of a scalar value. Scalar # values include numerical values, strings, and perhaps most importantly, # pointers (memory addrs). All variables containing scalar values must be # prefixed with $. This is a characteristic that Perl inherited from Unix # scripting languages, though it has transcended that simple role long ago. # If you are familiar with Java, scalars correspond to fixed, primitive # types (plus strings). $x = 2; # assigns 2 to global scalar variable $x { my $x = 3; # assigns 3 to a scoped, "local" variable print "\nmy x is $x\n"; # prints local variable 3 } print "...and mine is $x\n"; # prints global variable 2 # As the above program segment indicates: # 1. Variables do not have to be declared as in "int x;" before being used # 2. "my" introduces a lexically scoped variable whose scope is defined # by the enclosing {}s. It is a bit ironic to call it a "local" variable # because "local" is a keyword used for something else. # 3. When a scalar var such as $x is placed inside string ""s, Perl # will in fact expand its value. To prevent this from happening, use print 'x is still $x inside single quoted strings!', "\n"; # 4. There's no "main" in Perl. It's run as a script. # If you're new to Perl, it's natural to forget the $ before variables. I # still do sometimes. But they are necessary. ######## III. Booleans and if-else # Like C (and unlike Java), Perl has no special type for booleans. The # null pointer, 0, "", or even "0" all represent false. Everything else # represents true. You'll sometimes see Perl code such as if ($val) { print $val; } else { print "\$var is undefined\n"; } # $val returns false if it was not defined. # It is important that, in the if-else statement, {}'s enclose the # two cases. They are not optional. Why? Think you know C++/Java? # What does the following code print? # if (2<1) # if (1<2) cout << "me"; # else cout << "no, me"; # It won't print anything, but you'll have to know that the "else" is by # convention associated with the closest (innermost) "if", otherwise you # may get confused. This is the classic "dangling else" problem. The # required {}'s of Perl help to eliminate this confusion. ######## IV. Arrays and Hash Tables # In addition to scalar values, the @ symbol is used to prefix arrays, and # the % symbol prefixes hash arrays. Perl arrays are not really arrays in the # sense of C or even Java in that they don't necessarily represent a # fixed segment of memory. In fact, arrays in Perl are more appropriately # called linked lists in that they can be expanded and shrunk. my @l = (3,5,7); # declare an array or list of three integer elements @l = (2,@l); # adds an element in front of l push(@l,4); # adds element (4) destructively onto right end of list pop(@l); # deletes rightmost element and returns it. print "@l\n"; # you don't have to write a loop to print an array # Why are these things called arrays and not just lists? Because Perl # gives the user the convenient syntax of accessing list elements using # the familiar bracket notation: $l[2] += 4; # increments the third element of the array by four. # NOT FAIR! If l is an array/list then why did we still put $ in front of it? # This is a point of contention in the Perl community, and may change in # the future. The reason for the $ is that, although l is an array, l[2] # is an integer, which is a scalar. That's just the way things are now # with Perl. If the value of l[2] is also an array, we would use @l[2]. # Here's how you use a for loop to print an array backwards for(my $i=$#l; $i>=0; $i--) { print $l[$i], " "; } print "\n"; # The expression $#l is the last valid index of l, or the length of l minus # one. Note that $i is local (by virtue of "my") within the loop. An # alternative is just to say my $i = @l-1; Perl will automatically infer # from the given context of assigning an array to a scalar that by @l you # really mean the length of the list. ## Hash tables are also very basic data structures in Perl. Here's a table # one might use to store those Hofstra student id's: my %id; # declares hash table $id{"larry"} = 700123456; $id{"mary"} = 700654321; # etc ... print "mary's id is ", $id{"mary"}, "\n"; # The % symbol prefixes the hash table, while the {}'s (as opposed to []'s) # signify that you're accessing a hash table instead of an ordinary array. # The function "keys" returns a list containing all the keys of a hash table: print "here are my keys: ", keys(%id), "\n"; print "they look better separated by a comma: ", join(", ", keys(%id)), "\n"; # The "join" built-in function separates the elements of a list using a # given string (in this case ", "). It's commonly used for formatting # output. ##### The _ variable # Perl has a special variable "_" which basically represents "whatever # is most relevant in the current context." For example, inside a # procedure, it represents the list (array) of parameters passed to # the procedure. # try this: foreach (keys(%id)) { print $id{$_}, "--"; } # the foreach loop goes through every element of a list, and inside the # body of the loop $_ refers to the current value of the list being examined. # The above foreach loop can also be written by associating a variable # with each element, as in foreach $x (keys(%id)) { print $id{$x}, "--"; }. ######## V. Subroutines: Lambda Terms by Another Name # Here's the non tail-recursive ("naive") fibonacci function: sub fib1 { my $n = $_[0]; if ($n<2) {1} else {fib1($n-1) + fib1($n-2)} } # Several things are important to point out. The parameters of the subroutine # "fib1" are contained in the implicit array "_". Thus $_[0] is the # first argument, and $_[1] would be the second, and so on. Secondly, # the "return" keyword is optional in perl: whatever is the last # expression evaluated determines the value returned by the function. # You might be wondering: why did I have to declare a local variable $n? # Can't I just use $_[0] throughout? Well, for this function it doesn't # matter, but Perl passes variables to a function in a different way than # what you might expect: sub swap { my $temp; $temp = $_[0]; $_[0] = $_[1]; $_[1] = $temp # The semicolon is optional on the last line in {}'s } $x = 2; $y = 3; swap($x,$y); print "\nthe values of \$x and \$y are now $x and $y: they got swapped!\n"; # I use the swap function to remind people that, conventionally, a function's # parameters are local variables within the function. The swap function # wouldn't swap anything in Java, but the Perl program above does! By # default, Perl passes parameters by REFERENCE. If you know C++, it's the # same as saying # void swap(int& x, int& y) # That is, whatever you do to a parameter variable WILL be persistent # even after the function exits. This may be a desirable behavior, such # as with the swap function above. However, in general, the standard # call-by-value method is recommended. By assigning $_[0] to a locally # declared variable (via "my $n=$_[0]"), I am making the function # behave in the "conventional" way. That is, as a self-contained # programmatic unit. Whatever you do to $n will be local within the function. # Here's a nice feature of Perl. I don't have to declare variables one # at a time: my ($x,$y); # declares two variables $x and $y at once using a list. # Similarly, here's an easier way to swap two values: ($x,$y) = ($y,$x); # Here's the tail-recursive (iterative) fibonacci function: sub fib2 { my ($n,$a,$b) = @_; # localize all arguments if ($n<2) {$b} else {fib2($n-1,$b,$a+$b)} } print "\nthe 100th fibonacci number is ", fib2(100,1,1), ".\n"; # The naive fibonacci function will give you the same answer, but # you'll have to wait around 20,000 years to see it. # print fib1(100); #uncomment at your own risk # At the end of this tutorial we will use Perl's extraordinary power # to make the naive fibonacci function almost as fast as the tail-recursive # version. # You might be wondering: what if I only passed one or two arguments to # a function like fib2, which expects 3? The answer is that the result # becomes unpredictable. This is a contrast between strongly typed # (Java) languages and weakly typed ones (Perl, Scheme). You can expect # less errors to be caught at compile-time with Perl. That's the price # you pay for the dexterity of weakly typed languages. # Just to be complete, here's "fib3", which uses a while loop (happy now?) sub fib3 { my $n = shift; # alternative to my $n = $_[0]; my ($a,$b) = (1,1); # initial values for $a and $b while ($n>1) { ($a,$b) = ($b,$a+$b); $n--; } $b # the ; is optional for the last line inside {}'s } # Perl's design philosophy is to give programmers a variety of styles to # choose from. For example, print "1<2\n" unless (1>2); # is the same as if (!(1>2)) {print "1<2\n"}. # Perl is for experienced programmers who love programming. Beginners # should stay away from Perl as they would end up using it in only # uninteresting ways and develop lots of bad habits. # Finally, you'll see function application sometimes written as # &fib3(100). & is the symbol that prefixes function variables just # as $, @ and % prefixes scalars, arrays and hash tables respectively. # You may also see (fib3 100) sometimes, which is application in # the lambda calculus/scheme style. ######## VI. Pointers # Pointers (aka references or memory addresses) are an important datatype # in Perl. For Java programmers who are not familiar with the generic # use of pointers, this section may seem a bit difficult. It is possible, # however, to avoid trouble with pointers by using them in a uniform way, # just as in Java. The next section, on pointers to functions, will adopt # this approach. my $x = 3; my $z = \$x; # sets $z to point to $x. "\" works like "&" in C. $$z += 1; # to buy back the value from the pointer, you need two dollars :-) print "the value that $z points to is ", $$z, "\n"; # You can also have pointers to complex structures: my $x = \%id; # points x to the id hash array we used earlier. print "$x points to ", %$x, "\n"; # Note that $, not %, still prefixes x. The pointer itself is a scalar, # that is, a 32 bit memory address. To dereference a pointer back to its # value, as the above examples indicate, we use another $, %, or @ infront, # depending on the type of the item being pointed to. # To illustrate when pointers are needed, let's first look at a function that # does NOT require them. The following function returns the index of an # element $x inside a list @L, returning -1 if it doesn' exist: sub indexof { my ($x,@L) = @_; # returns position of x inside L my $i = 0; while (($i <= $#L) && ($x != $L[$i])) {$i++;} if ($i<=$#L) {$i} else {-1} # return -1 if $x not found in list. } # indexof(3,(4,3,6,8,7)) will return 1, the index of the "3" inside the list. # This function did not need pointers because Perl nicely separates the # head (or "car") of the list from the rest ("cdr") of the list in the way # you'd expect. However, sometimes you may want to pass in something else # AFTER the list, or pass two distinct lists to a function. The next # function returns the intersection of two lists. Note the use of pointers, # and deduce for yourself why they're needed. sub intersection { my ($A,$B) = @_; # assigns two POINTERS to the args my @I = (); # intersection list to be constructed, initially null foreach my $x (@$A) # for each element x in A, { foreach my $y (@$B) # check if it's also in B { if ($x == $y) { @I=($x,@I) } # add x to I list (can also use push) } # inner loop } # outer loop @I; # return the I list that was built. } my @l = (1,3,4,7,2,8); my @m = (3,9,6,4,1); print "the intersection of @l and @m is ", intersection(\@l, \@m), "\n"; # Look at the code carefully to see where pointers are making a difference: # For example, @$B retrieves the list from the pointer $B. Also, # $$B[$i] is required to access the (scalar) values of the list. # In order to pass a complex structure to a function, in general you'll # have to use pointers. In the above function, at least the first list # had to be passed in as a pointer. # Here's another way to have hash tables, using pointers: $myhash->{"key1"} = "value1"; $myhash->{"key2"} = "value2"; # Perl infers from {} and -> that $myhash is a pointer to a hash table. # C/C++ programmers should know that "A->B" is really "(*A).B". That is, # it dereferences the pointer A and at the same time retrieves the field # B from the dereferenced struct/object. Perl expands this meaning of "->" # to the case of arrays, hash tables (and as you will see in the next # section, even functions). For arrays you can similarly have $A->[0] = 1; $A->[1] = 2; print "referenced array: ", @$A, "\n"; # prints contents of array # Just as in C/C++, one way to avoid confusion with pointers is to use them # in a CONSISTENT manner In fact, this observation led to the uniform # treatment of pointers in the Java language. That is, if you adopt the # policy to: # 1. Never use pointers to scalar values # 2. Always use pointers to complex structures # then you'll be emulating the approach of Java (except for strings). ######## VII. Pointers to Functions # Now we finally get to what I consider to be the funnest part of Perl: # its ability to be used as a fully general, higher-order language that's # (nearly) as expressive as Church's lambda calculus. A function (or # "subroutine") in Perl can be used like any other value. It can be passed # to another function, returned by a function, and assigned to a variable. # Here's the (naive) fibonacci function expressed as a Perl lambda term # assigned to a variable: $fib = sub { my $n=shift; # same as my $n = $_[0]; if ($n<2) {1} else {$fib->($n-1) + $fib->($n-2)} }; # Note the ";" at the end, since this is just an assignment statement! # No name follows "sub" - it's just "lambda" for Perl. To apply the # function pointed to by $fib, we use $fib->(args). So now you see: # $A->[$i] accesses the array pointed to by $A at index $i # $A->{$i} accesses the hash array pointed to by $A for key $i # $A->($i) accesses the function pointed to by $A and applies it to $i # A characteristic of a well-designed language is generality. Once you # get used to all the $#%@ (not an explicative) you'll see that most # everything in Perl simply MAKES SENSE. # Having said that however, I should point out one subtlety: the # definition of $fib wouldn't have worked if I had used my $fib = ...; # because the recursive calls to $fib would refer to something not defined # yet. "my" in Perl corresponds to "let" in functional languages # such as Scheme. However, Scheme contains another construct "letrec" that # allows one to bind recursive definitions. Perl lacks this construct, but # fortunately it's not a big deal. To bind $fib to a local var, simply # declare it first on a separate line: # my $fib; # $fib = sub { ... }; # When we assign a function to a local variable inside a function, # we effectively get a locally defined function. The following tail-recursive # fibonacci function uses this ability to hide the fact that you need to # initially pass two additional values (1's) to the recursive function: $tfib = sub { my $f; # local recursive function $f = sub { my ($n,$a,$b) = @_; if ($n<2) {$b} else {$f->($n-1,$b,$a+$b)} }; $f->($_[0], 1, 1); # call internal function }; # Now to call $tfib, we can just say $tfib->(10), without having to pass in # the two 1's. # Defining local functions, in addition to hiding implementation detail, # can in fact also give us a form of object-orientation. However, I will # leave that discussion out of this tutorial. You may consult my # document "Bank Accounts in Perl" to see how this is done. ######## VIII. Higher Order Functions # Being able to pass a function as an argument to another function can # be a very useful feature. In fact, graphical user interface API's # commonly rely on them in defining "callback" functions that handle # asynchronous events. The following function is a classic: it applies # a given function to a list of values: sub mapfun { my ($f,@L) = @_; # separate car,cdr of @_ into function and list my @M =(); # new list to be built foreach my $x (@L) { push( @M, $f->($x) ); } @M # return new list } $f = sub { 2**$_[0]; }; # function to return 2 to nth power (** = Math.pow) @powers = mapfun($f,(1,2,3,4,5,6,7,8,9,10)); print "this is how computer people count: ", join(" ",@powers), "\n"; # It's also possible to inline a function when passing it, without defining # it first: @squares = mapfun(sub{$_[0]*$_[0]}, (1,2,3,4,5)); print "squares: @squares \n"; # A function can also return a function. The following example composes # two functions that are passed in as arguments: sub compose { my ($f,$g) = @_; # parameters are functions $f and $g sub { $f->($g->(@_)); } # fog(x) = f(g(x)) } # compose returns a function that applies g, then f to its arguments. $f = sub { $_[0] * $_[0] }; # lambda x. x*x $g = sub { $_[0] + 1 }; # lambda x. x+1 $fog = compose($f,$g); print "applying a dynamically generated function: ", $fog->(4), "\n"; #### Automatic Memory Management. # note that: @l = mapfun($f,@l); # will effectively replace @l with a new list, namely the list built # by mapfun. If you're a C/C++ programmer, you may be wondering # what happened to the original @l list - doesn't it need to be deallocated? # The answer is that like Java, Perl is a modern programming language that # does automatic memory management or "garbage collection". Scheme was the # language used to develop this important technology. Far from just a # convenience, memory management frees the programmer to think at a higher # level, and gives rise to a style of programming previously considered # impractical. # To (temporarily) bring an end to this tutorial, I will now write a function # that can optimize the performance of a function passed to it. The # idea is to avoid redundant computation by storing the results of function # calls in a hash table. Then, the next time the function is called on the # same arguments, the hash table is first checked to see if a result already # exists. It's important to point out that this technique only works for # a certain kind of functions: it doesn't work for functions that change # some external state, e.g, it won't work for any "void" functions. But it # works wonderfully on recursive functions such as the naive fibonacci # function, as it would eliminate all redundant recursive calls. The # function takes a function as an argument and returns an optimized version # of it: sub makehashfun { my $f = shift; # function to be optimized my $hash; # local hash table to store results sub { # new version of function my @args = @_; my $jargs = join ",",@args; # join multiple args into hash key my $val = $hash->{$jargs}; # look up hash table if ($val) {$val;} # if value exists, we're done! else { # need to call function $val = $f->(@args); # calls function $hash->{$jargs} = $val; # store result in hash table $val; # return value } # else } # returned subroutine of makehashfun } # makehashfun $fib = makehashfun($fib); # optimizes naive fibonacci function (see Sec. VII) # comment out the above line at your own risk! print "Now you won't have to wait 20,000 years to see ", $fib->(100), "\n"; # The function returned by makehashfun is called a "closure". In addition # to being a lambda term, it also carries with it an "environment", namely # its hashtable. The hash table is "stateful" - that is, it retains its # values between separate calls to the function. In this sense, the # returned subroutine behaves more like a "method" in an oop language than # a pure "function". This topic, however, is out of the scope of this # "kick start" tutorial. # Having seen higher-order functions, you are now ready to read my # document "Lambda Calculus in Perl" for the greatest spiritual journey # in computer science. ######################################################################### # There's a lot more to talk about in Perl. As my purpose here is to # introduce the essential characteristics of the programming language, # I've not touched on some features that make Perl so popular in # practice, such as its I/O model and its facility with regular # expressions and parsing. I've also not touched on the recently-added # support for a rudimentary form of object-orientation (Perl packages). # A large number of ready-made Perl modules are available from www.cpan.org. # You may find these topics in other Perl references, or stay tuned for # a future, expanded edition of this tutorial. # In addition, I also have a number of programs that further illustrate # the uses and characteristics of Perl. You should consult the following # files on my programming languages class homepage: # lambdaperl2.txt : Lambda Calculus in Perl # dynamic.pl: Explains the use of "local" in contrast to "my". # perlbank.txt: Uses closures, alluded to above, for a style of OOP # blessed.txt : Uses the new Perl Package-based OOP # webclient.pl : TCP client to download html pages from a web server # byteordering.pl : Binary data manipulation in Perl (not for the weak) # submitprog2.txt : CGI form for uploading your assignments to my web server print "\n ... Stay tuned for more ...\n";