A COURSE ON TCP/IP NETWORKING WITH LINUX

Chuck Liang
Computer Science Department, Trinity College
300 Summit St., Hartford, CT 06106
Email: chuck.liang@trincoll.edu

ABSTRACT

This paper describes a course on TCP/IP networking in which students were required to administer their own network servers using the Linux operating system. We discuss the resources and tools that Linux offers for illustrating networking concepts and for implementing network services such as routing and firewalls. Sources of reference and selected assignments are described. We also address the issues involved in teaching such a course in a small college environment.

INTRODUCTION

It is undeniable that computer networking has emerged as an important addition to traditional research and practice. Appropriate additions to the computer science curriculum are needed to reflect this exciting new direction. Due to the inherent differences between networking and traditional topics, however, new approaches are needed for their integration. The vast range of subjects within computer networking also makes choosing an appropriate set of topics for a course difficult. Large universities may have the resources to teach several network-related courses, addressing the different "layers" of networking protocols separately. In a small college environment with limited faculty resources, however, offering more than one upper-division course can be impractical. In choosing the topics for a new networking course at Trinity College, we decided to restrict our focus to topics surrounding the "TCP/IP" model of networking. TCP/IP, more than an acronym for two protocols, represents a model of abstraction by which different networking issues can be considered relatively independently.
This choice therefore allowed us to concentrate on the "software" aspects of network design and setup, leaving out the design of the "physical layer" as well as much of the "application layer," while still presenting a coherent set of topics.

Networking, by nature, is a pragmatically driven subject. It is difficult to truly grasp the fundamental concepts of networking without actively applying them in a realistic setting. Incorporating a "hands-on" component into a networking course raises somewhat different problems than in traditional computer science courses. Students in traditional courses typically write programs using well-defined programming languages and operating system environments, or set up applications using software packages (e.g. SQL) that are likewise (relatively) well documented. To conduct a networking experiment, or to set up a particular networking service, a student must be knowledgeable about much more than his or her own system. For example, an in-class experiment was conducted using the Linux (Unix) traceroute utility, which lists the routers an IP packet travels through in order to reach a particular host. Traceroute relies on the ICMP protocol (IP "traffic control" messages). For security reasons, some routers are configured to block certain ICMP replies to non-local destinations. Such is the case with Trinity College's main router, or gateway, to the Internet. As a result, while outgoing traces correctly list this router along the route, incoming packets appear to "leapfrog across networks" and "jump out of nowhere". Without knowledge of how the site's router is configured, neither student nor instructor can be expected to explain such phenomena. Since it is often impossible to have complete knowledge of other hosts, many projects involved an experimental element, where the outcome cannot be predicted. With care, such experiments can be designed to induce great student enthusiasm toward the assignments.
These experimental projects, described in further detail in later sections, must necessarily be conducted on the campus network. The cooperation of the campus network administration staff was therefore critical. In the case of Trinity College, such cooperation was fortunately available. In addition to providing detailed information on the campus network infrastructure, knowledgeable staff also led the class on a tour of networking facilities, and tolerated certain experiments that might otherwise be considered security risks. Lacking such cooperation, this course would have required a dedicated lab, with its own support staff, array of equipment, and possibly even its own network or subnetwork. Such facilities are often difficult to obtain in small college settings. In short, networking, as the name implies, cannot be studied in isolation, and it therefore requires techniques that are traditionally absent from standard computer science courses.

LINUX

The state of networking is in constant flux, and the numerous protocols and configuration alternatives described in textbooks can seem overwhelming unless they are seen in applied form. The Linux operating system, being a "true child of the Internet," is a perfect choice for demonstrating network concepts and allowing students to apply what they have learned. The Unix operating system offers seamless integration with TCP/IP networking. Existing campus Unix facilities, however, cannot be used for this purpose, since students would require superuser privileges on workstations not dedicated to their use. Linux, on the other hand, can be set up on students' personal computers. For our course, twenty-five students were grouped into teams of two or three, with at least one member of each team having Linux installed on a system. Students who could not be grouped were assigned two dedicated PC-compatible computers in a small lab.
That this was a workable arrangement was due partly to the fact that at Trinity College most students live on campus and form a close-knit community. But it is also due to the immense flexibility of the Linux (and Unix) operating system. With alternatives such as Microsoft Windows NT Server, a student would have to be physically seated at a workstation in order to make changes. With tools such as telnet, rlogin and rsh, however, students can easily access their assigned Linux systems regardless of their location. Of course, it is also well known that Linux is free. Proprietary alternatives such as NT Server would have cost thousands of dollars, especially if students were to obtain their own systems. Furthermore, the "open source" philosophy of Linux means that all updates are immediately available on the world wide web, allowing those involved in the Linux community to keep constantly up to date with the latest developments.

A few students seemed initially reluctant to use Linux over their "dazzling multimedia" Windows 98 machines. After seeing the power of Linux repeatedly demonstrated, however, these students eventually came to accept it as a superior platform, at least "when it came to networking." One capability that impressed students was connecting to a remote X window server. The concept of using a graphical interface remotely is nonexistent in Microsoft Windows. The latest Linux graphical desktops, KDE and Gnome, also helped smooth our students' transition from other operating systems. Even many system configuration tasks in Linux can now be accomplished with graphical aids, though students were required to edit configuration scripts manually as well. The version of Linux we used was Redhat 6.0, with (initially) the 2.2.5 kernel. Redhat is probably the most popular Linux distribution in the United States, though there is no reason to believe that other distributions would not suffice equally well.
In the following section we describe some specific tools Linux provides for networking and how they were used in our course.

IP WONDERS

As soon as (Redhat) Linux is installed, it offers a variety of servers, including world wide web (Apache), FTP, email, DHCP (dynamic IP assignment), and DNS (domain name service), among others. Only configuration files need to be modified before these services are activated. Many required configuration files have templates in the /etc directory with various options commented out. Configuring some of these servers involves merely uncommenting the desired options. For our course, a master Linux host was set up in the instructor's office. This host runs a DNS server that implements a "pseudo-domain" called "csx.cs.trincoll.edu". Do not expect to find this domain on the Internet, for it is not registered. Only hosts that list this server's IP address as their primary DNS server are aware of the domain. The Linux systems used in the class were all given "new" names under this domain. Students gladly chose hostnames such as "flaco," "shaguar," and "fletch" instead of what would have been visible to the wider Internet, such as "dynamic-host15694". Setting up a separate domain name server gave us some autonomy from the wider campus network, for it meant that campus network administrators did not need to keep track of our hosts. Such "renegade" name servers, however, can potentially be used for malicious purposes (if you can somehow convince someone to use your host as a primary DNS server), for such a server can assign a false IP address to a host name, diverting connections elsewhere. Network administrators therefore need to be made aware that the server exists for a legitimate academic purpose. Our DNS server is protected by a firewall (see "ipchains" below) to prevent access from outside the Trinity College network. Had it been practical, it would have been better for the college to implement a subnet dedicated to network education.
In that case, a name server could have been created for a subdomain without conflicting with existing name assignments. In our case, however, this was impractical, since students lived in different dorms on campus, and each dorm is assigned a different subnet.

A major topic under TCP/IP is routing. Linux (and Unix in general) provides a vast array of tools for implementing and experimenting with routing options. The Linux (Unix) route utility manipulates the kernel's "next hop" routing table directly (the separate routed daemon implements the somewhat obsolete RIP protocol for determining best paths). This has the advantage that Linux routing tables do not interfere with dynamic routing protocols, such as Cisco's proprietary EIGRP, that are commonly in use. The route utility understands CIDR (classless) supernet numbers and allows a great degree of flexibility in controlling where packets are sent. In addition to route, the standard traceroute and ping utilities offer many options for experimentation. With ping, for example, one can specify the sizes of the packets to be sent, thereby determining the path MTU (maximum transmission unit) of routes. One successful experiment, carried out in class, concerned the use of proxy ARP on the college's main router. This mechanism allows a router to masquerade as a destination host, allowing misconfigured hosts to retain connectivity by "tricking" them into sending packets to the router. The Linux arp utility can provide a list of current ARP pairings (of IP addresses and Ethernet hardware addresses). By using route to configure a host to attempt to bypass the main router, and then checking the ARP pairings, it could be seen that proxy ARP was indeed in use, since the hardware addresses of the destination host and the router were identical in the ARP cache.

One of the most useful and powerful of all Linux utilities is the ipchains firewall configuration tool. This utility, offered by Linux 2.2 kernels, replaces the "ipfwadm" tool of other Unix and earlier Linux systems.
It is used to write a script against which all IP packets are checked as they enter or exit the system or are being forwarded. Learning to use ipchains is akin to learning a small-scale programming language, for a rule set is essentially a series of nested "if-else" clauses, complete with subroutine calls and return points. For example, the following pair of ipchains commands was used to deny nameserver (UDP port 53) responses to hosts outside of the Trinity College network (157.252.0.0/16), and to reject the infamous "finger" (TCP port 79) request from anywhere except the localhost's "loopback" interface:

  ipchains -A output -p udp -d ! 157.252.0.0/16 --source-port 53 -j DENY
  ipchains -A input -p tcp -s ! 127.0.0.0/8 -i ! lo --destination-port 79 -j REJECT

ipchains can also be used to intercept a packet bound for an intended destination by redirecting it to a local listening socket, assisting in the implementation of further security measures. In addition, ipchains implements the immensely useful technique of "IP masquerading", whereby hosts on a private network can nevertheless communicate on the Internet by masquerading as a central router. Only the router need have true connectivity to the Internet (via an Internet-routable IP address). At a time when available IP networks are rapidly disappearing, this technique is a major reason for the growing interest in Linux among businesses. It provides a viable alternative for organizations that lack the resources to acquire sufficiently large networks. A companion utility, ipmasqadm, provides "port-forwarding" capability, allowing servers (e.g. web servers) to run on a private network behind a masquerading firewall. Such an arrangement also has the added advantage of improved security. ipchains was used in the course to prevent certain experiments (e.g. DHCP service) from interfering with the wider campus network and the Internet. The coverage of ipchains and firewall setup was perhaps the highlight of the course.
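A minimal masquerading setup along these lines might look as follows. This is a sketch only: the private network number 192.168.1.0/24 is illustrative rather than the course's actual configuration, and the commands require root privileges on a 2.2-kernel system with ipchains installed.

```shell
# Enable packet forwarding in the kernel (off by default on Redhat 6.0).
echo 1 > /proc/sys/net/ipv4/ip_forward

# Refuse to forward anything by default...
ipchains -P forward DENY

# ...except packets from the private network, which are masqueraded:
# their source addresses are rewritten to that of the router's external
# interface before the packets are sent out.
ipchains -A forward -s 192.168.1.0/24 -j MASQ
```

With ipmasqadm, an internal web server could then be made reachable through such a firewall with a port-forwarding rule along the lines of "ipmasqadm portfw -a -P tcp -L (external address) 80 -R 192.168.1.10 80" (addresses again hypothetical).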
Other important services available with Linux include NIS (Network Information Service), the popular Apache web server, and both POP and IMAP email services. The Unix Network File System (NFS), which allows the mounting of remote directories, is fully supported in Linux. Another means of file sharing is provided by Samba. Called the "secret weapon" of Linux, Samba implements the SMB file sharing protocol used by Microsoft Windows operating systems. This allows Windows systems to share Linux files, disk space, and printers. Linux can also be configured as an SMB client, allowing for the seamless integration of the two systems on the same network. Other networking protocols, such as AppleTalk and IPX, are also supported by Linux, though they were not explored in our course. Additional utilities include secure shell (SSH) programs for secure remote access (these require a download from www.replay.com/redhat/ssh.html). Linux also comes bundled with many of the development tools (e.g. C, Perl) required to write web applications. All these resources present great opportunities for assigning independent projects.

DOCUMENTATION RESOURCES

Change in the Linux community is a constant fact of life. Redhat, for example, issues a new update of its distribution every few months. With the release of the version 2.2 kernel especially, many references on Linux became instantly obsolete. Linus Torvalds has announced the availability of the 2.4 kernel by the end of 1999 or early 2000, which will mean the definite continuation of this trend. The most valuable and up-to-date sources of reference for Linux are on the web, in particular the Linux Documentation Project at www.linuxdoc.org, with numerous mirror sites around the world. Chief among the references available from these sites are the "HOWTOs" - there are HOWTOs on almost every subject in Linux. These informal documents are written by individuals with various idiosyncrasies and can range in quality.
However, they always represent the most current documentation available. The HOWTOs we found most useful were the networking overview HOWTO [Dawson], and the DNS [Langfeldt], IPCHAINS [Russell], and IP-masquerade [Ranch-Au] HOWTOs. In addition to the HOWTOs, many online "man pages" offer exceptionally valuable information. Another source of reference is the collection of "RFC" standardization documents maintained by the Internet Engineering Task Force. These formal documents describe the various protocols in full detail, and can be consulted to fill in critical gaps left by other sources, as in the case of IPv6 [D&H]. Needless to say, students must become acquainted with such nontraditional sources of reference.

Published titles on Linux are plentiful and growing in number. However, most such references are useful only for installation and initial setup. The HOWTOs are the sources of truly detailed information. Nevertheless, having a printed reference on hand can be valuable for the novice. The books we found useful for this purpose were "Red Hat Linux 6 Unleashed" [Pitts-Ball] and "Using Linux" [Ball]. Several such titles include distributions of Linux on CD-ROM, making them bargains for those without fast network connections. In addition to these references, an official text [P&D] was adopted. This text presents material in a manner that was relatively consistent with the intended schedule and focus of the course. Several other standard networking texts (e.g. [Tanenbaum]) were used as secondary references. Furthermore, a practically oriented guide on TCP/IP [Feit] proved to be a valuable reference for the instructor. This volume filled in many gaps left by the text and the other references. However, this and similar "nuts and bolts" titles should not be used entirely in place of traditional academic texts, as the manner in which they approach subjects is often not designed to convey fundamental concepts.
COURSE CONTENT AND ASSIGNMENTS

The first assignment of the course was for students to install Linux using a network (FTP) connection. This assignment set the tone for the type of activities that would follow during the rest of the semester. Linux installation has improved dramatically over the past few years, but it may still require a degree of patience and a tolerance for frustration. Chief among installation problems is the lack of compatibility with certain hardware devices, especially network cards, printers, and display adapters on laptops. The philosophy of open source demands that the source code of all software, including device drivers, be published. Several companies still have not released the information necessary for device drivers to be written, although their number is shrinking fast. It is necessary to consult hardware compatibility lists (such as the one at www.redhat.com/hardware) before installation. Linux generally prefers the most established brands, such as 3Com network cards and Epson printers. Additionally, some system BIOS options may interfere with Linux. Certain "HP Pavilion" systems, for example, must have their "plug and play OS" option turned off before Linux can be installed. Installation of the full distribution of Redhat Linux requires more than a gigabyte of disk space. Installing the full distribution on a single "/" partition may be the safest choice for the novice, since it may be difficult to pick out all the packages required. For those with scant hardware resources, however, installation (and uninstallation) of individual packages is easily accomplished with the Redhat Package Management (RPM) system.

The course began with a brief coverage of the Ethernet "CSMA/CD" protocol. This was the closest the course came to the physical implementation of networks.
A discussion of some "data-link layer" issues was necessary, however, to introduce concepts such as network capacity, bandwidth, and hardware interface addresses, which are required in presenting the more abstract protocols.

A major unit of the course was on IP routing. Various routing protocols and the problems of routing on the Internet were discussed. Foundational algorithms, such as Dijkstra's shortest path algorithm, were also covered. The Linux route utility was explained in detail. Students were then given an assignment that required them to "map" the campus network. Trinity College's Cisco network divides the campus into "virtual LANs" consisting of several subnets each. Every subnet is connected to a central campus router. However, communication between hosts on the same (virtual) LAN is possible without going through the intermediate router. The assignment was to discover, using traceroute, route and ping (some of these utilities have been ported to MS Windows operating systems as well), which subnets are implemented on the same virtual LAN. The assignment also required students to calculate tight "network masks" that group together several subnets. The group of students that mapped out the most subnets was given extra credit. Many students took to this assignment with gusto, mapping out several LANs while invading labs and dorm rooms in the process. In retrospect, a warning message should have been sent to the college community informing them of our students' legitimate activity.

The contents of the IP packet header and their functions were addressed in detail. Various IP-related issues and protocols were covered, including fragmentation, ICMP, ARP, and DHCP. IPv6 (next generation IP) was introduced as well. Although Linux contains a preliminary implementation of IPv6, documentation on its use remains scarce. We were unable to give a hands-on assignment concerning this new protocol.
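The "tight network mask" calculation from the mapping assignment can be sketched in a short script. The subnet numbers below are hypothetical examples, not Trinity's actual allocations; the idea is that the tightest mask covering a group of subnets keeps exactly the leading bits on which all of their network numbers agree.

```shell
#!/usr/bin/env bash
# Find the tightest CIDR supernet covering a group of /24 subnets.
# The example subnets are hypothetical, chosen so that their third
# octets (16-19) agree on all but the low two bits.
to_int() { local IFS=.; set -- $1; echo $(( ($1<<24) | ($2<<16) | ($3<<8) | $4 )); }
to_ip()  { echo "$(($1>>24&255)).$(($1>>16&255)).$(($1>>8&255)).$(($1&255))"; }

subnets=(157.252.16.0 157.252.17.0 157.252.18.0 157.252.19.0)

and=$(( 0xFFFFFFFF )); or=0
for s in "${subnets[@]}"; do
  n=$(to_int "$s")
  and=$(( and & n ))    # bits shared by every subnet
  or=$(( or | n ))      # bits set in any subnet
done

# The prefix length is the number of leading bits on which all subnets
# agree, i.e. 32 minus the width of the region where AND and OR differ.
prefix=32; diff=$(( and ^ or ))
while (( diff != 0 )); do diff=$(( diff >> 1 )); prefix=$(( prefix - 1 )); done
mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))

echo "supernet $(to_ip $and)/$prefix  netmask $(to_ip $mask)"
```

For these four subnets the script reports the supernet 157.252.16.0/22 with netmask 255.255.252.0; a single route entry with that mask covers all four.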
One of the more difficult assignments of the course required students to set up their own DNS servers. Though it provides a simple service, DNS illustrates a major concept in inter-networking: how to implement a hierarchy of servers through a distributed database, and how to retain functionality when individual sites do not respect the hierarchy. There is both a centralized view of how domains and subdomains are organized, and a distributed view of how search requests are propagated through servers. DNS is an involved protocol. Unfortunately, the DNS software that comes with Linux is also not the easiest to use; it has many esoteric features and can cause much confusion for students. Many students were confused, for example, by the relationship between domains and subdomains and how it differs from the relationship between primary and secondary servers for the same domain. Ample warning should be given as to the subtlety of DNS. Despite its difficulties, however, DNS setup is a valuable exercise. It requires a theoretical understanding of the concepts involved; yet, without setting up an actual DNS server, it is almost impossible to understand the implications of these concepts from reading the textbook alone.

Firewall setup using ipchains was another major assignment. Students were required to define rules meeting a set of specifications (such as denying incoming FTP requests without affecting outgoing connections), and to automatically record matching packets in the system logs (ipchains offers this option). We then ran individual tests against their firewalls, and they were required to describe the timing and nature of these tests by examining their logs.

The remainder of the semester was devoted to the TCP protocol and to socket programming. UDP datagram programming was addressed briefly. The programming language we chose was Java, primarily because this language is used in our introductory programming courses and is thus one with which our students were already familiar.
Teaching C or Perl, given the other goals of the course, was judged to be unrealistic. Socket programming in Java is also considerably easier than in these other languages. Ports of the Java Development Kit for Linux can be found at www.blackdown.org. One assignment had the students implement an encryption algorithm and an encrypted file transfer program. It should perhaps be emphasized that, aside from the types of assignments described above, much of the course was in fact taught in a traditional manner, with standard lectures on network algorithms such as "bit stuffing" and "sliding window". Problems from the text were assigned, and several quizzes were given to evaluate the students' understanding of the algorithms. Alongside the protocol-specific topics, there was discussion of more foundational subjects, such as the scalability of designs (a major theme of the text [P&D]). It was our goal to give students a solid foundation in networking principles, as well as to initiate them into the practice of networking.

CONCLUSION

We have successfully experimented with a method of teaching a relatively advanced networking course in a small college environment. Our aim was to teach as much as possible in one course without having it degenerate into a survey course. The TCP/IP suite of protocols, and its "way of thinking" about networks, suited this purpose. The concepts we taught were illustrated using the array of Linux utilities for TCP/IP networking. By maintaining their own Linux systems, students saw exactly how all the protocols work together to form a coherent networking model. Networking, more than other subjects in computer science, involves elements of the unknown, and consequently students need to appreciate the experimental nature of many assignments. This they did. Of course, certain students will always have a lethargic attitude toward any subject, and effort was required to keep them involved in the course.
The initial enthusiasm of a few students dampened when they realized how much work was required. Nevertheless, a majority of the class maintained a high level of enthusiasm and interest in the course. Several students consistently did much more than was required of them; some even implemented their own Linux-based networks. Although teaching this course was a taxing effort, we were able to demystify TCP/IP networking for many of our students. We also helped to bring them into the Linux community, perhaps the most forward-thinking community in the age of the Internet.

REFERENCES

[Ball] B. Ball. Using Linux. Que Corporation, A Division of Macmillan Computer Publishing, 1998.
[Dawson] T. Dawson et al. Networking-HOWTO. The Linux Documentation Project, http://www.linuxdoc.org/docs.html, 1999.
[D&H] S. Deering and R. Hinden. RFC 2460: Internet Protocol, Version 6 (IPv6) Specification. The Internet Engineering Task Force, http://www.ietf.org/rfc.html, 1998.
[Feit] S. Feit. TCP/IP. McGraw-Hill, 1999.
[Langfeldt] N. Langfeldt. DNS-HOWTO. The Linux Documentation Project, http://www.linuxdoc.org/docs.html, 1999.
[P&D] L. Peterson and B. Davie. Computer Networks: A Systems Approach. Morgan Kaufmann, 1996. (Second edition now available.)
[Pitts-Ball] D. Pitts and B. Ball et al. Red Hat Linux 6 Unleashed. SAMS, 1999.
[Ranch-Au] D. Ranch and A. Au. IP-Masquerade-HOWTO. The Linux Documentation Project, http://www.linuxdoc.org/docs.html, 1999.
[Russell] P. Russell. IPCHAINS-HOWTO. The Linux Documentation Project, http://www.linuxdoc.org/docs.html, 1999.
[Tanenbaum] A. S. Tanenbaum. Computer Networks, Third Edition. Prentice Hall, 1996.