Basic instructions for setting up an MPICH/2 Beowulf Cluster

A Beowulf cluster configures a number of processors, typically 64 or
more, into a supercomputer. All computers in the cluster must be similar
enough to share the same executables and libraries. Usually, a designated
"server" holds the shared files and directories. Beowulf processors are
typically hidden behind NAT firewalls because they need to communicate
with each other with the least amount of overhead and restriction.

Typically, high-performance special-purpose networks such as Myrinet are
used to build clusters. We'll have to make do with TCP/IP over Ethernet,
which is far too slow but will be good enough for most of our educational
aims. Hofstra also has a cluster of 96 processors on a Myrinet, which was
funded by an NSF grant and is used for scientific research.

We are going to configure a number of processors in the Adams 019 network
into a Beowulf cluster. Four of the machines have already been configured:

   deepdish
   matrix
   ubuntu
   warhead

All files are served from deepdish via NFS. Both MPICH1 and MPICH2 are
installed.

Server side notes:

   Create user beowulf and directory /home/beowulf.

   Install MPICH2 from www-unix.mcs.anl.gov/mpi/mpich inside the user
   directory.

   Export the directory: create /etc/exports with contents such as

      /home/beowulf 10.1.0.0/16(rw)

   then run

      service nfs start
      exportfs -a

================= Individual Processor Configuration ==================
=========================================================================

In order to create a new node on the cluster, you must have root access
to a machine.

0. Anonymous ftp to starbase (10.1.0.98) (log in as anonymous):

      cd to pub/fc4/Fedora/RPMS
      mget rsh-server*.rpm
      mget xinet*.rpm

   Install these rpms:

      rpm -Uvh *.rpm

1. Edit /etc/xinetd.d/rsh and change "disable = yes" to "disable = no".
   MPICH will use rsh, an unencrypted counterpart of ssh, to remotely
   start processes on other machines.
   Start the xinetd and rsh servers:

      service xinetd start
      in.rshd start

   Also insert these commands in /etc/rc.d/rc.local so that they are
   run automatically upon reboot.

2. Edit the file /etc/hosts. In this file, make sure that the domain name
   of your host does NOT resolve to 127.0.0.1. Here's an example of how
   it SHOULD look:

      127.0.0.1   localhost.localdomain localhost
      10.1.0.2    deepdish.secret.hofstra.edu deepdish

   Now "deepdish" will not be treated in the same way as localhost. A
   host name that resolves to 127.0.0.1 may also cause the Java
   rmiregistry program to behave erratically.

3a. Create a new user (via system-config-users) with username beowulf and
   home directory /home/beowulf. The user ID MUST BE SET TO 501 (which is
   automatic if there's only a guest account on your machine). Check
   "create home directory", but don't put anything inside.

3b. Edit the file /etc/fstab and add the line

      10.1.0.2:/home/beowulf /home/beowulf nfs auto,rw 0 0

   This will allow you to mount the /home/beowulf directory from
   deepdish. Now execute

      mount /home/beowulf

   You should now be able to cd into this directory and view the files
   there. Take a look at the .bashrc file, which contains further
   configuration (loaded automatically upon login).

4. Copy the files /home/beowulf/hosts.equiv and /home/beowulf/hosts.allow
   to /etc:

      /etc/hosts.equiv - lists the IPs of all hosts in the cluster
      /etc/hosts.allow - lists the IPs of all hosts in the cluster

   These settings allow hosts on the cluster to communicate via
   PASSWORDLESS RSH from the beowulf account. This is why you'd only want
   to run a cluster behind a firewall.

5. Now you can log in as beowulf. I suggest the following procedure:
   log on as guest or root, then run

      ssh -Y -l beowulf localhost

   Trust me.

6. MAKE A DIRECTORY FOR YOURSELF. YOU'RE ONLY ALLOWED TO CHANGE FILES IN
   YOUR OWN DIRECTORY. DO NOT CHANGE ANY FILES OUTSIDE OF YOUR OWN
   DIRECTORY EVEN IF YOU APPEAR TO HAVE THE ABILITY TO DO SO. The cluster
   depends on a trusted-network concept: all users of the cluster need
   to trust each other.
   You will be held responsible for all your actions even if they're
   not intentional. ONLY CHANGE FILES IN YOUR OWN DIRECTORY!

7. Create in your own directory a file called localgroup with the
   following entries:

      10.1.0.x
      10.1.0.x
      10.1.0.x
      10.1.0.x
      10.1.0.x
      10.1.0.x
      10.1.0.x
      10.1.0.x
      10.1.0.x

   where x is chosen so that 10.1.0.x is your host's real IP address.

   When testing your MPI program, you should first use only your own
   machine. If an MPI program runs to completion then all is well, but
   if it crashes, you could end up stranding a number of processes
   (walking zombies) on other machines of the cluster, which can lead to
   unpredictable effects. You can simulate multiple processors by having
   all processes run on your own machine.

8. Copy the file /home/beowulf/test0.c to your own directory and follow
   the instructions for compilation and execution given in the comments
   in that file.

----------------
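
The localgroup file from step 7 is just the same address repeated once
per simulated processor, so it can be generated instead of typed by hand.
The following is only a minimal sketch, assuming a POSIX shell; the
function name make_localgroup and the address 10.1.0.5 are hypothetical
placeholders and are not part of the cluster setup. Substitute your
host's real IP address.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH or the cluster files):
# print COUNT copies of IP, one per line, which is the layout
# that step 7 describes for the localgroup file.
make_localgroup() {
    ip=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s\n' "$ip"
        i=$((i + 1))
    done
}

# Example (placeholder address): nine entries, as in step 7.
# Run this inside your own directory only.
# make_localgroup 10.1.0.5 9 > localgroup
```

Changing the count is an easy way to vary how many processes are
simulated on your own machine before involving other nodes.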