/* MPI Assignment 1 */ You are to do this assignment in pairs. 0. Write a program that compares the relative overhead of various forms of communication between two processors (-np 2). Declare and generate random 2D arrays of doubles, just as in the sample matrix multiplication program. Each process is to have double A[N][N]; // send buffer double B[N][N]; // recieve buffer The rank 0 process needs to send A to the rank 1 process and receive the contents of B from the rank 1 process. Correspondingly, the rank 1 process needs to receive its B from rank 0 and send its A to rank 0. The MPI_Wtime() call gets the current time in seconds and can be used to measure the elapsed time of a program: double t1, t2; t1 = MPI_Wtime(); ... t2 = MPI_Wtime() - t1; // records elapsed time. Measure the total time it takes to complete the exchange for the following cases: First, use only synchronous send/receive 1. Sending the entire 2D array with a single send operation. 2. Sending the array row by row using a loop. 3. Sending each element (double) of the array separately (for this one, you should start with very small values of N, like 10). 4. If you know some networking, you probably know that the maximum ethernet packet size is around 1500 bytes. Larger packets will have to be fragmented, either by you or by the system. Write yet another version of the send/receives that only sends out 150 doubles (1200 bytes) at a time. Hint: use (double*)A and (double*)B and pointer arithmetics. For these experiments, I recommend that you put them all into one program instead of in seperate programs. Do not printf anything until the very end (you should know why). Also, you'll need to vary the size of the matrix. I used N=2000 (4 million doubles = 32 megabytes) to get good measurements (except for version 3!). Record and report as to the times. 5. With synchronous send/receive, you must make sure that the rank 0 processor sends first and the rank 1 processor recieves first, otherwise you may get deadlock. Asynchronous communication can be used to avoid such situations. Roughly, you can write both processes as follows start asynch send start asynch receive wait for sends to complete wait for recieves to complete. Please modify part 1 of your program to use this technique. =================================================================== Part II: /* A prime number is any number > 1 that's divisible only by 1 and itself. There are many sophisticated ways to test for primes. A simple one is to test if the number is divisible by 2 or any odd number between 3 and n/2. Here is a conventional, 1-process implementation: */ int isprime(unsigned int n) { unsigned int i; if (n==2) return 1; if ((n<2) || (n%2 == 0)) return 0; for(i=3; i<=n/2 && ((n%i)>0); i+=2); return (i>n/2); } /* Is this algorithm a good candidate for parallel speedup? You should show that it is by deriving the complexity of computation versus the complexity of communication. Then you should implement a multi-processor solution to demonstrate your hypothesis. The rank-0 process will either input or randomly generate the number to be tested. Each process will then be responsible for checking if the number is divisible by a range of values, and send the result back to the rank-0 process. For the final communication, when the results are gathered from each process, use MPI_REDUCE with operation MPI_LAND (logical and), instead of the normal send/receive operations. MPI_REDUCE is similar to MPI_Bcast: all processes must call it. Read the description of this function in the textbook and online docs carefully. The C datatype unsigned int corresponds to the MPI datatype MPI_UNSIGNED. Here are some large-enough primes for you to test: 2147483647 2147483629 2147483587 Also test your program on non-primes, and compare the answer with that returned by the conventional function above. Compare the elapsed time of the conventional function and your multi-processor implementation. You should see speed up for the large numbers above. ------------- Additional hints: To take a command line argument and convert it to a numerical value: #include ... int main(int argc, char **argv) { unsigned int n; n =(unsigned int) atoi(*++argv); ... For you java-ish folks, *++argv is basically the same as argv[1], with an auto-increment of argv first. Note that argv[1], not argv[0] is the first command-line argument. */