CSC 155 Assignment: DNA Sequence Alignment with Dynamic Programming Part I: Planning the Implementation Due Thursday 4/28 (absolute). We are going to implement the algorithm for global sequence alignment in two stages. First you are going to form groups (this is required) and formulate a detailed outline of how to organize your implementation. We will then hold a "meeting" to discuss what you've come up with, then write the programs. 0. Be sure to read the tutorials I gave you (also on web page). Read both the "simple" and the "advanced" tutorials - they actually differ very little. You will need to organize your implementation around a central class. There may be other classes you will need to define (such as a class representing a pair of nucleic acids). 1. Determine the data structures you will need. In particular, answer the following: a. how to represent the two DNA sequences. That is, are you going to use chars or ints, arrays or vectors, etc... b. How to represent the matrix. This is a no-brainer. You'll obviously need a 2-D array. But also think about if you're going to use static or dynamic arrays. c. How to represent the final alignment. Your program needs to produce output that looks like: GGCTTATG-CTCGAGCCCGGG || || ||| | | | -GCACTTGCCTCTCG-GC-GC Note that this sequence is longer than either original sequence, since there could be gaps in either final sequence. How would you represent this information? What data structure would you use to represent the final alignment. 2. How are you going to represent the Scoring function and the gap penalties (also see part 5 below). Please note that the gap penalty is not necessarily always a constant, like in the two examples. In the general case, the penalty can also depend on the length of the gap. In the two examples in the tutorial, the gap length is always 1. 3. As the tutorial indicates, the algorithm has three basic stages: Initialization, Matrix fill, and Traceback. We should think of implementing each stage as a seperate function. Sketch out what Each function should do. Be clear as to what parameters and return value the methods will have (if any). Outline the algorithm each function will execute (write it out in pseudo code). Think especially about how your algorithm is going to use the data structures you've decided on above. In addition to Traceback, you should have a procedure to print out the final alignment (like the one above). Sketch this function as well. ***** I will grade this assignment qualitatively. That is, it doesn't ***** have to be completely correct, but it needs to be carefully done ***** and well-thought out. Please spend some time on this. Sloppy ***** last-minute put-togethers are spotted easily. 4. Write out (as actual code if possible) The **header** (prototype) for your class. 5. Consider what changes will need to be made to your program if the scoring scheme changes (what if we adopted the advanced scoring scheme?). You'll notice that the "advance" scoring scheme uses mostly the same techniques as the "simple" one. How would you incorporate the new scoring scheme into your program while making minimal changes to your code. In fact, you should think about how to make your implementation general enough to handle any variation in the scoring scheme.