CSC 259 Hadoop-Map/Reduce Assignment 2

DNA sequences are represented using the symbols A, C, G and T. As part of a federally funded project, a group of geneticists collected numerous DNA samples from lions, tigers, leopards, and other animals of the feline family. After years of painstaking analysis, they have discovered, to their shock, a DNA strand that seems to be common to all feline species, big or small.

The Feline DNA:

CATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCAT

Whether or not you believe in the scientific accuracy of this finding, you are asked to write a program to help search for the presence of the feline DNA. Since mutations may occur, we are also interested in matches that include a few errors. Although one can efficiently represent DNA sequences using base-4 numbers, we will simply use Strings.

I am giving you the following basic function to search for matches:

    // find best possible substring of line matching pattern
    static int[] bestmatch(String pattern, String line)
    {
      int errs = 0; // var to keep track of number of errors
      int plen = pattern.length(); // also value of max number of errors
      int llen = line.length();
      int beste = plen+1; // best number of errors found so far
      int besti = -1; // position of best match so far
      int i, j;
      for(i=0;i<=llen-plen;i++) // determine match starting at i
      {
        errs = 0;
        for(j=0;j [...]

[...] is the key and the rest is the string representation of the value.

    position 29, 0 errors]

If there were two copies of the same file in the directory, it would have produced

    position 29, 0 errors]
    position 29, 0 errors]

Note that you don't need to worry about the pattern spread across different files, just different lines in the same file. The file position takes into account the "\n" at the end of each line, which is not read into the Text/String passed to map. If you wish, you can adjust the position value to discount these, so the position value can be slightly different from what I expect.
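The bestmatch listing above is cut off partway through its inner loop. As a rough guide only, here is a minimal sketch of how such a function might be completed. It assumes that an "error" is a position where pattern and line differ, that the earliest position wins a tie, and that the result is returned as {position, errors}. None of these details are confirmed by the surviving text, and the class name BestMatchSketch is made up for illustration.

```java
// Sketch only: the loop body, the tie-breaking rule and the
// {position, errors} return layout are assumptions, not the original code.
class BestMatchSketch {
    // find best possible substring of line matching pattern
    static int[] bestmatch(String pattern, String line) {
        int plen = pattern.length();   // also the maximum number of errors
        int llen = line.length();
        int beste = plen + 1;          // best number of errors found so far
        int besti = -1;                // position of best match so far
        for (int i = 0; i <= llen - plen; i++) {   // candidate match starting at i
            int errs = 0;
            // stop counting once this candidate can no longer beat the best one
            for (int j = 0; j < plen && errs < beste; j++) {
                if (pattern.charAt(j) != line.charAt(i + j)) {
                    errs++;
                }
            }
            if (errs < beste) {        // strictly better: earliest position wins ties
                beste = errs;
                besti = i;
            }
        }
        return new int[] { besti, beste };
    }

    public static void main(String[] args) {
        int[] r = bestmatch("CATCAT", "GGCATCTTAA");
        System.out.println("position " + r[0] + ", " + r[1] + " errors");
    }
}
```

Under these assumptions, a line shorter than the pattern yields position -1 and plen+1 errors, which a caller can treat as "no match on this line".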
Please submit both your source and (in comments) the output of the program on the data produced by gendna. You can either use separate files or use static inner classes to fit everything into one file, like the example in the Hadoop tutorial at hadoop.apache.org.

--------------

Additional hint: Hadoop is still evolving and has some teething problems. I encountered the following nasty problem. I had the following kind of loop in my reduce method:

    reduce(... Iterable<ValType> Values ...)
    {
      ...
      ValType best = ...;   // some initial value
      for(ValType v : Values)
      {
        if (v.betterthan(best)) { best = v; }
      }
      ...

This loop didn't work as expected because the Hadoop implementation code reuses the object that v refers to: on each iteration it overwrites the contents of the same object in the background rather than handing you a new one. That is, if you point best to a certain object inside the loop, don't expect it to still hold the same value after the loop. Very annoying indeed. To work around it you should clone the object instead: best = v.clone(), where .clone() returns a new object with the same content as v.
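The effect described in the hint can be reproduced without Hadoop. The sketch below is an illustration under assumptions, not Hadoop code: Match is a hypothetical stand-in for ValType, ReusingValues is a made-up Iterable that imitates the reuse by returning one shared object refilled on every step, and copy() plays the role of the .clone() mentioned above.

```java
import java.util.Iterator;

// Hypothetical value type standing in for the assignment's ValType.
class Match {
    long position;
    int errors;
    Match(long position, int errors) { this.position = position; this.errors = errors; }
    boolean betterthan(Match other) { return other == null || errors < other.errors; }
    Match copy() { return new Match(position, errors); }   // plays the role of clone()
}

// Made-up Iterable that imitates the reuse: next() always returns the SAME object.
class ReusingValues implements Iterable<Match> {
    private final long[] positions;
    private final int[] errors;
    ReusingValues(long[] positions, int[] errors) {
        this.positions = positions;
        this.errors = errors;
    }
    public Iterator<Match> iterator() {
        return new Iterator<Match>() {
            private final Match shared = new Match(0, 0);
            private int next = 0;
            public boolean hasNext() { return next < positions.length; }
            public Match next() {
                shared.position = positions[next];   // overwrite in place
                shared.errors = errors[next];
                next++;
                return shared;
            }
        };
    }
}

class ReuseDemo {
    // Keeps a reference to the shared object, so "best" changes behind our back.
    static Match bestByReference(Iterable<Match> values) {
        Match best = null;
        for (Match v : values) {
            if (v.betterthan(best)) { best = v; }
        }
        return best;
    }

    // Keeps an independent copy, so "best" survives later iterations.
    static Match bestByCopy(Iterable<Match> values) {
        Match best = null;
        for (Match v : values) {
            if (v.betterthan(best)) { best = v.copy(); }
        }
        return best;
    }

    public static void main(String[] args) {
        long[] p = { 10, 29, 57 };
        int[] e = { 3, 0, 2 };
        Match a = bestByReference(new ReusingValues(p, e));
        Match b = bestByCopy(new ReusingValues(p, e));
        System.out.println("by reference: position " + a.position + ", " + a.errors + " errors");
        System.out.println("by copy:      position " + b.position + ", " + b.errors + " errors");
    }
}
```

With this input the by-reference version ends up reporting the last value seen (position 57, 2 errors), while the copying version reports the true best (position 29, 0 errors). In a real job the value type would be your own Writable, and the copy method is something you would write yourself.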