open System; // F# program to recognize non-context free pattern: implements a FSM ////////// The Story: // DNA sequences are represented using the symbols A, C, G and T. As part // of a multi-million dollar research project, a group of geneticists collected // numerous DNA samples from tigers, lions, panthers, and other members // of the feline family. They discovered, to their shock and horror, that // all DNA samples contained a certain pattern: a number of C's, followed // by the same number of A's, followed by the same number of T's. For // example, small cats had sequences like CCCCAAAATTTT, while large cats // (lions) had CCCCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTT. // To confirm their incredible finding, they need a computer program to // automatically detect the presence of this pattern. It so happens that // this pattern is the canonical example of the **context free pumping // lemma**. This theoretical result implies that the pattern C^nA^nT^n // cannot be described by a *context free grammar*. This means that a // program that can recognize the pattern must have read/write access to // at least two counters, and that if not written carefully, the program // could go into an infinite loop. Non-context free languages requires a // proper Turing machine, and is therefore subject to the undecidability of // the Halting problem. // Thinking of you as a well-educated computer science student, they have // turned to you for help. Whether you think their hypothesis is ludicrous // is beside the point. You have to help them by implementing a finite state // machine (a finite automaton with memory and actions). // This version of the program is purely functional. Aside from I/O // there is no mutation of variables. Both the fsm and the run_fsm functions // are tail-recursive. There are five states. State SC takes a counter // indicating the number of 'C' chars seen. State SA takes a pair of // counters (n,m) where n is the number of preceeding 'C' chars and m // is the number of 'A' chars seen so far. The ST take inherits the // m counter from the SA state and counts it downwards. The Accept // state is reached when the ST counter reaches 1, meaning that we've // seen the same number of C's, A's and T's. The Accept state records the // end position of the pattern. type state = | SC of int | SA of int*int | ST of int | Start | Accept of int // fsm function maps current state to next state printf "Enter Sequence: " let input = Console.ReadLine(); let rec fsm current_state index = match (current_state, input[index]) with | (Start, 'C') -> SC(1) | (Start,_) -> Start | (SC n, 'C') -> SC(n+1) | (SC n, 'A') -> SA(n,1) | (SA (n,m), 'A') when m SA (n,m+1) | (SA (n,_), 'C') -> SC 1 | (SA (n,1), 'T') -> Accept index | (SA (n,m), 'T') when m<=n -> ST(m-1) | (ST n, 'T') when n>1 -> ST(n-1) | (ST 1, 'T') -> Accept index // record where match was found | (ST n, 'C') -> SC(1) | (Accept _, _) -> current_state // don't move out of Accept | _ -> Start let rec run_fsm current_state index = if index=input.Length then current_state else run_fsm (fsm current_state index) (index+1) printfn "Final State: %A" (run_fsm Start 0) // See a version that uses non-functional features here: // https://cs.hofstra.edu/~cscccl/csc123/mutcat.fsx