# Sample Grammar for matching brackets (), [] and {} # the "absyntype" is the rust type of the value returned by the parse function. # in this example, the absyntype will represent the number of pairs of matching # brackets of types (), [] and {} absyntype (u32,u32,u32) # the absyntype can be any type that implements the Default trait. # (u32,u32,u32) defaults to (0,0,0) # This grammar contains 3 non-terminal symbols, although the parser generator # may create additional symbols. nonterminals E S WS # Nonterminals and terminals must be declared. terminals ( ) [ ] LBRACE RBRACE Whitespace # topsym is the top or start symbol of the grammar topsym S # on error, the parser will skip past the next whitespace and keep parsing resync Whitespace # The following are the grammar production rules. E --> ( S:(a,b,c) ) {(a+1,b,c)} E --> [ S:(a,b,c) ] {(a,b+1,c)} E --> LBRACE S:(a,b,c) RBRACE {(a,b,c+1)} S --> WS { (0,0,0) } S --> S:(a,b,c) E:(p,q,r) {(a+p,b+q,c+r)} WS --> WS --> WS Whitespace !use rustlr::charscanner; !//anything on a ! line is injected verbatim into the generated parser ! !fn main() { ! let argv:Vec = std::env::args().collect(); // command-line args ! let input = &argv[1]; ! let mut parser1 = make_parser(); ! let mut lexer1 = charscanner::new(input,true); ! lexer1.modify = |c| { match c { //modify symbols to grammar terminal names ! "{" => "LBRACE", ! "}" => "RBRACE", ! _ if c.chars().next().unwrap().is_whitespace() => "Whitespace", ! _ => c ! }}; ! ! let result = parser1.parse(&mut lexer1); ! if !parser1.error_occurred() { ! println!("parsed successfully with result {:?}",&result); ! } ! else {println!("parsing failed; partial result is {:?}",&result);} !}//main EOF Everything after "EOF" is ignored and can be used for more comments: Each grammar rule has a an associated "semantic action" enclosed in {}'s. The semantic action is a piece of rust code that must return a value of the declared abysyntype. If no action is specified, one will be created that returns absyntype::default() - the chosen absyntype must implement the Default trait. Although not used in this grammar, multiple productions for the same left-hand side nonterminal may be specified on one line using the | character. It is also possible to specify a rule on multiple lines using ==> instead of --> and ending with <== Note that {, } and | cannot also be used as terminal symbols. The grammar must specify other names such as LBRACE and RBRACE and the lexical scanner must translate "{" and "}" into these tokens. Rustlr defines a 'Tokenizer' trait, and allows the use of any lexical analyzer that implements this trait. An included example of a scanner is charscanner. This lexer takes a string and returns each character as a separate token, with the option to skip whitespace characters. It also has a `modify` function that can itself be modified so that some characters are transformed into other tokens. charlexer is adequate for simple examples. In this example, we've used the ! lines to inject enough code into the generated parser so it's a self-contained program. Normally, however, only a few ! lines are needed for 'use' type declarations. In the main() injected into this parser, we use charlexer with the option of keeping whitespace characters because they're used in the grammar. We also change the modify function to return tokens that match the names of the terminal symbols in the grammar. Each grammar symbol, terminal as well as nonterminal will always have associated with it a value of absyntype. When non are specified, absyntype::default() is automatically used. We can match the value with a simple pattern (without whitespaces) such as (a,b,c). Please note even if there is one value, parentheses are required: (a). The semantic action has access to these values when forming the return value. The semantic action is otherwise injected verbatim into the generated parser: rustlr is not responsible if Rust cannot compile your semantic actions. To generate a parser for this grammar with the rustlr executable (cargo install rustlr): > rustlr brackets.grammar lalr Or, from inside a program you can call the function `rustlr::rustler("brackets","lalr");` This produces a file bracketsparser.rs. Since this file contains a main, you can cargo build a new crate and replace main.rs with the contents of the parser file. Be sure to insert rustlr = "0.2.0" under [dependencies] in the crate's Cargo.toml.