RUSTLR: Bottom-Up Parser Generator for Rust, Version 0.3.x
The ultimate goal of Rustlr is to round up all the hoofed herbivores into a
tool that's both usable and enjoys the expressive power of LR grammars and
beyond. It's been decades since Donald Knuth proved that every
deterministic context free language has an LR(1) grammar. However, such
theoretical properties never settled the dispute between LR-style parer
generators and those based on other techniques. One reason is that users of
LR parsing have always faced a steep learning curve. How an LR parser works
is not as intuitive as, for example, a recursive descent parser. To alleviate
this problem Rustlr implements a collection of features including:
- The option of automatically generating the abstract syntax datatypes
and required semantic actions from hints given in the grammar,
in whole or in part. The AST types do not necessarily mirror the format
of the grammar: one can declare dependencies that allow multiple non-terminal
symbols to share a common type (see Chapters 4 and 6 of the tutorial).
- The option of generating recursive AST types using the "bumpalo" crate,
which enables deep pattern matching on recursive structures by replacing
smart pointers with transparent references.
- The possibility of using regular-expression style operators such as
*, + and ? directly inside the grammar. This is not as easy to do as it
sounds as the additional grammar rules needed to
support them may lead to new ambiguities.
- The option of using a larger class of unambiguous grammars
compared to traditional LR and LALR, based on Marcus-Leermakers delayed reductions. See the Appendix.
Other experimental features include an "unexpected wildcard" symbol.
These mechanisms help to alleviate the natural problems of combining
context free grammars with regular expressions, without shoving core
issues under the table by resorting to ad-hoc defaults.
- Operator precedence and associativity declarations allow the use
of more naturally readable grammars.
- The ability to interactively train the parser to give better error messages.
- The generated parsers also have optional access to
"external state" information that would allow them to parse some non
context-free languages.
Rustlr defines a trait that allows the use of any lexical analyzer as long as it's
adopted to the trait. However, rustlr also provides a general
purpose, zero-copy lexer that suffices for many examples. A lexer specific
to a grammar can be automatically generated from a
minimal set of declarations.
With future releases, Rustlr will also be able to generate parsers for
languages other than Rust. With version 0.3.7, it
can generate a parser for F#, although not all capabilities are yet
available. F# is the .Net version of Ocaml and lacks options when it comes
to parser generation.
Rustlr will mainly target typed, functional languages that support
algebraic types and pattern matching.
The documentation format on
docs.rs
is a good technical reference but does not serve as a tutorial.
This tutorial is evolving as Rustlr is being enhanced with new features.
The project aims to be backwards compatible as much as possible.
Tutorial by Examples
The tutorial is organized around a set of examples, starting
with the simplest, with each example explaining a set of more advanced
features. All features of rustlr will
eventually be explained as you progress through the examples. It would be
helpful if the reader is familiar with some basic bottom-up
parsing concepts, such as those covered in typical compiler texts.
The chapters in bold listed below are complete. The others provide additional examples and generally contain enough comments to be readable.
The latest and most advanced features of Rustlr are described in Chapter 4 and
in the Appendix.
- Chapter 1: Classic calculator, unambiguous LALR.
- Chapter 2: Enhanced calculator with more advanced features, including interactive training for error reporting.
- Chapter 3: Semantic actions returning multiple value types. (older version)
- Chapter 4: Automatically Generating the AST
- Chapter 5: Using Regex Operators *, + and ? in Grammars
- Chapter 6: Generating Bump-allocated ASTs that enable recursive pattern matching
- Chapter 7: Error Recovery Options
- Advanced Example: Building a Parser for C. (under construction). link to crate
- Special Example: Non-context free language, using External State. Link to full crate
- Special Chapter: Generating a Parser for F#
- Appendix: Experimental Features (Delayed Reductions and the Wildcard Token)
- Additional Full Example: Yacc converter. Create with rustlr grammar that builds a parser for converting Yacc grammars to Rustlr format, stripping away all C declarations and actions.
- Additional Full Example: Lambdascript. Program implementing and tracing beta-reduction
steps for the untyped lambda calculus. This crate was created using rustlr
and this grammar.
- (Deprecated versions of chapter 1
and chapter 2)
Additional Grammars (with no semantic actions)
- LR(1) but non-LALR(1) grammar
- LALR Grammar for full Java version 1.4
- ANSI C grammar (adopted from yacc syntax)
References