Rustlr requires that the value returned by the smantics actions of all grammar rules be of
the same type, declared using the valuetype
or absyntype
directive. It only allows one other type, the externtype
which
means that these actions can be stateful. However, sometimes it may
still be more convenient to allow different rules to return values of
different types. Theoretically, this can be accomplished by generating
a enum-type internally that would encompass all possible types
returned by the rules. In other words, instead of leaving the
definition of an enum with a large number of variants to the user, it
is created internally. The problem with this approach is
that it is not quite compatible with the goal of decoupling the
parser from the lexical analyzer. The tokens returned by a lexical
scanner must also carry values, such as numerical constants and string
literals, and these values must also be of the same "absyntype" as any
that appear on the parse stack. Integrating such values into a
generated enum will tie the rustlr runtime parsers to one specific type of
lexical token. Rustlr does include a
RawToken type, but this type was not created to cover all possible
scenarios when a parser might be needed. It only exists for the
purpose of including a usable tokenizer, StrTokenizer, with the
rustlr crate. We wish to allow rustlr to parse any type of input,
including binary formated input, as long as tokenizers can be provided
for them and adopted to the Tokenizer trait.
The currently available approach to allowing different grammar rules to return differently typed values borrows a page from object-oriented programming and relies on the Any trait, specifically LBox<dyn Any>. When a terminal or nonterminal symbol of the grammar is declared, one can optionally specify a type associated with it that's distinct from the overall absyntype (but which also must impl Default). When the absyntype of the grammar is declared to be LBox<dyn Any>, rustlr will automatically generate code to downcast the values attached to grammar symbols and upcast the values returned by the semantic actions to the supertype. The following is another grammar for a calculator program, but which demonstrates this option.
!use rustlr::{unbox};
!pub enum Expr
!{
! Val(i64),
! Plus(LBox<Expr>,LBox<Expr>),
! Times(LBox<Expr>,LBox<Expr>),
! Divide(LBox<Expr>,LBox<Expr>),
! Minus(LBox<Expr>,LBox<Expr>),
! Negative(LBox<Expr>),
! Nothing,
!}
!impl Default for Expr { fn default()->Self {Nothing} }
absyntype LBox<dyn Any>
nonterminal E Expr
nonterminal ES Vec<LBox<Expr>>
terminal + - * / ( ) ;
typedterminal int Expr
topsym ES
resync ;
left * 500
left / 500
left + 400
left - 400
lexvalue int Num(n) Val(n)
E --> int:m { unbox!(m) }
E --> E:e1 + E:e2 { Plus(e1,e2) }
E --> E:e1 - E:e2 { Minus(e1,e2) }
E --> E:e1 / E:e2 { Divide(e1,e2) }
E --> E:e1 * E:e2 { Times(e1,e2) }
E --> - E:e { Negative(e) }
E --> ( E:e ) { *e.exp }
ES --> E:n ; { vec![n] }
ES ==> ES:v E:e ; {
v.push(e);
unbox!(v)
} <==
EOF
Both the nonterminal
and the typedterminal
directives allow a type
to be associated with grammar symbol. The default type is
the same as the absyntype (LBox<dyn Any>) unless so defined.
In this grammar, the type associated with the non-terminal E is
Expr, but that for ES is Vec<LBox<Expr>>. This is an alternative
to including a Seq(Vec<LBox<Expr>>)
variant of Expr, which was
used in the chapter 2 example.
In the LBox<dyn Any> special setting, the labels attached to grammar symbols on the right-hand side of productions no longer represent value of type StackedItem, but are of type LBox<Ty>, where Ty is the type associated with that grammar symbol. That is, in writing the semantic action for a rule such as
E --> E:e1 + E:e2 { Plus(e1,e2) }
the labels e1 and e2 are automatically downcast to LBox<Expr>. The semantic action should return a value of type Expr because that is the type associated with the nonterminal E. The returned value will be placed in a LBox and upcast to LBox<dyn Any> by rustlr before being pushed onto the parse stack. Pattern labels can still be used to describe the values. A pattern inside @..@ will match the down-casted value inside the LBox. As a much large example, this grammar, which defines a scaled-down version of Java, demonstrates how the LBox<dyn Any> type can still be used alongside patterns.
Macros including unbox! allow the semantic values to be extracted from the LBox.
The final value returned by the parser will also be of type
LBox<dyn Any>
and must be downcast to a usable value using the provided
LBox::downcast function.
The major downside of using this object-oriented approach is that the
abtract syntax types cannot contain non-static references, because the Any type
does not cover such references. Even
though rustlr allows lexical scanners to be zero-copy, the lifetime
directive is meaningless in this mode. Basically, this means that
instead of &'t str
one may have to use owned strings in the abstract
syntax representation. Thus there is a non-zero runtime overhead to
this approach. In addition to being slower, using the Any trait with
downcasting also sacrafices a degree of static type safety. Thus, the
tradeoff here reflects the fundamental tradeoff between Rust and
object-oriented languages in general. Those choosing the LBox<dyn Any>
option for its convenience should understand and accept this tradeoff.