Textbook Example
In many tutorials that talk about compiler design, the example used is parsing a math expressions consist of variables (identifiers), +, *, and parenthese (( and )). For example, the following are valid expressions:
a + b * c
(a + b) * c
(a + b) * (c + d)
The grammar can be described verbally as follows:
- An expression (
E) is a list of terms (T) separated by+. - A term (
T) is a list of factors (F) separated by*. - A factor (
F) is either an identifier or an expression enclosed in parentheses.
You may also seen the following used to describe the above.
(It’s called Backus-Naur Form or BNF. It’s ok if you don’t understand this for now.
Some places might have ::= instead of =>.)
E => T E'
E' => + T E' | ε
T => F T'
T' => * F T' | ε
F => ( E ) | id
This can almost be translated directly into Rust code with teleparse.
Some helper structs are needed to avoid loops when calculating trait requirements,
which are annotated with comments.
use teleparse::prelude::*; #[derive_lexicon] #[teleparse(ignore(r#"\s+"#))] pub enum TokenType { #[teleparse(regex(r#"\w+"#), terminal(Ident))] Ident, #[teleparse(terminal(OpAdd = "+", OpMul = "*",))] Op, #[teleparse(terminal(ParenOpen = "(", ParenClose = ")"))] Paren, } #[derive_syntax] #[teleparse(root)] struct E { first: T, rest: Eprime, } // Eplus has to be a separate struct because of recursion when // computing trait satisfaction. See chapter 3.2 Note on Recursion for more details #[derive_syntax] struct Eprime(tp::Option<Eplus>); #[derive_syntax] struct Eplus { op: OpAdd, _t: T, rest: Box<Eprime>, } #[derive_syntax] struct T { first: F, rest: Tprime, } #[derive_syntax] struct Tprime(tp::Option<Tstar>); #[derive_syntax] struct Tstar { op: OpMul, _f: F, rest: Box<Tprime>, } #[derive_syntax] enum F { Ident(Ident), Paren((ParenOpen, Box<E>, ParenClose)), } #[test] fn main() -> Result<(), teleparse::GrammarError> { let source = "(a+b)*(c+d)"; let _ = E::parse(source)?; Ok(()) }