Non-macro library
In the non-macro part of the cflp library, we have two structs (Error
and NodeWrapper
), and two traits (Parser
and NodeData
)
Traits
Parser
pub trait Parser<I, C: PartialEq<C>, R = Self> {
fn parse<T>(src: &mut T) -> Result<R, Error<I, C>> where
T: Iterator<Item=I> + Clone,
Self: Sized {
<Self as Parser<I, C, R>>::parse_with_recursion(false)
}
fn parse_with_recursion<T>(src: &mut T) -> Result<R, Error<I, C>> where
T: Iterator<Item=I> + Clone,
Self: Sized;
}
The Parser
trait is used to parse a token stream (of type T
) into either a Self
instance, or fail to do so and return an error.
You should only ever use Parser::parse
. Parser::parse_with_recursion
is where all the actual derived code goes. Without parse_with_recursion
, we can end up with stack overflows. parse_with_recursion
should prevent zero-consumption branching.
NodeData
pub trait NodeData<T: Sized> {
fn start(&self) -> T;
fn end(&self) -> T;
}
The NodeData
trait is used to obtain positional data about a node. This is intended for tokens used in the Parser
trait, but can be applied to parsed nodes as well.
Structs
Error
pub struct Error<F, E> {
pub found: Option<F>,
pub expected: E
}
The Error
struct is returned from Parser::parse
when parsing fails. The expected
field is for the element expected, and the found
field is the element found. This is an Option
because the iterator could return a None
.
Here we'll introduce simple rule syntax. This will include matching types, optional sections, Kleene closures, and positive closures.
We'll be using the following struct as our example token
pub enum Token {
OB, CB, // Opening and closing parenthesis
Value(u8), // A literal value
Other(char) // Some character
}
Matching something
If you want to match a token, add the token you want to match to the rule. Simple as that. For example if we wanted to match Token::OB
in our rule, we'd have (RuleName; Token::OB)
.
Groups
A group is a set of rules inside some parentheses. These can be complex nested groups with lots of modifiers applied to each inner group, or just contain a single value such as (Token::Value(2))
.
Optional sections
An optional section is one that can optionally be included in the input. The parser is a maximal-munch parser and will always try to match these optional sections.
To indicate an optional section, you add a ?
to the end of a group for example (Token::Other('a'))?
.
Kleene and positive closures
A Kleene closure is one that will match the inner section 0 or more times, and a positive closure will match the inner section at least once. These are indicated in a similar way to optional sections but with a *
for a Kleene closure and +
for a positive closure.
Zero-consumption branching
Consider the following:
#[derive(PartialEq, PartialEqRef)]
enum Token {
Add,
Num(i64),
Str(String)
}
#[derive(Parser)]
#[parser(Token, Token)]
enum Expr {
#[parser([[@Expr]], Token::Add, [[@Expr]])]
Add {
left: Box<Self>,
right: Box<Self>
},
#[parser([@ExprLit])]
Literal(ExprLit)
}
#[derive(Parser)]
#[parser(Token, Token)]
enum ExprLit {
#[parser([Token::Num(t)])]
Number(i64),
#[parser([Token::Str(t)])]
String(String)
}
This can lead to zero-consumption branching if we call Expr::parse
, so-called because it consumes zero elements in src
and branches the program. This is because it will try to first parse an Expr
, leading to it trying to parse an Expr
etc. resulting in a stack overflow. To prevent this we instead call Parser::parse_with_recursion(src, false)
which is expected to prevent this call from trying to parse itself again.
This should not prevent calling Parser::parse
for Self
once something has been consumed.
If this is in your code, you will need to ensure that C: Default
. This can be done for an enum by re-using one of the variants, since this is only required to have some value returned to prevent a stack overflow. See example 3 for an example of this.