Derive macro
When using the derive macro to derive the Parser trait, you need to specify how the parser should be implemented. This is done using helper attributes (#[parser(...)]).
Metadata
Metadata is just three arguments (two types and one expression) that are required to know how to derive the Parser impl. They're comma separated (no trailing comma) and in the following order:
- Input token type I
  This is the type in the iterator (behind a reference). For example, if our input token type was MyToken, the iterator would be T: Iterator<Item=&MyToken>
- Comparison token type C
  This is the type being compared with. If your input token type contains token metadata (such as token position), then you will probably want to compare the underlying token type. Using the example from before, if we had MyToken::type_: MyTokType where MyTokType is an enum of token types, we would probably have our comparison token type as MyTokType
  IMPORTANT: If your input and comparison types are different, then &I: Into<C> must be satisfied
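One way to satisfy the &I: Into<C> bound is to implement From<&I> for C, since the standard library then provides the matching Into impl. The sketch below assumes MyToken carries a position field and that MyTokType has Int and Add variants; both are made up for illustration, as is the kinds helper, which only demonstrates the bound and is not part of the crate.

```rust
// Hypothetical comparison token type (variants are assumptions).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MyTokType {
    Int,
    Add,
}

// Hypothetical input token type: the token kind plus some metadata.
#[derive(Debug, Clone)]
pub struct MyToken {
    pub type_: MyTokType,
    pub position: usize,
}

// From<&MyToken> for MyTokType gives us &MyToken: Into<MyTokType> for free.
impl From<&MyToken> for MyTokType {
    fn from(tok: &MyToken) -> Self {
        tok.type_
    }
}

// Illustrative helper (not part of the crate): converts every input token
// into its comparison type, relying only on the &I: Into<C> bound.
pub fn kinds<'a, I, C, T>(tokens: T) -> Vec<C>
where
    I: 'a,
    T: Iterator<Item = &'a I>,
    &'a I: Into<C>,
{
    tokens.map(Into::into).collect()
}
```

With this in place, an iterator of &MyToken can be compared against MyTokType values without cloning the tokens.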
Rule syntax
Rules are built from two parts: values and groups. Values are single expressions, and groups are sets of values.
Values
Consider the following rule from the first example:
#[derive(Parser)]
#[parser(
    Token, Token;
    ([@Value])+, [@Sep], ([@Value])*, (@Sep)?
)]
pub struct Base {
    first: Vec<Value>,
    sep: Sep,
    last: Vec<Value>
}
The first line in the helper attribute (line 3) is the metadata. The line after defines the rule for the parser impl. We'll consider this in segments:
([@Value])+
Firstly, @Value is a call to another rule, Value in this case. This will try to parse a Value type. Placing square brackets around this means that we save the match. The parentheses make it a group and the + makes the group into a positive closure, meaning we match at least one Value
[@Sep]
This saves a Sep type that's matched from the input
([@Value])*
This is the same as ([@Value])+ but for a Kleene closure, meaning we match zero or more Value types
(@Sep)?
@Sep will match a Sep type. The ? after the group means that this is optional, so it can either be matched or not. Since there are no square brackets, the match is not saved
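To make the four segments concrete, here is a hand-written sketch of a parser that accepts the same shape of input as the rule above. It is not the code the derive macro generates: the Token, Value and Sep definitions, the function names, and the use of Option as the result type are all assumptions made for illustration.

```rust
use std::slice::Iter;

// Hypothetical token and node types, assumed for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Digit(usize),
    Comma,
}

#[derive(Debug, PartialEq)]
pub struct Value(pub usize);

#[derive(Debug, PartialEq)]
pub struct Sep;

#[derive(Debug, PartialEq)]
pub struct Base {
    pub first: Vec<Value>,
    pub sep: Sep,
    pub last: Vec<Value>,
}

// @Value: parse one Value, consuming input only on success.
fn parse_value(input: &mut Iter<'_, Token>) -> Option<Value> {
    let mut attempt = input.clone();
    match attempt.next() {
        Some(Token::Digit(n)) => {
            *input = attempt;
            Some(Value(*n))
        }
        _ => None,
    }
}

// @Sep: parse one Sep, consuming input only on success.
fn parse_sep(input: &mut Iter<'_, Token>) -> Option<Sep> {
    let mut attempt = input.clone();
    match attempt.next() {
        Some(Token::Comma) => {
            *input = attempt;
            Some(Sep)
        }
        _ => None,
    }
}

// ([@Value])*: Kleene closure, zero or more saved values.
fn zero_or_more(input: &mut Iter<'_, Token>) -> Vec<Value> {
    let mut out = Vec::new();
    while let Some(v) = parse_value(input) {
        out.push(v);
    }
    out
}

// ([@Value])+: positive closure, fails if nothing matched.
fn one_or_more(input: &mut Iter<'_, Token>) -> Option<Vec<Value>> {
    let out = zero_or_more(input);
    if out.is_empty() { None } else { Some(out) }
}

// ([@Value])+, [@Sep], ([@Value])*, (@Sep)?
pub fn parse_base(input: &mut Iter<'_, Token>) -> Option<Base> {
    let mut attempt = input.clone();
    let first = one_or_more(&mut attempt)?;
    let sep = parse_sep(&mut attempt)?;
    let last = zero_or_more(&mut attempt);
    let _ = parse_sep(&mut attempt); // optional and unsaved
    *input = attempt; // commit only once the whole rule has matched
    Some(Base { first, sep, last })
}
```

Note how only the three bracketed segments end up as fields of Base, in the order they appear in the rule.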
When saving values, you might need to box a value to ensure that the type has a known size at compile time. If this is required, you can use two square brackets to indicate as such. For example, [[@Sep]] would match a Sep type and box the result (Box<Sep>).
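The usual reason a type lacks a known size is recursion: a node that contains itself directly would be infinitely large, while a Box is always pointer sized. The sketch below is plain Rust and independent of the derive macro; Nested and depth are hypothetical names used only to show why a boxed field is needed.

```rust
// Hypothetical recursive node. Writing `inner: Option<Nested>` would be
// rejected by the compiler because the type would have infinite size.
#[derive(Debug, PartialEq)]
pub struct Nested {
    pub value: usize,
    pub inner: Option<Box<Nested>>,
}

// Counts how many nodes are chained together through `inner`.
pub fn depth(node: &Nested) -> usize {
    1 + node.inner.as_deref().map_or(0, depth)
}
```

A field saved with double square brackets would be declared with the boxed type in the same way, for example Box<Sep> for [[@Sep]].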
Groups
A group is a set of values in parentheses. It can optionally be followed by a ?, *, or + for an optional group, Kleene closure, and positive closure respectively:
- Optional means the parser will try to match the group. If it can't, it resets the input iterator back to where it was before the group, and skips it. This becomes an Option<T> where T is the type matched by the inside of the group
- Kleene closure means it matches any number (including zero) of the group. This becomes a Vec<T> where T is the type matched by the inside of the group
- Positive closure means it matches at least one of the group. This has the same type as a Kleene closure
Groups can have comma separated values. For example, you could have (TokType::Int(1), TokType::Add, TokType::Int(2))? as a group. Groups can be nested.
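The reset behaviour of an optional group can be sketched by parsing on a clone of the iterator and committing only on success. This is a hand-written illustration, not the derived implementation: TokType and its variants are assumed, optional_group is a hypothetical helper, and because nothing is saved here the group's result is simply Option<()>.

```rust
use std::slice::Iter;

// Hypothetical comparison token type (variants are assumptions).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokType {
    Int(i64),
    Add,
}

// Tries to match every token in `expected`, in order.
// On failure the input iterator is left exactly where it started.
pub fn optional_group(input: &mut Iter<'_, TokType>, expected: &[TokType]) -> Option<()> {
    let mut attempt = input.clone();
    for want in expected {
        if attempt.next() != Some(want) {
            return None; // `input` was never advanced, so this is the reset
        }
    }
    *input = attempt; // commit: the whole group matched
    Some(())
}
```

Calling it with the sequence Int(1), Add, Int(2) mirrors the example group above: the tokens are consumed if all three match, otherwise parsing continues from the original position.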
Enums and unwrapping matches
Until now, all the examples have been on structs. Enums are slightly different: the enum needs the metadata in a parser attribute on the enum itself, with the rule for each enum variant in a parser attribute on the variant.
For example, we might have a simple enum to turn token literals into AST node literals:
#[derive(Parser)]
#[parser(Token, Token)]
pub enum Value {
    #[parser([Token::Digit(t)])]
    Int(usize),
    #[parser([Token::Ident(t)])]
    Ident(String)
}
You'll notice that the matches on the variants don't have literal values, but identifiers. This is because matches that aren't calls to other types are parsed as patterns, like in a match statement arm. The identifiers are bound just like in a match statement arm, and the bound value is what gets saved, instead of the whole matched token.
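The pattern behaviour is the same as in an ordinary match expression. The sketch below writes the equivalent match by hand; it is not what the macro expands to, the Token definition is assumed, and value_from is a hypothetical helper.

```rust
// Hypothetical token type, assumed for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Digit(usize),
    Ident(String),
    Comma,
}

#[derive(Debug, PartialEq)]
pub enum Value {
    Int(usize),
    Ident(String),
}

// Each arm binds `t` to the payload of the token, and only that payload
// ends up in the resulting Value, not the token itself.
pub fn value_from(tok: &Token) -> Option<Value> {
    match tok {
        Token::Digit(t) => Some(Value::Int(*t)),
        Token::Ident(t) => Some(Value::Ident(t.clone())),
        _ => None, // any other token is not a Value
    }
}
```

The identifier t plays the same role in the parser attribute as it does in these arms.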
When picking identifiers, make sure they're all unique. It's also a good idea to avoid identifiers with a double underscore prefix, such as __my_var, since the derived implementation uses double underscore prefixed variables internally and a name clash could break it.