Derive macro

When using the derive macro to derive the Parser trait, you need to specify how the parser should be implemented. This is done using helper attributes (#[parser(...)]).

Metadata

Metadata is just three arguments (two types and one expression) that are required to know how to derive the Parser impl. They're comma seperated (no trailing comma) and in the following order:

  1. Input token type I
    This is the type in the iterator (behind a reference). For example if our input token type was MyToken, the iterator would be T: Iterator<Item=&MyToken>
  2. Comparison token type C
    This is the type being compared with. If your input token type contains token metadata (such as token position), then you will probably want to compare the underlying token type. Using the example from before, if we had MyToken::type_: MyTokType where MyTokType is an enum of token types, we would probably have our comparison token type as MyTokType

IMPORTANT: If your input and comparison types are different, then &I: Into<C> must be satisfied

Rule syntax

Rules are defined as two parts; values and groups. Values are singular expressions, and groups are a group of values.

Values

Consider the following rule from the first example:

#[derive(Parser)]
#[parser(
    Token, Token;
    ([@Value])+, [@Sep], ([@Value])*, (@Sep)?)
]
pub struct Base {
    first: Vec<Value>,
    sep: Sep,
    last: Vec<Value>
}

The first line in the helper attribute (line 3) is the metadata. The line after defines the rule for the parser impl. We'll consider this in segments:

  1. ([@Value])+
    Firstly, @Value is a call to another rule, Value in this case. This will try to parse a Value type. Placing square brackets around this means that we save the match. The parentheses makes it a group and the + makes the group into a positive closure, meaning we match at least one Value
  2. [@Sep]
    This saves a Sep type that's matched the input
  3. ([@Value])*
    This is the same as ([@Value])+ but for a kleene closure, meaning we match zero or more Value types
  4. (@Sep)?
    @Sep will match a Sep type. The ? after the group means that this is optional, so it can either be matched or not

When saving values, you might need to box a value to ensure that the type has a known size at compile time. If this is required, you can use two square brackets to indicate as such. For example, [[@Sep]] would match a Sep type and box the result (Box<Sep>)

Groups

A group is a set of values in parentheses. It can optionally be followed by a ?, *, or + for an optional group, kleene closure, and positive closure:

  • Optional means it will try to be parsed. If it's not, then it resets the input iterator back to where it was before the group, and skips it. This becomes an Option<T> where T is the type matched by the inside of the group
  • Kleene closure means it matches any number (including zero) of the group. This becomes a Vec<T> where T is the type matched by the inside of the group
  • Positive closure means it matches at least one of the group. This has the same type as a kleene closure

Groups can have comma seperated values. For example you could have (TokType::Int(1), TokType::Add, TokType::Int(2))? as a group. Groups can be nested

Enums and unwrapping matches

Util now, all the examples have been on structs. Enums are slightly different. The enum needs the metadata in a parser attribute on the enum itself, with the rule for each enum variant in an parser attribute on the variant.

For example, we might have a simple enum to turn token literals into AST node literals

#[derive(Parser)]
#[parser(Token, Token)]
pub enum Value {
    #[parser([Token::Digit(t)])]
    Int(usize),
    #[parser([Token::Ident(t)])]
    Ident(String)
}

You'll notice that the matches on the variants don't have literal values, but identifiers. This is because matches that aren't calls to other types are parsed as patterns, like in a match statement arm. The identifiers are matches just like in a match statement and used, instead of matching the whole token.

When picking identifiers, make sure they're all unique. It's also a good idea to avoid identifiers with a double underscore prefix such as __my_var since double underscore prefixed variables are used in the derived implementation and it may allow for things to get messed up.