Non-macro library

In the non-macro part of the cflp library, we have two structs (Error and NodeWrapper), and two traits (Parser and NodeData)

Traits

Parser

pub trait Parser<I, C: PartialEq<C>, R = Self> {
	fn parse<T>(src: &mut T) -> Result<R, Error<I, C>> where
            T: Iterator<Item=I> + Clone,
            Self: Sized {
        <Self as Parser<I, C, R>>::parse_with_recursion(false)
    }
    
    fn parse_with_recursion<T>(src: &mut T) -> Result<R, Error<I, C>> where
            T: Iterator<Item=I> + Clone,
            Self: Sized;
}

The Parser trait is used to parse a token stream (of type T) into either a Self instance, or fail to do so and return an error.

You should only ever use Parser::parse. Parser::parse_with_recursion is where all the actual derived code goes. Without parse_with_recursion, we can end up with stack overflows. parse_with_recursion should prevent zero-consumption branching.

NodeData

pub trait NodeData<T: Sized> {
	fn start(&self) -> T;
	fn end(&self) -> T;
}

The NodeData trait is used to obtain positional data about a node. This is intended for tokens used in the Parser trait, but can be applied to parsed nodes as well.

Structs

Error

pub struct Error<F, E> {
	pub found: Option<F>,
	pub expected: E
}

The Error struct is returned from Parser::parse when parsing fails. The expected field is for the element expected, and the found field is the element found. This is an Option because the iterator could return a None.

Here we'll introduce simple rule syntax. This will include matching types, optional sections, Kleene closures, and positive closures.

We'll be using the following struct as our example token

pub enum Token {
    OB, CB,     // Opening and closing parenthesis
    Value(u8),  // A literal value
    Other(char) // Some character
}

Matching something

If you want to match a token, add the token you want to match to the rule. Simple as that. For example if we wanted to match Token::OB in our rule, we'd have (RuleName; Token::OB).

Groups

A group is a set of rules inside some parentheses. These can be complex nested groups with lots of modifiers applied to each inner group, or just contain a single value such as (Token::Value(2)).

Optional sections

An optional section is one that can optionally be included in the input. The parser is a maximal-munch parser and will always try to match these optional sections.

To indicate an optional section, you add a ? to the end of a group for example (Token::Other('a'))?.

Kleene and positive closures

A Kleene closure is one that will match the inner section 0 or more times, and a positive closure will match the inner section at least once. These are indicated in a similar way to optional sections but with a * for a Kleene closure and + for a positive closure.


Zero-consumption branching

Consider the following:

#[derive(PartialEq, PartialEqRef)]
enum Token {
    Add,
    Num(i64),
    Str(String)
}

#[derive(Parser)]
#[parser(Token, Token)]
enum Expr {
    #[parser([[@Expr]], Token::Add, [[@Expr]])]
    Add {
        left: Box<Self>,
        right: Box<Self>
    },
    #[parser([@ExprLit])]
    Literal(ExprLit)
}

#[derive(Parser)]
#[parser(Token, Token)]
enum ExprLit {
    #[parser([Token::Num(t)])]
    Number(i64),
    #[parser([Token::Str(t)])]
    String(String)
}

This can lead to zero-consumption branching if we call Expr::parse, so-called because it consumes zero elements in src and branches the program. This is because it will try to first parse an Expr, leading to it trying to parse an Expr etc. resulting in a stack overflow. To prevent this we instead call Parser::parse_with_recursion(src, false) which is expected to prevent this call from trying to parse itself again.

This should not prevent calling Parser::parse for Self once something has been consumed.

If this is in your code, you will need to ensure that C: Default. This can be done for an enum by re-using one of the variants, since this is only required to have some value returned to prevent a stack overflow. See example 3 for an example of this.