Operator precedence parser

In grammars that have a large number of operator precedence levels for binary expressions, the time spent in the standard ANTLR recursive approach to expression parsing can have a noticeable impact on the overall speed of the grammar. One option to consider in writing the binary expression parser is embedding an operator precedence parser for binary expressions between the 'expression' (or similar) rule and the unary expression rule(s). For the time being, I don't have an example implementation for additionally handling ternary and unary expressions, but those can be written in the same recursive manner as usual, since they are the minority of the time spent recursing the expression rules.

Benefits:

  • When the binary expression rules don't use predicates, this can be an order of magnitude faster than the common recursive method.
  • Operator associativity (left to right / right to left) is handled in a concise and clear way.
  • Operator precedence is clear and concise, and properly handles multiple operators at the same level, such as * and / in C++

Downsides:

  • Requires some implementation in the target language.

Here is a very simple example grammar that implements this method. The methods are written in Java here (which I tested), but I use C# for my production grammars and the code (minor changes of course) runs great there too.

grammar ExpressionParser;

options
{
        output=AST;
}

tokens
{
        PLUS = '+';
        MINUS = '-';
        TIMES = '*';
        DIVIDE = '/';

        LPAREN = '(';
        RPAREN = ')';

        AST_EXPR;
}

@parser::members
{
        boolean isRightToLeft( int type )
        {
                // return true here for any operators that are right-to-left associative
                return false;
        }
        int getOperatorPrecedence( int type )
        {
                switch ( type )
                {
                case TIMES:
                case DIVIDE:
                        return 1;
                case PLUS:
                case MINUS:
                        return 2;
                default:
                        return 0; // really this shouldn't be hit
                }
        }
        int findPivot( List operators, int startIndex, int stopIndex )
        {
                int pivot = startIndex;
                int pivotRank = getOperatorPrecedence( ((Token)operators.get(pivot)).getType() );
                for ( int i = startIndex + 1; i <= stopIndex; i++ )
                {
                        int type = ((Token)operators.get(i)).getType();
                        int current = getOperatorPrecedence( type );
                        boolean rtl = isRightToLeft(type);
                        if ( current > pivotRank || (current == pivotRank && rtl) )
                        {
                                pivot = i;
                                pivotRank = current;
                        }
                }
                return pivot;
        }
        Tree createPrecedenceTree( List expressions, List operators, int startIndex, int stopIndex )
        {
                if ( stopIndex == startIndex )
                        return (Tree)expressions.get(startIndex);

                int pivot = findPivot( operators, startIndex, stopIndex - 1 );
                Tree root = (Tree)adaptor.nil();
                Object root_1 = (Object)adaptor.nil();
                root_1 = (Object)adaptor.becomeRoot( (Token)operators.get(pivot), root_1 );
                adaptor.addChild( root_1, createPrecedenceTree( expressions, operators, startIndex, pivot ) );
                adaptor.addChild( root_1, createPrecedenceTree( expressions, operators, pivot + 1, stopIndex ) );
                adaptor.addChild( root, root_1 );
                return root;
        }
        Tree createPrecedenceTree( List expressions, List operators )
        {
                return createPrecedenceTree( expressions, operators, 0, expressions.size() - 1 );
        }
}

complete_expression
        :       expression EOF
                -> expression
        ;

primary_expression
        :       INTEGER
        |       '(' expression ')'
                -> expression
        ;

expression
@init
{
        List expressions = new ArrayList();
        List operators = new ArrayList();
}
        :       (       left=primary_expression
                        { expressions.add($left.tree); }
                )
                (       operator
                        right=primary_expression
                        {
                                operators.add($operator.start);
                                expressions.add($right.tree);
                        }
                )*
                -> {createPrecedenceTree(expressions,operators)}
        ;

operator
        :       '+'
        |       '-'
        |       '*'
        |       '/'
        ;

///////////////////////////////////////////
// LEXER

fragment
DIGIT   :       '0'..'9'
        ;

INTEGER :       DIGIT+
        ;

WS      :       (' ' | '\t' | '\n' | '\r')+
                { $channel = HIDDEN; }
        ;