Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 16 Next »

Ok, Kay Roepke is in town and we've been discussing the faster expression parsing, among other things. Look for another entry on default StringTemplate generation or a parsing and tree parsing.

Found a ref to Keith Clarke's original recursive-descent precedence work

More on precedence climbing

ANTLR v3.2 will allow special rules for specifying expressions that are particularly efficient both in speed and space. Special rules will define either unary suffix operators, binary operators, or trinary operators by virtue of how they recurse. Precedence of operators is specified by the order in which the alternatives appear in the special rule. Left associativity is assumed. Right associativity is done with a token level option, TOKEN<associativity=right>. The alternatives that are not specifying operators or that specify unary operators are grouped into a new rule. A special rule is rewritten into one that does not have left recursion. Semantic predicates are used to recurse more or less depending on the precedence of the next operator (first symbol of lookahead). Here is the special rule:

e : parse_expr[0] ; // rewrite e to this rule, which invokes the precedence engine

/** "Precedence climbing" engine
parse_expr[int p]
   :   primary
       (   {prec[input.LA(1)]>=p}?=>     bop r= parse_expr[nextp(p)]
       |   {postprec[input.LA(1)]>=p}?=> suffix
       )*
   ;

where rule suffix is described below (basically we remove e references from the left edge). If there are no suffix operators, the 2nd alt of the (...)* is not generated.

If you specify a rule with option parser=precedence, ANTLR does the new magical thing:

e
options {parser=precedence;}
    :   primary  // highest precedence
    |   e '++'
    |   e '.' ID // higher than array/method call
    |   suffix
    |   ('++'|'--') e
    |   '(' type ')' e // cast
    |   e '*' e
    |   e ('+'|'-') e
    |   e '='<associativity=right> e  // lowest precedence
    ;

suffix
    :   e '[' e ']'
    |   e '(' e (',' e)* ')'
    ;

primary
    :   '(' e ')'
    |   INT
    |   ID
    ;

Within the special rule, ANTLR looks for binary operators and the suffix operators. It also looks one level deep into rules referenced from the special rule. In this case, ANTLR identifies '*' '+' '-' '=' as binary operators and '.' '[' '(' as suffix operators. All other alternatives are lumped together into a new rule:

parse_expr_alts
options {backtrack=true;}
    :   primary
    |   ('++'|'--') e
    |   '(' type ')' e // cast
    ;

The backtrack option is specified in case it is needed, which is in this case because of the cast versus parenthesized expression in primary. Ooops. Just realized that the order of the alternatives is wrong. The primary will match the parenthesized expression before trying to cast. we need a way to figure this out.

The suffix rule has left recursive references to e sell those must be removed. The parse_expr rule invokes a suffix after having matched e already:

suffix
    :  '[' e ']'
    |  '(' e (',' e)* ')'
    ;

Affect on lookahead

ANTLR can only compute LL(1) lookahead sets for these special rules. So, rules that need to look into expressions might have to backtrack:

s : e ';' // nondeterministic; must use option backtrack=true!!!
  | e '!'
  ;

Tree construction and actions

ANTLR can allow actions at the end of binary operator alternatives as well as anywhere else in the other rules. The binary operators are all combined into the parse_expr rule so ANTLR must collect all actions on the right-hand side and make a "switch on the operator" action within the parse_expr loop:

parse_expr[int p]
   :   primary
       (   {prec[input.LA(1)]>=p}?=>     bop r= parse_expr[nextp(p)]
           {
           switch ( $bop.start.getType() ) {
               case '+' :
               case '-' :
                  action-for-this-alt
                  break;
               ...
               case '=' :
                  action-for-this-alt
                  break;
               default : // no action
           }
           }
       |   {postprec[input.LA(1)]>=p}?=> suffix
       )*
   ;

Trees for the binary operators could also be automatically done using the standard ANTLR mechanisms.

parse_expr[int p]
   :   (primary->primary)
       (   {prec[input.LA(1)]>=p}?=>     bop r=parse_expr[nextp(p)] -> ^(bop $parse_expr $r)
       |   {postprec[input.LA(1)]>=p}?=> suffix[$e.tree]            -> {$suffix.tree}
       )*
   ;

The other alternatives and rules create trees as usual:

    |   primary
    |   ('++'^|'--'^) e
    |   '(' e ')' -> e

Rule suffix is left alone except for the removal of the left recursion so you can add whatever tree construction you want but you must allow a $lhs parameter to come in. This parameter is the primary matched immediately to the left.

suffix[CommonTree lhs]
   :   t='[' e ']'             -> ^(INDEX[$t] {$lhs} e)
   |   t='(' (e (',' e)*)? ')' -> ^(CALL[$t] {$lhs} e*)
   |   t='++'                  -> ^(POSTINC[$t] {$lhs})
   ;

Implementation

Have ANTLR build parse_expr rules as templates and then call ANTLRParser.rule() on that text. It will be like an include. This is the mechanism I use for nextToken generation.

It's possible that I could make the precedence tables and the parse_expr rule itself members of the Parser superclass.

  • No labels