Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

ooops; 2/19/2011, i note that "sits between +/- binary and * binary ops" is wrong.

I should be working on something else but got to thinking about how annoying it is specifying expressions in recursive descent parsers. You have to have a new rule for each precedence level. This is also very slow. Just to match 34 it has to descend about 15 method calls. I built a prototype single-rule (plus primary and suffix) operator matching thingie which I enclose below. I should be able to generate it from some metameta syntax in antlr. For example, this is what I would really like to specify when doing expressions rather than the 15 or so rules to define the precedence levels:

No Format
e   :   eprimary
    |   primary '^' e // right associative
    |   e '*++' e
    |   e '++.' eID
    |   '-'suffix
e     |   e ('++'|'-') e // right assoc  |   suffixbut how?
    |   e '++*' e
    |   e ('+'(|' e -')' e
   | ;

INTsuffix
    |:   IDe '[' e ']'
 ;  suffix |   e :'(' e (',' e)* '.' ID
)'
    ;

primary
   | :  e '[(' e '])'
    |   INT
  e '(' e| (',' e)* ')' ID
    ;

The beauty of this is that it would automatically know how to build ASTs (or with a little bit of help from you for the complicated ones like method calls and array indexing). Further, the precedence specified as usual by the order of the alternatives. Anything that starts and stops with e is considered a binary (or possibly trinary) operator. Any other alternative that starts with e is a suffix operation. Any other alternative that ends with e is a unitary unary operation. Alts that are simply rule references identify groups of operations at the same level such as the suffix rule. Anything else is part of the primary rule. The first thing I would need to do is to identify the various kinds of operations: unary, suffix, and primary

elementsPerhaps as Gavin Lambert pointed out, we simply need to do associativity with an option:

No Format
e   :   primary
  e '^' e| // right associativeexponent
    |   e '*++'
e    |   |e '.' ID unary// higher than array/method call
|   e ('+'|'-') e|   suffix
    |   suffix_unary
    |   suffix e '*' e
    |   e  primary('+'|'-') e
    ;

unary_
options {associativity = right;}
    :  '++' e
    |   '-' e
    ;

exponent
options {associativity = suffix_right;}
    :   e '.^' IDe
    ;

suffix
    :   e '[' e ']'
    |   e '(' e (',' e)* ')'
    |   e '++'
    ;

primary_
    :   '(' e ')'
    |   INT
    |   ID
    ;

Once I had identified these elements, The one funny thing about all this is that we like to see usually things in order from lowest to highest precedence for these kinds of expression rules, but ANTLR assumes everything is highest to lowest precedence.

I could generate something like the following:

...

Another really cool feature is that none of the target code generators would have to change because all of this happens sort of like a macro preprocessor within ANTLR.

Issues

Associativity...token option?

What about casts:

No Format

e : '(' type ')' e
  | ...
  ;

The mechanism relies on being able to text the next token as the operator...would checking '(' be unique? Might conflict with function call '('.