ooops; 2/19/2011, i note that "sits between +/- binary and * binary ops" is wrong.
I should be working on something else but got to thinking about how annoying it is specifying expressions in recursive descent parsers. You have to have a new rule for each precedence level. This is also very slow. Just to match 34 it has to descend about 15 method calls. I built a prototype single-rule (plus primary and suffix) operator matching thingie which I enclose below. I should be able to generate it from some metameta syntax in antlr. For example, this is what I would really like to specify when doing expressions rather than the 15 or so rules to define the precedence levels:
No Format |
---|
e : primary | primary '^' e // right associative | e '++' | e '.' ID | suffix | ('++'|'-') e // right assoc | suffixbut how? | e '*' e | e ('+'|'-') e ; suffix : e '.' ID | e '[' e ']' | e '(' e (',' e)* ')' ; primary : '(' e ')' | INT | ID ; |
The beauty of this is that it would automatically know how to build ASTs (or with a little bit of help from you for the complicated ones like method calls and array indexing). Further, the precedence specified as usual by the order of the alternatives. Anything that starts and stops with e
is considered a binary (or possibly trinary) operator. Any other alternative that starts with e
is a suffix operation. Any other alternative that ends with e
is a unitary unary operation. Alts that are simply rule references identify groups of operations at the same level such as the suffix rule.
...
No Format |
---|
e : primary | exponent | e '++' | ('++'|'-') e | e '.' ID // higher than array/method call | suffix | unary | e '*' e | e ('+'|'-') e ; unary options {associativity = right;} : '++' e | '-' e ; exponent options {associativity = right;} : e '^' e ; suffix : e '[' e ']' | e '(' e (',' e)* ')' ; primary : '(' e ')' | INT | ID ; |
...
Another really cool feature is that none of the target code generators would have to change because all of this happens sort of like a macro preprocessor within ANTLR.
Issues
Associativity...token option?
What about casts:
No Format |
---|
e : '(' type ')' e
| ...
;
|
The mechanism relies on being able to text the next token as the operator...would checking '(' be unique? Might conflict with function call '('.