Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Update Oct 2012: resolved with correct definition of when to terminate prediction lookahead. Turns out we don't need to push this extra stack element.

Over the Christmas holidays, I've been busy building example grammars for ANTLR v4. The thing I noticed immediately is that grammars just work. There are no error messages from ANTLR when generating code and all we can get are true ambiguity errors at runtime. E.g., if you can recognize T(i) as both a function call and a constructor call. The lexers work extremely well and I'm a happy guy. Nothing scares ANTLR v4, "it's pretty bad-ass" like the The Crazy Nastyass Honey Badger.

...

Imagine we have R expression a(i)<-x that we want to parse starting at rule prog in the following grammar.

Code Block

prog:   expr_or_assign* ;

expr_or_assign
    :   expr '++'
    |   expr    
    ;
  
expr : expr_primary ('<-' ID)? ;

expr_primary
    : '(' ID ')'
    | ID '(' ID ')'
    | ID
    ;

...

After much experimentation and creation of software to trace prediction through the augmented transition network (ATN) representation of grammar, I discovered the source of the problem. To highlight the issue, all we have to do is use the recursive equivalent of prog:

Code Block

prog:   expr_or_assign prog | ; 

Or, for simplicity, the following manual repetition of expr_or_assign exhibits the same behavior for our input "a(i)<-x".

Code Block

prog:   expr_or_assign expr_or_assign ;

...

The problem is that ANTLR destroys context information when it converts the recursive looping rule into the equivalent EBNF expr_or_assign* loop. Imagine returning from rule expr_or_assign in this version of prog:

Code Block

prog:   expr_or_assign expr_or_assign ;

...