
...

The following sections are some terse notes to remind myself later when I implement this stuff.

Grammar composition

In v2, we used an inheritance mechanism that was really a glorified dumb include. After discussing this with a number of people, including Ari Steinberg (who has a lot of experience with large SQL grammars), I have formulated the following mechanism with Kay. The mechanism is based on the idea of delegation rather than inheritance; however, I phrase it as an import that pulls in all rules, making them available to the grammar that imports them.

Parsers

Imagine a simple grammar with three rules:

...

No Format
parser grammar Java;
import JavaDecl;
prog : decl ;
type : 'int' | 'float' ;

ANTLR will aggressively optimize out the rules that are not needed. It must still include any rule whose lookahead DFA changes as a result of an overridden rule. In this case, overriding rule type alters the lookahead prediction for rule decl. Consequently, decl must be included in the generated code for grammar Java. Here is the output ANTLR would generate for Java:

No Format
class JavaParser extends Parser {
  JavaDecl delegate = new JavaDecl(...); // probably set in ctor actually
  public void type() { ... }
  public void prog() { decl(); } // uses overridden version.
  public void decl() {
    int alt = predict-alt-of-decl; // DFA changed; must copy whole rule here
    switch (alt) {
    case 1 :
      type(); match(ID); match(';');
      break; // without this, alternative 1 would fall through into alternative 2
    case 2 :
      type(); match(ID); init(); match(';');
      break;
    }
  }
  void init() { delegate.init(); }
}
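
The delegation pattern in the generated code above can be tried in isolation. What follows is a minimal hand-written sketch, not ANTLR output: the class names JavaDeclSketch and JavaParserSketch, the trace list, and the forwardedDecl helper are all hypothetical, and the rule bodies only record which version ran instead of matching tokens. It also assumes the imported grammar defines its own type, init, and decl rules. The sketch suggests one more reason decl has to be copied into the importing parser: a rule that stays in the delegate calls the delegate's own type(), so it never sees the override.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the parser generated from the imported grammar.
class JavaDeclSketch {
    private final List<String> trace;
    JavaDeclSketch(List<String> trace) { this.trace = trace; }
    void type() { trace.add("JavaDecl.type"); }
    void init() { trace.add("JavaDecl.init"); }
    // The delegate's decl() binds to the delegate's own type().
    void decl() { type(); init(); }
}

// Hypothetical stand-in for the parser generated from the importing grammar.
class JavaParserSketch {
    final List<String> trace = new ArrayList<>();
    private final JavaDeclSketch delegate = new JavaDeclSketch(trace);
    // Overridden in the importing grammar.
    void type() { trace.add("Java.type"); }
    // Copied into this class, so it calls the overridden type().
    void decl() { type(); init(); }
    // Not overridden and not affected by the override: forward to the delegate.
    void init() { delegate.init(); }
    // Illustration only: what would run if decl were merely forwarded.
    void forwardedDecl() { delegate.decl(); }
}

class DelegationDemo {
    public static void main(String[] args) {
        JavaParserSketch p = new JavaParserSketch();
        p.decl();
        System.out.println(p.trace); // [Java.type, JavaDecl.init]
        p.trace.clear();
        p.forwardedDecl();
        System.out.println(p.trace); // [JavaDecl.type, JavaDecl.init]
    }
}
```

With inheritance, virtual dispatch would route the second call to the overriding type() as well; with plain delegation it does not, which is why affected rules are regenerated in the importing parser.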

...

Notice that you cannot use combined grammars for this; lexers have to be handled with a separate delegation.

Lexers

Lexer composition works exactly the same way. Conceptually, the import copies all of the rules into the new lexer:

No Format

lexer grammar L;
import S,T;

When several imported grammars define a rule with the same name (e.g., ID, INT, WS), the rule from the grammar listed first in the import statement takes precedence. All rules are copied in because there is no way for ANTLR to work out which ones you want.
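
The precedence rule can be sketched as a first-wins merge of rule tables. This is only an illustration of the ordering, not how ANTLR implements imports: the class LexerMergeSketch, its merge method, and the representation of a lexer as a map from rule name to a label are assumptions made here, and the rules that S and T actually contain are invented for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LexerMergeSketch {
    // Merge rule tables in import order; the first grammar that defines
    // a rule name keeps it, and later definitions of that name are ignored.
    static Map<String, String> merge(List<Map<String, String>> importsInOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> rules : importsInOrder) {
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                merged.putIfAbsent(rule.getKey(), rule.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical contents: each value names the grammar the rule came from.
        Map<String, String> s = new LinkedHashMap<>();
        s.put("ID", "S");
        s.put("WS", "S");
        Map<String, String> t = new LinkedHashMap<>();
        t.put("ID", "T");
        t.put("INT", "T");
        // Mirrors "import S,T;": S is listed first, so its ID wins.
        System.out.println(merge(Arrays.asList(s, t))); // {ID=S, WS=S, INT=T}
    }
}
```

Every rule from both grammars ends up in the result; only the duplicate ID from T is dropped.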

We will add the ability to specify the implicitly-defined Tokens rule so that you can specify only the set of tokens you want from the merged lexers.

Propagating grammar changes

...