...
The following sections are some terse notes to remind myself later when I implement this stuff.
Grammar composition
In v2, we used an inheritance mechanism that was really a glorified dumb include. After discussing this with a number of people, including Ari Steinberg (who has a lot of experience with large SQL grammars), I have formulated the following mechanism with Kay. The mechanism is based on delegation rather than inheritance; however, it is presented as an import that pulls in all rules, making them available to the importing grammar.
Parsers
Imagine a simple grammar with three rules:
...
No Format |
---|
parser grammar Java;
import JavaDecl;
prog : decl ;
type : 'int' | 'float' ;
|
ANTLR will aggressively optimize out the rules that are not needed. It must still include any rule whose lookahead DFA changes as a result of an overridden rule. In this case, the change in rule type alters the lookahead prediction for rule decl. Consequently, decl must be included in the generated code for grammar Java. Here is the output ANTLR would generate for Java:
No Format |
---|
class JavaParser extends Parser {
    JavaDecl delegate = new JavaDecl(...); // probably set in ctor actually
    public void type() { ... }
    public void prog() { decl(); } // uses overridden version.
    public void decl() {
        int alt = predict-alt-of-decl; // DFA changed; must copy whole rule here
        switch (alt) {
            case 1 : type(); match(ID); match(';'); break;
            case 2 : type(); match(ID); init(); match(';'); break;
        }
    }
    void init() { delegate.init(); } // unchanged rule; forwarded to the delegate
}
|
...
Notice that you cannot use combined grammars for this; lexers have to be handled with a separate delegation.
Lexers
Lexer composition works exactly the same way. Conceptually, the import copies all of the rules into the new lexer:
No Format |
---|
lexer grammar L;
import S,T;
|
When more than one imported grammar defines a rule with the same name (e.g., ID, INT, WS), the grammar listed first in the import statement takes precedence. All rules would be copied in because there is no way for ANTLR to work out which ones you want.
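The first-listed-wins precedence can be sketched in a few lines of Java. This is only an illustration of the merge order, not ANTLR's implementation: the class name LexerImportMerge, the method mergeRules, and the use of plain strings to stand in for rule bodies are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each imported lexer grammar is modeled as a map
// from rule name to a string standing in for the rule body.
public class LexerImportMerge {

    // Merge rule tables in import order. The first grammar that defines
    // a rule name keeps it; later definitions of that name are ignored.
    static Map<String, String> mergeRules(List<Map<String, String>> importedGrammars) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> rules : importedGrammars) {
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                merged.putIfAbsent(rule.getKey(), rule.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-ins for "import S,T;": S is listed first, so S wins on ID.
        Map<String, String> s = new LinkedHashMap<>();
        s.put("ID", "S.ID");
        s.put("WS", "S.WS");

        Map<String, String> t = new LinkedHashMap<>();
        t.put("ID", "T.ID");
        t.put("INT", "T.INT");

        System.out.println(mergeRules(List.of(s, t)));
        // prints {ID=S.ID, WS=S.WS, INT=T.INT}
    }
}
```

Note that every rule from both grammars ends up in the merged table; only name clashes are resolved, which matches the point that nothing can be optimized out.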
We will add the ability to specify the implicitly-defined Tokens rule so that you can specify only the set of tokens you want from the merged lexers.
Propagating grammar changes
...