...
- Island grammars (probably best handled by a scannerless parser); ignored for the purposes of this discussion
- Combining and sharing grammars (multiple variations on C, SQL, etc...); good also on a practical level to reduce size of any one particular generated file.
- Deriving a new grammar from an existing standard grammar such as Java.g. Changes to prototype grammar should be pulled into derived grammar.
- n-phase translators with a single prototypical tree grammar. Changes to the prototype should be "pushed" to all derivative grammars.
The following sections are some terse notes to remind myself later when I implement this stuff.
Grammar composition
In v2, we used an inheritance mechanism that was really a glorified dumb include. After discussing this with a number of people, including Ari Steinberg (who has a lot of experience with large SQL grammars), I have formulated the following mechanism with Kay. The mechanism is based on the idea of delegation rather than inheritance; however, I phrase it as an import that pulls in all rules, making them available to the grammar that imports them. Imagine a simple grammar with three rules:
No Format |
---|
parser grammar JavaDecl;
type : 'int' ;
decl : type ID ';'
| type ID init ';'
;
init : '=' INT ;
|
Now imagine that you want to build another grammar by reusing some of the rules in that grammar. You can use an import statement that, at least in the abstract, includes all the rules from the other grammar that are not overridden:
No Format |
---|
parser grammar Java;
import JavaDecl;
type : 'int' | 'float' ;
|
ANTLR will aggressively optimize out the rules that are not needed. It must still include rules whose lookahead DFA changes as a result of an overridden rule. In this case, the change in rule type alters the lookahead prediction for rule decl. Consequently, decl must be included in the generated code for grammar Java. Here is the output ANTLR would generate for Java:
No Format |
---|
class JavaParser extends Parser {
JavaDecl delegate = new JavaDecl(...); // probably set in ctor actually
public void type() { ... }
public void decl() {
int alt = predict-alt-of-decl;
switch (alt) {
case 1 :
type(); match(ID); match(';');
break; // without this, alt 1 would fall through into alt 2
case 2 :
type(); match(ID); init(); match(';');
break;
}
}
void init() { delegate.init(); }
}
|
Code generation for the rules that are included would look the same as usual; delegation would be handled by generating a method with the rule's name that forwards to the other grammar, e.g., init(). This is nice because the generated class shows you the list of all rules delegated to other grammars.
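To make the delegation idea concrete, here is a minimal hand-written sketch in plain Java. It is not ANTLR output and does not use the ANTLR runtime: the names ParseState, JavaDeclSketch, JavaSketch, and DelegationDemo are hypothetical, tokens are plain strings, and the prediction in decl() is a hand-coded stand-in for a real lookahead DFA. It also assumes the delegator and delegate share one input state, which the notes above do not spell out. The trace records which class actually ran each rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical shared state: token list, cursor, and a trace of "Grammar.rule" calls. */
class ParseState {
    final List<String> tokens;
    final List<String> trace = new ArrayList<>();
    int pos = 0;

    ParseState(String... tokens) { this.tokens = Arrays.asList(tokens); }

    /** k-th token of lookahead (1-based), or "<EOF>" past the end. */
    String la(int k) {
        int i = pos + k - 1;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    /** Consume one token whose text matches the regex, or fail. */
    void match(String regex, String what) {
        if (!la(1).matches(regex)) {
            throw new IllegalStateException("expected " + what + ", found " + la(1));
        }
        pos++;
    }
}

/** Stand-in for the parser of grammar JavaDecl (decl omitted for brevity). */
class JavaDeclSketch {
    final ParseState s;
    JavaDeclSketch(ParseState s) { this.s = s; }

    void type() { s.trace.add("JavaDecl.type"); s.match("int", "'int'"); }

    void init() {
        s.trace.add("JavaDecl.init");
        s.match("=", "'='");
        s.match("[0-9]+", "INT");
    }
}

/** Stand-in for the parser of grammar Java, which imports JavaDecl. */
class JavaSketch {
    final ParseState s;
    final JavaDeclSketch delegate; // assumption: shares the same ParseState

    JavaSketch(ParseState s) { this.s = s; this.delegate = new JavaDeclSketch(s); }

    /** Overridden rule: type : 'int' | 'float' ; */
    void type() { s.trace.add("Java.type"); s.match("int|float", "type"); }

    /** Included here rather than delegated, so it calls the overriding type(). */
    void decl() {
        s.trace.add("Java.decl");
        int alt = s.la(3).equals("=") ? 2 : 1; // hand-coded stand-in for the DFA
        switch (alt) {
            case 1:
                type(); s.match("[a-z]+", "ID"); s.match(";", "';'");
                break;
            case 2:
                type(); s.match("[a-z]+", "ID"); init(); s.match(";", "';'");
                break;
        }
    }

    /** Not overridden: forward to the imported grammar's parser. */
    void init() { delegate.init(); }
}

class DelegationDemo {
    /** Parses one decl with the composed parser and returns the rule trace. */
    static List<String> parseDecl(String... tokens) {
        ParseState s = new ParseState(tokens);
        new JavaSketch(s).decl();
        if (s.pos != tokens.length) throw new IllegalStateException("trailing input");
        return s.trace;
    }

    public static void main(String[] args) {
        System.out.println(parseDecl("float", "x", "=", "3", ";")); // [Java.decl, Java.type, JavaDecl.init]
        System.out.println(parseDecl("int", "y", ";"));             // [Java.decl, Java.type]
    }
}
```

In the first trace, type runs in the importing parser while init runs in the delegate, which is the split described above. Calling type() directly on a JavaDeclSketch with the input float fails, since the delegate only knows the original rule.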
Notice that you cannot use combined grammars for this.
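Going back to the question of which rules the importing grammar has to generate itself, one rough way to approximate it is sketched below. This is not ANTLR's analysis: instead of recomputing lookahead DFAs, it conservatively assumes that any rule referencing an overridden rule, directly or transitively, must be regenerated, and that everything else can be delegated. The class and method names (RuleSelectionSketch, rulesToRegenerate, rulesToDelegate) are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RuleSelectionSketch {
    /**
     * refs maps each rule of the imported grammar to the rules it references.
     * Returns the overridden rules plus every rule that reaches one of them.
     */
    static Set<String> rulesToRegenerate(Map<String, Set<String>> refs, Set<String> overridden) {
        Set<String> result = new TreeSet<>(overridden);
        boolean changed = true;
        while (changed) { // iterate to a fixed point
            changed = false;
            for (Map.Entry<String, Set<String>> e : refs.entrySet()) {
                boolean reachesChangedRule = !Collections.disjoint(e.getValue(), result);
                if (reachesChangedRule && result.add(e.getKey())) {
                    changed = true;
                }
            }
        }
        return result;
    }

    /** Everything in the imported grammar that is not regenerated can be forwarded. */
    static Set<String> rulesToDelegate(Map<String, Set<String>> refs, Set<String> overridden) {
        Set<String> delegated = new TreeSet<>(refs.keySet());
        delegated.removeAll(rulesToRegenerate(refs, overridden));
        return delegated;
    }

    public static void main(String[] args) {
        // Rule references in JavaDecl: decl uses type and init.
        Map<String, Set<String>> refs = Map.of(
            "type", Set.of(),
            "decl", Set.of("type", "init"),
            "init", Set.of());
        Set<String> overridden = Set.of("type"); // grammar Java overrides type

        System.out.println(rulesToRegenerate(refs, overridden)); // [decl, type]
        System.out.println(rulesToDelegate(refs, overridden));   // [init]
    }
}
```

For the JavaDecl/Java example this reproduces the split in the generated class: type and decl live in the importing parser and init is delegated. A real implementation could be less conservative by checking whether a rule's prediction actually changes.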
Propagating grammar changes
...