Spent a few hours talking to Kay Roepke, who is in town for 10 days. He and I started looking at the tree diff mechanism he found on the net and discussed how it could be incorporated into ANTLR. We also discussed grammar composition. It seems to me there are four problems in the area of reusing grammars:
- Island grammars (probably best handled by a scannerless parser); ignored for the purposes of this discussion
- Combining and sharing grammars (multiple variations on C, SQL, etc...)
- Deriving a new grammar from an existing standard grammar such as Java.g. Changes to the prototype grammar should be "pulled" into the derived grammar.
- n-phase translators with a single prototypical tree grammar. Changes to the prototype should be "pushed" to all derivative grammars.
The following sections are some terse notes to remind myself later when I implement this stuff.
Grammar composition
Propagating grammar changes
Basically, we will need tree-based grammar versions of patch, diff, and diff3: gpatch, gdiff, gdiff3. We will also need a tool called gderive that effectively copies an original grammar into a special location, ~/.grammar-prototypes, so that later gsync calls will know what the original looked like.
When propagating a change, there is a possibility of collision. The user might have altered an action, or inserted an action at a location that the original grammar changes. A collision in an action, or any change within an alt that has actions in either the proto or the derived grammar, puts a marker in the merged file. ANTLR could even look for that marker and warn that the grammar may not work.
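To make the collision idea concrete, here is a rough sketch of a diff3-style merge, assuming each grammar has already been reduced to a mapping from rule name to rule text. The function name merge_rules, the marker strings, and the rule-level granularity are all assumptions for illustration. A real gdiff3 would work on grammar trees and would also apply the stricter "any change within an alt that has actions" rule, which this sketch ignores.

```python
def merge_rules(base, proto, derived):
    """Three-way merge of {rule name: rule text} mappings (hypothetical format).

    base    -- the original grammar as it looked when the derived one was made
    proto   -- the current version of the original grammar
    derived -- the user's modified grammar
    Returns (merged rules, names of rules that collided).
    """
    def show(text):
        return text if text is not None else "(rule deleted)"

    merged, collisions = {}, []
    for name in sorted(set(base) | set(proto) | set(derived)):
        b, p, d = base.get(name), proto.get(name), derived.get(name)
        if p == d:        # both sides agree (or both deleted the rule)
            result = p
        elif p == b:      # only the derived grammar changed this rule
            result = d
        elif d == b:      # only the prototype changed this rule
            result = p
        else:             # both changed it differently: leave a marker
            collisions.append(name)
            result = "\n".join(
                ["<<<<<<< derived", show(d), "=======", show(p), ">>>>>>> proto"]
            )
        if result is not None:
            merged[name] = result
    return merged, collisions


base = {"expr": "expr : atom ;", "atom": "atom : INT ;"}
proto = {"expr": "expr : atom ('+' atom)* ;", "atom": "atom : INT | ID ;"}
derived = {"expr": "expr : atom ;", "atom": "atom : INT {count++;} ;"}
merged, collisions = merge_rules(base, proto, derived)
```

In this example the expr fix from the prototype is pulled in cleanly, while atom gets a marker because both sides changed it.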
Branching off a new version of an existing grammar
Let's say you want to do a project using the existing Java.g grammar. First, make sure that you make a copy of the original grammar with:
gderive Java.g MyJava.g
Then make whatever changes you want to the grammar or actions. When you want to get fixes from the original Java.g, all you need to do is the following:
gsync Java.g MyJava.g
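The bookkeeping behind those two commands might look roughly like the sketch below. It assumes that gderive stores its snapshot of the original under the derived grammar's file name inside ~/.grammar-prototypes; that naming scheme, the function names, and the proto_dir parameter are assumptions made for illustration, not a settled design. The second function only gathers the three texts a three-way merge would need.

```python
import shutil
from pathlib import Path

# Snapshot location mentioned in the notes above.
PROTO_DIR = Path.home() / ".grammar-prototypes"


def gderive(original, derived, proto_dir=PROTO_DIR):
    """Copy original to derived and remember what the original looked like."""
    original, derived, proto_dir = Path(original), Path(derived), Path(proto_dir)
    proto_dir.mkdir(parents=True, exist_ok=True)
    snapshot = proto_dir / derived.name   # assumed naming scheme
    shutil.copyfile(original, snapshot)   # the base for later gsync calls
    shutil.copyfile(original, derived)    # the file the user will edit
    return snapshot


def gsync_inputs(original, derived, proto_dir=PROTO_DIR):
    """Return (base, proto, mine) texts for a later three-way merge."""
    snapshot = Path(proto_dir) / Path(derived).name
    if not snapshot.exists():
        raise FileNotFoundError(f"no snapshot for {derived}; run gderive first")
    return (
        snapshot.read_text(),
        Path(original).read_text(),
        Path(derived).read_text(),
    )
```

After a successful merge, the snapshot would presumably need to be refreshed to the current original so that the next gsync diffs against the right base.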
N-phase translators with a single prototypical tree grammar
Imagine a collocated translator that makes multiple passes over the tree to collect information and build ancillary data structures. Changes to the tree construction in the parser grammar affect the underlying grammar of all phases. Manually propagating those changes to each of the tree grammars is a chore and can introduce lots of errors. Instead, we should build a single prototypical tree grammar and keep it around (not up in the gderive directory though). Later, all of the phases can be changed simply by altering the proto.g tree grammar and invoking:
gsync proto.g p1.g ... p2.g
That effectively pushes all of the changes to the different phases. Collisions result in a marker in the affected grammar file.