Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

A tree grammar is nothing more than a formal specification of the tree structure created by parser grammar. From this tree grammar, I generate an external visitor (See Language implementation patterns). People always ask if there is an automated way to create tree grammar from the tree construction instructions sitting in the parser grammar. It's a pretty hard task given the complexity of the rewrite instructions and so on. I was thinking that I could do something that observed the trees you created and combined it with knowledge of the grammar to produce a tree grammar. Sounded pretty complicated. But, that got me thinking about using the structure implied by the grammar to automatically produce high-level visitors (i.e., visitors that don't simply tell you that they found an ID node, which is pretty worthless). What we want is to match phrases per what you had in the grammar. Oliver Zeigermann and I have been tossing some ideas around, which also led to my thoughts on this blog page.

Newbie mode

What if we had a newbie mode that literally did not expose an internal representation (tree or otherwise). Let's face it. most of the time all we want to do is parse something and then visit the elements multiple times. We'd reparse all of the time if it weren't so slow and it didn't sound so unsophisticated. Also, trees or some other internal data structure are really great for keeping little pieces of information. We like heterogeneous tree nodes because we can have different information per node. To replicate that in a visitor, we could simply deconstruct the nodes or whatever the internal representation is, and simply send that information back to the visitor as parameters of a function. No need to have any of this exposed to the user. Here is a simple expression grammar thingie:

...

Panel

From looking at the work of my friends, Jurgen Vinju (ASF+SDF) and Eelco Visser (Stratego/XT), there are a couple of very interesting lessons:

  • you don't always have to care what the internal structure of the input is
  • when doing transformations, using concrete syntax like "x+x -> 2*x" is useful; probably implies a parse tree as the internal structure
  • it's a great idea to separate how we walk the input from what we do while discovering nodes or phrases.

The negatives

I can imagine this being less efficient because I would likely build a parse tree instead of an AST. Parse trees have internal nodes that represent rule applications/invocations, but for many applications we care more about the relationship between the tokens than we do about which rules were used to match them. So, these internal nodes are a waste. What we really want to do is to create operator-operand trees such as (+ 3 4) where + is at the root and has 2 children, 3 and 4. Power users will still want the ability to create their own ASTs and tree grammars / tree filters for maximum efficiency and control.

...

If we want to do transformations, however, we can use concrete syntax which hides not only the internal structure of the tree but does not have any visitors that programmers can modify. All the functionality is extracted from the side of "this goes to that" transformations. Of course, this only works when you're doing translation. If you need to trigger calls from the configuration file into your massive server, there's no escaping the fact that we need to execute code snippets. Where to specify and how to specify those snippets is the key question.

Pattern-based visitors

Maybe we should borrow a page from the term rewriting guys and use patterns instead of special method names. For example, we could specify contexts (rules) and then patterns followed by actions to execute if those are seen:

...

The pattern-based approach seems easier and, because it uses concrete syntax, could be easier to use.

Java annotations approach

I wonder if there's a way to use annotations to specify patterns of interest to avoid having to run ANTLR or some other tool on those patterns or rules to get the visitor. (This won't help the other languages but Java is probably the largest market for ANTLR, so I will investigate this option).

...

I can see that the names of the parameters could be an issue because changes to the labels in the grammar would force changes in the method signatures. I guess at least we would know about it because the annotations.

Mixed-mode rules and listener triggers

Finally, it's probably worth exploring a simpler and more direct approach: adding triggers to the original grammar but having ANTLR actually invoke them during its traversal of an internal data structure.

...