Because most ANTLR users don't build compilers, I decided to focus on the other applications for ANTLR v4: parsing input, extracting information from it, and translating it. For compilers, we need to convert everything into operations and operands, which means ASTs are easier to work with. For example, 3+4 should become a tree with + at the root and 3 and 4 as operands/children: (+ 3 4). The parse tree, in contrast, is probably (expr 3 + 4), where the rule reference expr is the root node and the other elements are children. The parse tree is indeed less useful for compiler work.
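To make the two shapes concrete, here is a minimal Java sketch that builds both trees for 3+4 and prints them in the LISP-style notation used above. The `TreeShapes` and `Node` classes are hypothetical stand-ins written for this illustration; they are not ANTLR runtime classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not ANTLR runtime classes.
class TreeShapes {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) children.add(k);
        }

        // LISP-style rendering: a leaf prints as its label,
        // an interior node prints as "(label child child ...)".
        String toStringTree() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    // AST: the operator is the root and the operands are its children.
    static Node ast() {
        return new Node("+", new Node("3"), new Node("4"));
    }

    // Parse tree: the rule name is the root and every matched token,
    // including the operator, is a child in input order.
    static Node parseTree() {
        return new Node("expr", new Node("3"), new Node("+"), new Node("4"));
    }

    public static void main(String[] args) {
        System.out.println(ast().toStringTree());       // (+ 3 4)
        System.out.println(parseTree().toStringTree()); // (expr 3 + 4)
    }
}
```

The AST drops the grammar's rule structure and keeps only operator/operand relationships, while the parse tree records which rule matched which tokens.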

On the other hand, parse trees are usually what we want for data extraction and translation tasks. Moreover, parse trees can be generated automatically, and ANTLR can generate listener and visitor tree-walker mechanisms automatically as well. Users of ANTLR v4 don't have to learn the AST construction syntax or the AST tree grammar syntax and semantics.
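As a rough illustration of listener-style data extraction, the sketch below walks a parse tree depth-first and collects the integer tokens it encounters. Everything here (`ListenerSketch`, `Node`, `Listener`, `walk`, `IntCollector`) is a hand-rolled, hypothetical stand-in meant only to show the idea; it is not the code ANTLR generates or its runtime API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the listener idea; not ANTLR-generated code.
class ListenerSketch {
    // A rule node has children; a token node has none.
    static final class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... kids) {
            this.text = text;
            this.children = Arrays.asList(kids);
        }

        boolean isToken() { return children.isEmpty(); }
    }

    // Callbacks fired by the walker; default no-ops let a listener
    // override only the events it cares about.
    interface Listener {
        default void enterRule(Node rule) {}
        default void exitRule(Node rule) {}
        default void visitToken(Node token) {}
    }

    // Depth-first walk that triggers the listener callbacks.
    static void walk(Node t, Listener listener) {
        if (t.isToken()) {
            listener.visitToken(t);
            return;
        }
        listener.enterRule(t);
        for (Node c : t.children) walk(c, listener);
        listener.exitRule(t);
    }

    // Extraction task: gather every integer literal in the tree.
    static final class IntCollector implements Listener {
        final List<Integer> values = new ArrayList<>();

        @Override
        public void visitToken(Node token) {
            if (token.text.matches("[0-9]+")) values.add(Integer.parseInt(token.text));
        }
    }

    static List<Integer> extractInts(Node tree) {
        IntCollector collector = new IntCollector();
        walk(tree, collector);
        return collector.values;
    }

    public static void main(String[] args) {
        Node tree = new Node("expr", new Node("3"), new Node("+"), new Node("4"));
        System.out.println(extractInts(tree)); // [3, 4]
    }
}
```

The extraction logic lives entirely in the listener, so the tree and the walker need no knowledge of the task being performed.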

ANTLR itself is a very complicated translator. It reads in grammar files and generates multiple Java files as output. The translation is complicated by the fact that the output looks very different from the input and the order of the output can be very different from the order of the input. There can also be nonlocal transformations: ANTLR can decide it needs a local variable or parameter deep within a rule. For example, the token reference x=T defines a label x that ANTLR would translate as something like:

...

The point is that ANTLR does all this without gradually transforming a grammar AST into a Java AST. I know from lots of personal experience that doing so would be extremely cumbersome. Instead, ANTLR walks the tree multiple times to extract information and also to annotate the nodes. Once all of the information it needs is available, it does another walk over the tree and builds up a model of the output. The output objects are built up as the information becomes available during the tree walk of the input tree. The result of model construction is a complete tree of output model objects, which must then be rendered to text via StringTemplate templates. The name of an output model class, such as RuleFunction, corresponds to a template with the same name that knows how to render that model object to text. I built an automatic walker that walks the model and builds up the giant output template. The final step is to ask StringTemplate to render that template to text, effectively generating code. It works great!
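The class-name-to-template convention can be sketched in a few lines of Java. This is only an illustration under stated assumptions: plain Java functions stand in for StringTemplate templates, and the `ParserFile` class, the fields on `RuleFunction`, the `render` method, and the sample rule `stat` with its local `int depth` are all hypothetical. ANTLR's real output model and templates are far richer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: plain functions stand in for StringTemplate templates.
class ModelRenderSketch {
    interface OutputModelObject {}

    // Model of one generated method, including locals discovered during tree walks.
    static final class RuleFunction implements OutputModelObject {
        final String name;
        final List<String> locals = new ArrayList<>();
        RuleFunction(String name) { this.name = name; }
    }

    // Model of the whole generated file: a named class holding rule functions.
    static final class ParserFile implements OutputModelObject {
        final String name;
        final List<RuleFunction> rules = new ArrayList<>();
        ParserFile(String name) { this.name = name; }
    }

    // One "template" per model class, keyed by the class's simple name.
    static final Map<String, Function<OutputModelObject, String>> TEMPLATES = new HashMap<>();
    static {
        TEMPLATES.put("RuleFunction", o -> {
            RuleFunction r = (RuleFunction) o;
            StringBuilder sb = new StringBuilder("void " + r.name + "() {\n");
            for (String decl : r.locals) sb.append("    ").append(decl).append(";\n");
            return sb.append("}\n").toString();
        });
        TEMPLATES.put("ParserFile", o -> {
            ParserFile f = (ParserFile) o;
            StringBuilder sb = new StringBuilder("class " + f.name + " {\n");
            for (RuleFunction r : f.rules) sb.append(render(r)); // nested model objects
            return sb.append("}\n").toString();
        });
    }

    // The walker: find the template named after the object's class and apply it.
    static String render(OutputModelObject o) {
        String templateName = o.getClass().getSimpleName();
        Function<OutputModelObject, String> template = TEMPLATES.get(templateName);
        if (template == null) throw new IllegalStateException("no template named " + templateName);
        return template.apply(o);
    }

    // Builds a tiny model by hand; a real tool would build it during a tree walk.
    static String demo() {
        ParserFile file = new ParserFile("TParser");
        RuleFunction stat = new RuleFunction("stat");
        stat.locals.add("int depth"); // illustrative local, not real ANTLR output
        file.rules.add(stat);
        return render(file);
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Because the lookup is driven by the class name, adding a new kind of output only requires a new model class and a template of the same name; the walker itself does not change.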

...