Automatic StringTemplate construction in ANTLR grammars

Currently, ANTLR does not create templates for you automatically when you use the output=template option. This is because, when I first implemented it, I had no idea what the right answer was here. I did not know how to deal with whitespace and so on. I think I have the answer now. First, let me remind you that output=AST builds a completely flat tree given no instructions to the contrary. Similarly, the template output should reproduce the input given no instructions. Some cases seem obvious. What should the output template be for this rule?

d : 'int' ID ';' ;

The answer is not simply to concatenate the tokens, because that loses the whitespace. I tried a simple mechanism that added a little bit of code to each token and rule reference. The code snippet would copy the token object into some default template, which the user could specify by overriding a method called getDefaultTemplate(String ruleName). The problem is that the output came out as intx; rather than int x; (or whatever the original spacing was).
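To make the whitespace problem concrete, here is a minimal Java sketch that does not use ANTLR at all. Tok is a hypothetical stand-in for a lexer token (its text plus an off-channel flag), and naive and withHidden are illustrative names, not ANTLR or StringTemplate API. The first method copies only the matched tokens, roughly what the simple mechanism above produced; the second previews the off-channel fix for a flat rule such as d by copying the off-channel tokens that sit in front of each real token.

```java
import java.util.ArrayList;
import java.util.List;

public class FlatRuleOutput {
    // Hypothetical stand-in for a lexer token: its text and whether it is off channel.
    static final class Tok {
        final String text;
        final boolean hidden;
        Tok(String text, boolean hidden) { this.text = text; this.hidden = hidden; }
    }

    // Naive scheme: copy only the tokens the parser matches (on-channel ones).
    static String naive(List<Tok> stream) {
        StringBuilder out = new StringBuilder();
        for (Tok t : stream) {
            if (!t.hidden) out.append(t.text);
        }
        return out.toString();
    }

    // Flat-rule fix: before copying each real token, copy the off-channel
    // tokens that sit between it and the previous real token.
    static String withHidden(List<Tok> stream) {
        StringBuilder out = new StringBuilder();
        List<Tok> pending = new ArrayList<>();
        for (Tok t : stream) {
            if (t.hidden) {
                pending.add(t); // hold until we know a real token follows
                continue;
            }
            for (Tok h : pending) out.append(h.text);
            pending.clear();
            out.append(t.text);
        }
        return out.toString(); // trailing off-channel tokens are not emitted
    }

    public static void main(String[] args) {
        // Tokens for the input "int x;" matched by  d : 'int' ID ';' ;
        List<Tok> input = List.of(
            new Tok("int", false), new Tok(" ", true),
            new Tok("x", false), new Tok(";", false));
        System.out.println(naive(input));      // intx;
        System.out.println(withHidden(input)); // int x;
    }
}
```

This works only because d matches every real token itself; once rules invoke other rules, each level needs its own snippet, and that is where duplication creeps in.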

The answer seems to be a simple matter of inserting any off-channel tokens into the output template before inserting each real token. What happens, though, when you invoke a rule that invokes another rule? You cannot simply add the whitespace before the start token of each rule reference, because a chain of rule invocations will insert the same whitespace multiple times:

a : b ID ;
b : c ;
c : 'int' ;

The template for c would be 'int' plus any whitespace to the left of that token. Rule b's template would again include the whitespace before the first token of c, 'int', followed by c's template, which already contains that whitespace; rule a's reference to b would add yet another copy. I think a simple index into the token stream could track which tokens have already been added to the output. The little snippet of code for a rule reference would then collect all off-channel tokens to the left of the rule reference's start token, stopping at the first real token before it or at the index of the last emitted token, whichever comes first (this prevents duplicates).
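Here is a sketch of that bookkeeping in Java, again without ANTLR: a hand-written recursive-descent parser for rules a, b, and c in which every token reference and rule reference prepends the off-channel tokens to the left of its start token. Tok, leadingHidden, lastEmitted, and the trackEmitted flag are hypothetical names chosen for illustration, and output is built as plain strings rather than StringTemplate instances. With trackEmitted off, the leftward scan stops only at the previous real token, so the leading whitespace is copied once by a's reference to b, once by b's reference to c, and once by the 'int' token reference. With it on, the scan also stops at lastEmitted and the input is reproduced.

```java
import java.util.List;

public class HiddenTokenTracker {
    // Hypothetical stand-in for a lexer token: its text and whether it is off channel.
    static final class Tok {
        final String text;
        final boolean hidden;
        Tok(String text, boolean hidden) { this.text = text; this.hidden = hidden; }
    }

    private final List<Tok> tokens;
    private final boolean trackEmitted; // false = naive scheme that duplicates whitespace
    private int p = 0;                  // index of the current real token
    private int lastEmitted = -1;       // index of the last token copied to the output

    HiddenTokenTracker(List<Tok> tokens, boolean trackEmitted) {
        this.tokens = tokens;
        this.trackEmitted = trackEmitted;
        skipHidden();
    }

    private void skipHidden() {
        while (p < tokens.size() && tokens.get(p).hidden) p++;
    }

    // Off-channel tokens left of index start, scanning back until the first real
    // token or (when tracking) the last emitted token, whichever comes first.
    private String leadingHidden(int start) {
        int from = start;
        while (from > 0 && tokens.get(from - 1).hidden
                && (!trackEmitted || from - 1 > lastEmitted)) {
            from--;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < start; i++) sb.append(tokens.get(i).text);
        lastEmitted = Math.max(lastEmitted, start - 1);
        return sb.toString();
    }

    // Snippet for a token reference: hidden prefix, then the token (no error checks).
    private String match() {
        String s = leadingHidden(p) + tokens.get(p).text;
        lastEmitted = p;
        p++;
        skipHidden();
        return s;
    }

    // a : b ID ;   the reference to b carries its own hidden prefix
    String a() {
        String s = leadingHidden(p) + b();
        return s + match();
    }

    // b : c ;
    String b() { return leadingHidden(p) + c(); }

    // c : 'int' ;
    String c() { return match(); }

    static String translate(List<Tok> tokens, boolean trackEmitted) {
        return new HiddenTokenTracker(tokens, trackEmitted).a();
    }

    public static void main(String[] args) {
        // Input "  int x": leading whitespace, 'int', a space, then the ID x.
        List<Tok> input = List.of(
            new Tok("  ", true), new Tok("int", false),
            new Tok(" ", true), new Tok("x", false));
        System.out.println("[" + translate(input, false) + "]"); // three copies of the leading "  "
        System.out.println("[" + translate(input, true) + "]");  // [  int x]
    }
}
```

A single integer suffices instead of a set of emitted tokens because, in this sketch, tokens are emitted in stream order, so everything at or before lastEmitted has already been handled.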

ST construction in tree grammars