How to build an ANTLR code generation target

The following instructions show how to go about starting a new back-end for a new language "XYZ" using the Java back-end as a basis.

A full back-end needs all the functionality contained in the Java files in the antlr-3.1/runtime/Java/src/org/antlr/runtime directory. You must also copy the code generation templates of another target, such as antlr-3.1/src/org/antlr/codegen/templates/Java/*.stg. Finally, create antlr-3.1/src/org/antlr/codegen/XYZTarget.java if you need to override anything in Target.java, which you probably will.

But it is in fact pretty easy to get started:

In src/org/antlr/codegen/templates/
- create a directory XYZ
- copy Java/Java.stg to XYZ/XYZ.stg

I recommend building the ANTLR tool 'in place'. Do not create a jar or compile/copy to a build directory. When you run it with 'java -cp path-to-src-dir ...' it will use the original *.stg file, which you'll edit a lot - so rebuilding the tool would be quite a PITA.

Create a directory antlr-3.1/runtime/XYZ. Here you can put anything you need (no need to clone Java 1:1).

Start with a simple lexer like:

lexer grammar T;
options { language = XYZ; }
ZERO: '0';

N.B.: The name after the word "grammar" must match the name of the file in which you save it, i.e. grammar T goes in T.g.

Look at the generated code and try to figure out which templates in XYZ.stg you have to port to get valid XYZ code. What I did was comment out the Java code in all templates, replacing it with something like FIXME([number]). Then you fix the templates until no FIXME remains in the output.
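To keep track of what is left, the FIXME approach above can be paired with a small checker. The following is only a sketch: FixmeScanner is a hypothetical helper, not part of ANTLR, and it assumes the markers appear in the generated output literally as FIXME(n) with a decimal number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: lists the FIXME(n) markers that
// are still present in the text of a generated file.
class FixmeScanner {
    // Assumes markers are written literally as FIXME(<decimal number>).
    private static final Pattern MARKER = Pattern.compile("FIXME\\((\\d+)\\)");

    // Returns the marker numbers in order of first appearance, without duplicates.
    static List<Integer> remaining(String generated) {
        List<Integer> found = new ArrayList<>();
        Matcher m = MARKER.matcher(generated);
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            if (!found.contains(n)) {
                found.add(n);
            }
        }
        return found;
    }
}
```

Calling FixmeScanner.remaining(...) on the text of a generated file gives the numbers of the templates that still need porting; an empty list means every template that the grammar exercises has been converted.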

You'll need a basic implementation of a character stream and base recognizer/lexer to get the example running. Just implement the methods that are actually needed to get the example running w/o errors.
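As a rough idea of how little is needed at first, here is a minimal sketch written in Java for readability; a real target would write the equivalent in XYZ. The method names LA, consume, index and size follow the Java runtime's character stream interface, while the class names, the String-backed storage and the matchZero helper are assumptions made for this illustration.

```java
// Minimal character stream sketch. Only the method names LA, consume, index
// and size are taken from the Java runtime; everything else is illustrative.
class SimpleCharStream {
    static final int EOF = -1;   // returned when looking past the end of input

    private final String data;
    private int p = 0;           // index of the next character to consume

    SimpleCharStream(String data) {
        this.data = data;
    }

    // Look ahead i characters (i must be >= 1) without consuming anything.
    int LA(int i) {
        int pos = p + i - 1;
        return pos < data.length() ? data.charAt(pos) : EOF;
    }

    // Advance past the current character, if there is one.
    void consume() {
        if (p < data.length()) {
            p++;
        }
    }

    int index() { return p; }

    int size() { return data.length(); }
}

// Hand-written stand-in for the code a target might generate for ZERO: '0';
class TinyLexer {
    // Consumes a single '0' and reports whether the rule matched.
    static boolean matchZero(SimpleCharStream input) {
        if (input.LA(1) == '0') {
            input.consume();
            return true;
        }
        return false;
    }
}
```

Methods such as mark, rewind and seek are left out on purpose; they can be added once a grammar actually needs them.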

You'll either get the feeling "Wow, that was easy!" and move on (that happened to me) or "Eeek, what a pain!" and let someone else do the work.

Target Source Files

Your ANTLR code generation target will consist of several files in the target language: a number of run-time support files that you'll create directly, and a number of files that ANTLR will generate (in the target language) when it processes the user's grammar. The generated files are produced from StringTemplates like XYZ.stg above.

ANTLR looks for the presence of certain template files and templates. Some are required, some are optional.

XYZ.stg

The principal template file. The templates in this file are used to generate the Lexer, Parser and Tree Parser from the user's grammar.

Template Name	Purpose	Notes
outputFile	Generates the target-language implementation of the recognizer.	Required
headerFile	Generates the target-language header file for the recognizer.	Optional

`block()` StringTemplate

Parameter Name	Description
alts
decls
decision
enclosingBlockLevel
blockLevel
decisionNumber
maxK
maxAlt
description

`closureBlock()` StringTemplate

Parameter Name	Description
alts
decls
decision
enclosingBlockLevel
blockLevel
decisionNumber
maxK
maxAlt
description

`outputFile()` StringTemplate

Formal parameters:

Parameter Name	Description
LEXER	Boolean indicating that a Lexer is being generated.
PARSER	Boolean indicating that a Parser or Combined Lexer/Parser is being generated.
TREE_PARSER	Boolean indicating that a Tree Parser is being generated.
actionScope
actions	A `java.util.Map` of the grammar's actions.
docComment
recognizer	The StringTemplate named "lexer", "parser", or "treeParser", depending on the type of recognizer being generated.
name
tokens
tokenNames
rules
cyclicDFAs	A `org.antlr.analysis.DFA` instance.
bitsets
buildTemplate	Boolean
buildAST	Boolean
rewriteMode	Boolean
profile	Boolean
backtracking	Boolean
synpreds	A `java.util.Set` of synpreds in the grammar (if any).
memoize	Boolean
numRules
fileName
ANTLRVersion	String containing the version of the ANTLR tool generating this recognizer.
generatedTimestamp	String containing the current time.
trace	Boolean
scopes
superClass
literals

`rule()` StringTemplate

The rule() StringTemplate is instantiated by ANTLR's own grammar processing and added to the "rules" attribute. It takes the following parameters.

Parameter Name	Description
ruleName	The name of the rule as specified in the input grammar.
ruleDescriptor	The `org.antlr.tool.Rule` object instance associated with this StringTemplate.
block
emptyRule
description
exceptions
finally
memoize

`dfaState()` StringTemplate

Parameter Name	Description
k
edges
eotPredictsAlt
description
stateNumber
semPredState

`dfaLoopbackState()` StringTemplate

A DFA state that is actually the loopback decision of a closure loop. If end-of-token (EOT) predicts any of the targets then it should act like a default clause (i.e., no error can be generated). This is used only in the lexer so that for ('a')* on the end of a rule anything other than 'a' predicts exiting.

Parameter Name	Description
k
edges
eotPredictsAlt
description
stateNumber
semPredState
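The loopback behaviour described above can be pictured with a hand-written loop. This is an illustration only, not actual ANTLR output: the class and method names are hypothetical, and -1 is assumed as the end-of-input value. It shows the decision for ('a')* at the end of a lexer rule, where anything other than 'a' falls into the default branch and exits the loop without raising an error.

```java
// Hand-written illustration, not ANTLR output: the loop decision for ('a')*
// at the end of a lexer rule. Any character other than 'a', including end of
// input, predicts the exit branch, so the decision never reports an error.
class LoopbackSketch {
    // Returns the index just past the run of 'a' characters starting at start.
    static int matchAStar(String input, int start) {
        int p = start;
        while (true) {
            int la = p < input.length() ? input.charAt(p) : -1; // -1 stands for EOF
            int alt;
            if (la == 'a') {
                alt = 1;   // another iteration of the loop
            } else {
                alt = 2;   // default clause: exit the loop, no error
            }
            if (alt == 2) {
                break;
            }
            p++;           // consume the 'a'
        }
        return p;
    }
}
```

For example, matchAStar("aaab", 0) stops at index 3, and an input that contains no 'a' at all simply returns the start index.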
