ANTLR v3 printable documentation
An inline version of ANTLR v3 documentation.
ANTLR3 Code Generation Targets
Code generation for the following target languages is currently in development, testing, or complete. Visit the page for each target language for more information - hopefully the people responsible for each target language will keep their rows in this table up to date.
See also Target API documentation and How to build an ANTLR code generation target.
Language | Irresponsible Person | Status |
---|---|---|
Ada | Luke A. Guest | Currently dormant. |
ActionScript | George Scott (initial port, not actively maintaining) | In sync up to 3.2, but currently not in active development. |
C | Jim Idle | In sync with ANTLR3 development. Use the .tgz files under the … |
C++ | Gokulakannan Somasundaram (was Jim Idle & Ric Klaren) | Created on antlr-3.4 and hence in sync with only antlr-3.4. |
C# | Maintainer: Johannes Luber | In sync with ANTLR3 development to 3.3, but a few errors make it beta for 3.3. There are separate targets for .NET 1.1 and .NET 2. |
C#3 | Maintainer: Sam Harwell | (Added post-release 3.1.3) In sync with ANTLR3 development, except no support for the … |
? | ? | ? |
Emacs ELisp | Ola Bini | He's working on this at the moment; http://github.com/olabini/antlr-elisp |
Objective-C | Alan Condit, Kay Roepke | Current with the 3.3 version. |
Java | Terence (parrt at cs usfca edu) | In sync with ANTLR3 development. |
JavaScript | Joey Hurst | In sync with ANTLR3 development. |
Python | Benjamin Niemann | Current with 3.1.3. |
Ruby | Kyle Yetter, previously Martin Traverso | Current with 3.3. |
Perl 6 | Bernhard Schmalhofer (Bernhard.Schmalhofer@gmx.de) | Inactive. No code produced yet. Takers wanted. |
Perl | Ron Blaschke (ron at rblasch.org) | Early prototyping. Simple lexer is working. |
PHP | Sidharth Kuruvila, Yauhen Yakimovich, Geoff Speicher, Rolland Brunec | Primary milestone is aimed at verification of lexer and parser generation. Work towards implementation of StringTemplate is in progress. |
Oberon | Dominik Holenstein | Planning and analyzing. First version expected for … |
Scala | Matthew Lloyd | ? |
Command line options
Usage:
java org.antlr.Tool [args] file.g [file2.g file3.g ...]
Option | Description |
---|---|
-o outputDir | specify output directory where all output is generated; token vocabularies are also searched for here |
-fo outputDir | same as -o but force even files with relative paths to dir |
-depend | generate file dependencies; don't actually run antlr |
-lib dir | specify location of token files and imported grammars |
-report | print out a report about the grammar(s) processed |
-print | print out the grammar without actions |
-trace | generate a parser with trace output - if the default output is not enough, you can override the traceIn and traceOut methods |
-debug | generate a parser that emits debugging events |
-profile | generate a parser that computes profiling information |
-nfa | generate an NFA for each rule |
-dfa | generate a DFA for each decision point |
-message-format name | specify output style for messages |
-X | display extended option list |
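For example, a typical invocation combining a few of these options might look like this (the grammar and directory names are hypothetical):

java org.antlr.Tool -o build -lib tokendir -report MyGrammar.g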
There are a bunch of less often used "extended" options as well.
Extended option | Description |
---|---|
-Xgrtree | print the grammar AST |
-Xdfa | print DFA as text |
-Xnoprune | do not test EBNF block exit branches |
-Xnocollapse | do not collapse incident edges into DFA states |
-Xdbgconversion | dump lots of info during NFA conversion |
-Xmultithreaded | run the analysis in 2 threads |
-Xnomergestopstates | do not merge stop states |
-Xdfaverbose | generate DFA states in DOT with NFA configs |
-Xwatchconversion | print a message for each NFA before converting |
-XdbgST | put tags at start/stop of all templates in output |
-Xm m | max number of rule invocations during conversion |
-Xmaxdfaedges m | max "comfortable" number of edges for a single DFA state |
-Xconversiontimeout t | set NFA conversion timeout for each decision |
-Xmaxinlinedfastates m | max DFA states before table used rather than inlining |
-Xnfastates | for nondeterminisms, list NFA states for each path |
Attribute and Dynamic Scopes
Token attributes
attribute | description |
---|---|
text | the token's matched text |
type | the token's type (an integer) |
line | the line number of the token, counting from 1 |
index | the token's overall index in the token stream, counting from 0 |
pos | the character position of the token within its line, counting from 0 |
channel | the token's channel number |
tree | the AST node created for this token (AST-building parsers) |
int | the integer value of the token's text |
Rule attributes
Parsers
attribute | description |
---|---|
text | the text matched by the rule, from $start to $stop, including hidden channel tokens |
start | the first token matched by the rule |
stop | the last token matched by the rule |
tree | the AST computed for the rule (output=AST) |
st | the StringTemplate computed for the rule (output=template) |
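For instance, here is a minimal sketch of a parser rule whose action reads these attributes (the rule and its elements are hypothetical):

decl : type ID ';'
       { System.out.println("matched \"" + $text + "\"");     // everything the rule matched
         System.out.println("tokens " + $start.getTokenIndex()
                            + ".." + $stop.getTokenIndex()); } // first/last Token objects
     ;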
Tree parsers
attribute | description |
---|---|
text | the text derived from the token range of the rule's start node (see "The Rule text Attribute in Tree Grammars" below) |
start | the first node matched by the rule |
tree | the AST computed for the rule (output=AST) |
st | the StringTemplate computed for the rule (output=template) |
Lexers
attribute | description |
---|---|
text | the text matched so far by the rule |
type | the token type of the token being created (assignable, e.g. $type=COMMENT;) |
line | the line number of the token's first character, counting from 1 |
index | the token's index in the token stream |
pos | the character position within the line of the token's first character, counting from 0 |
channel | the token's channel number (assignable, e.g. $channel=HIDDEN;) |
start | the character index in the input stream of the token's first character |
stop | the character index in the input stream of the token's last character |
int | the integer value of the token's text |
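For instance, a minimal sketch of a lexer rule action reading two of these attributes:

FLOAT : '0'..'9'+ '.' '0'..'9'* { System.out.println("float \"" + $text + "\" at line " + $line); } ;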
The Rule text Attribute in Tree Grammars
In a parser grammar, the relationship between the elements matched by a rule and the associated input text is very clear. A rule begins parsing at a particular token and stops parsing at a particular token. The text attribute for a rule, $text, is simply the concatenated text from all tokens in that range, including hidden channel tokens. What does $text mean in a tree grammar, though?
Tree grammar rules match nodes and trees, not tokens. Fortunately, each node has an associated token start and stop index (see TreeAdaptor). As the parser builds trees, each rule sets the token indexes for its return AST to the start and stop token of that rule. We can then define the text attribute for a tree grammar rule to be the text concatenated from the range of tokens indicated by the range in the root of the first tree matched by the rule. This definition may seem strange, but it is the most efficient implementation and works in almost all situations. Here are a few examples:
/** match tree created from, e.g., "int x;"
 *  $text would, therefore, be "int x;"
 *  $start node is VAR node.
 */
variable : ^(VAR type ID) // $text derived from indexes in VAR node
         ;

/** match tree node created from, e.g., "int"
 *  $text would, therefore, be "int"
 *  $start node is 'int' node.
 */
type : 'int'  // $text derived from indexes in 'int' node
     | 'void'
     ;
The following code embodies the text attribute definition. The token range from a rule's start node defines the range of text for the entire rule.
// input is a TreeNodeStream implementation
int start = input.getTreeAdaptor().getTokenStartIndex($start);
int stop  = input.getTreeAdaptor().getTokenStopIndex($start);
String text = input.getTokenStream().toString(start, stop);
Be careful when referencing the text of a rule that happens to be the root of a tree. The text of a rule is the text of all tokens underneath the first root matched by the rule. In the following example, rule op matches a single node, but $op.text will include the text associated with the two operands as well. The parser that built the plus and multiply operator nodes set the token range to include all tokens for that expression.
/** match subtrees for + and * created from input such as "1+4*2"
 *  $text and $op.text is "1+4*2" for first alternative.
 *  $text is just the INT node for second alternative.
 */
expr : ^(op expr expr) // $op.text is same as $text!
     | INT
     ;
op   : o='+'           // $text includes text of operands
     | o='*'           // $o.text is just node's text
     ;
Note that the text for a node label is always just the string returned from getText() invoked on that node, whereas the text for a rule reference is always the text for the tree rooted at that labeled node.
Finally, here is the case where the definition of the text attribute does not do what you expect. The text attribute is derived from the first node matched by a rule, but a rule such as slist below that matches multiple subtrees has an ill-defined text attribute, because it only gives you the text for the first statement subtree:
func  : 'void' ID '()' slist ; // $slist.text is text from first tree only
slist : stat+ ;
In general, you just need to keep this in mind--the text attribute is natural in most cases.
Rule scopes
Global shared scopes
Lexical filters
ANTLR has a lexical filter mode that lets you sift through an input file looking for certain grammatical structures. The rules are prioritized in the order specified in case an input construct matches more than a single rule, with the first rule having the highest priority. The filter proceeds character by character looking for a match among the rules. If there is no match, it consumes that character and tries again. The following example prints found var foo for every field foo in the input:
lexer grammar FuzzyJava; options {filter=true;} FIELD : TYPE WS name=ID '[]'? WS? (';'|'=') {System.out.println("found var "+$name.text);} ; fragment TYPE : ID ('.' ID)* ; fragment ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')* ; WS : (' '|'\t'|'\n')+ ;
Don't forget that you must ignore text in comments, so add another rule:
COMMENT : '/*' (options {greedy=false;} : . )* '*/' {System.out.println("found comment "+getText());} ;
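A filter lexer is driven by calling nextToken() until EOF; the print actions above fire as the rules match. Here is a minimal sketch of a test rig, assuming the generated class name FuzzyJava from the grammar above:

import org.antlr.runtime.*;

public class FuzzyMain {
    public static void main(String[] args) throws Exception {
        FuzzyJava lexer = new FuzzyJava(new ANTLRFileStream(args[0]));
        // pump the lexer; in filter mode, unmatched input is silently consumed
        while ( lexer.nextToken().getType() != Token.EOF ) { }
    }
}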
Grammars
Grammar syntax
All grammars are of the form:
/** This is a grammar doc comment */
grammar-type grammar name;
options { name1 = value1; name2 = value2; ... }
import delegateName1=grammar1, ..., delegateNameN=grammarN; // can omit delegateName
tokens { token-name1; token-name2 = value; ... }
scope global-scope-name-1 { «attribute-definitions» }
scope global-scope-name-2 { «attribute-definitions» }
...
@header { ... }
@lexer::header { ... }
@members { ... }
«rules»
The type of the grammar, specified via the grammar-type modifier above, can be lexer, parser, tree, or combined (no modifier). To set the superclass of the generated parser class, use the superClass option. See Grammar options for a list of valid grammar options and their semantics.
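For example, here is a sketch of a combined grammar header that sets the superclass (MyBaseParser is a hypothetical class you would supply on the classpath):

grammar Expr;
options {
    language   = Java;
    superClass = MyBaseParser; // generated ExprParser extends MyBaseParser
}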
Rule syntax
/** rule comment */
access-modifier rule-name[«arguments»] returns [«return-values»] throws name1, name2, ...
options {...}
scope {...}
scope global-scope-name, ..., global-scope-nameN;
@init {...}
@after {...}
: «alternative-1» -> «rewrite-rule-1»
| «alternative-2» -> «rewrite-rule-2»
...
| «alternative-n» -> «rewrite-rule-n»
;
catch [«exception-arg-1»] {...}
catch [«exception-arg-2»] {...}
finally {...}
See Rule and subrule options for a list of valid rule options and their semantics.
Lexer, Parser and Tree Parser rules
Rules in a grammar are special cases of identifier names. Lexer rules must start with an uppercase letter; parser and tree parser rules must start with a lowercase letter.
LexerRuleName      : ('A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
ParserRuleName     : ('a'..'z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
TreeParserRuleName : ('a'..'z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
Here are some common lexical rules for programming languages:
WS           : (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;} ;
COMMENT      : '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;} ;
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;} ;
The $channel=HIDDEN; action places those tokens on a hidden channel. They are still sent to the parser, but the parser does not see them. Actions, however, can access hidden-channel tokens. If you want to literally throw out tokens, use the skip(); action (see org.antlr.runtime.Lexer.skip()).
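For example, a minimal sketch of a whitespace rule that discards its tokens entirely rather than hiding them:

WS : (' '|'\t'|'\r'|'\n')+ { skip(); } ; // token is discarded and never reaches the token stream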
Sometimes you will need helper rules to make your lexer grammar more readable. Use the fragment modifier in front of the rule:
HexLiteral : '0' ('x'|'X') HexDigit+ ; fragment HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
In this case, HexDigit is not a token in its own right; it can only be called from HexLiteral.
Warning: T__ is considered a reserved token name and token-name prefix. Please don't use it as one of your rule names.
Tree grammar rules
Rules in tree grammars are identical to parser grammar rules except that they can specify a tree element to match. The syntax is ^( root child1 child2 ... childn ). For example:
decl : ^(DECL type declarator) {System.out.println($type.text+" "+$declarator.text);} ;
Attribute scope syntax
Attribute scopes are a set of attribute definitions of the form:
scope name {
    type1 attribute-name1;
    type2 attribute-name2;
}
Grammar action syntax
Grammar actions are, in general, of the form:
@action-name { ... }
@scope-name::action-name { ... }
The default scope-name is parser. For instance, @header is the same as @parser::header. Valid scope-names differ depending on the target, but most targets should support parser and lexer. Two common action-names are header and members. The header action is placed at the top of a generated class definition, and the members action is inserted within the body of a generated class definition. For example, the following grammar actions would ensure generated parser and lexer Java classes include a package declaration:
@parser::header { package my.example.package; } @lexer::header { package my.example.package; }
Rule elements
Rules may reference:
Element | Description |
---|---|
T | Token reference. An uppercase identifier. |
T<node=V> or T<V> | Token reference with the optional token option (e.g., the node type for heterogeneous trees; see below). |
T[«args»] | Lexer rule (token rule) reference. Lexer grammars may use optional arguments for fragment token rules. |
r [«args»] | Rule reference. A lowercase identifier with optional arguments. |
'«one-or-more-char»' | String or char literal in single quotes. In parser, a token reference; in lexer, match that string. |
{«action»} | An action written in target language. Executed right after previous element and right before next element. |
{«action»}? | Semantic predicate. |
{«action»}?=> | Gated semantic predicate. |
(«subrule»)=> | Syntactic predicate. |
(«x»|«y»|«z») | Subrule. Like a call to a rule with no name. |
(«x»|«y»|«z»)? | Optional subrule. |
(«x»|«y»|«z»)* | Zero-or-more subrule. |
(«x»|«y»|«z»)+ | One-or-more subrule. |
«x»? | Optional element. |
«x»* | Zero-or-more element. |
«x»+ | One-or-more element. |
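As an illustration, here is a hypothetical rule combining several of these elements: a labeled token reference, optional and zero-or-more subrules, list labels, and an embedded action:

call : name=ID '(' ( args+=expr ( ',' args+=expr )* )? ')' // optional and zero-or-more subrules
       { System.out.println("call to " + $name.text); }    // action runs after ')' is matched
     ;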
Grammar options
Taken from org/antlr/tool/Grammar.java, all allowed options are:

Option | Description |
---|---|
language | The target language for code generation. Default is Java. |
tokenVocab | Where ANTLR should get predefined tokens and token types. Tree grammars need it to get the token types from the parser that creates its trees. |
output | The type of output the generated parser should return. Valid values are AST and template. |
ASTLabelType | Set the type of all tree labels and tree-valued expressions. Without this option, trees are of type Object. |
TokenLabelType | Set the type of all token-valued expressions. Without this option, tokens are of type Token. |
superClass | Set the superclass of the generated recognizer. Without this option, the recognizer extends Parser, Lexer, or TreeParser depending on the grammar type. |
filter | In the lexer, this allows you to try a list of lexer rules in order. The first one that matches wins, and its token is the one returned to the parser. Valid values are true and false. |
k | Limit the lookahead depth for the recognizer to at most k. |
backtrack | Valid values are true and false. When true, alternatives that cannot be distinguished with fixed lookahead are tried in order via backtracking. |
memoize | Valid values are true and false. When true, partial parsing results are memoized so backtracking does not reparse the same input with the same rule twice. |
rewrite | Valid values are true and false. Use with output=AST in a tree grammar to stitch rewrites into the incoming tree rather than duplicating it. |
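For example, a typical tree grammar header combining several of these options (a sketch; the grammar and vocabulary names are hypothetical):

tree grammar Eval;
options {
    tokenVocab   = Expr;       // token types come from Expr.g
    ASTLabelType = CommonTree; // tree labels are CommonTree rather than Object
    output       = AST;
}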
Rule and subrule options
option | description |
---|---|
k | Specify the exact lookahead to be used by the rule or subrule. |
greedy | Valid values are true and false. Controls whether a loop matches as much input as possible (true, the default) or exits as soon as possible (false). |
backtrack | Rule-specific version of the backtrack grammar option. |
memoize | Rule-specific version of the memoize grammar option. |
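For example, greedy is most often set on a subrule, while backtrack and memoize can be turned on for a single troublesome rule (a sketch; the rule names are hypothetical):

ML_COMMENT : '/*' ( options {greedy=false;} : . )* '*/' ; // exit the loop at the first '*/'
stat options {backtrack=true; memoize=true;} : expr ';' | ID '=' expr ';' ; // only this rule backtracks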
Special symbols in actions
This table describes the complete set of special symbols you can use in actions within your grammar. These are translated by the codegen/action.g ANTLR v3 grammar (run in filter mode). The rules mentioned below are found in action.g.
Syntax | Description |
---|---|
$enclosingRule.attr | enclosingRule is the enclosing rule, attr is a return value, parameter, or predefined property. Rule ENCLOSING_RULE_SCOPE_ATTR. Example: r[int i] returns [int j] : {$r.i, $r.j, $r.start, $r.stop, $r.st, $r.tree} ; |
$tokenLabel.prop | Token scope attribute. Rule TOKEN_SCOPE_ATTR. |
$rulelabel.attr | Rule RULE_SCOPE_ATTR. |
$label | Either a token label or a token/rule list label like label+=expr. Rule LABEL_REF. |
$tokenref | Token reference in a non-lexer grammar. Rule ISOLATED_TOKEN_REF. |
$lexerruleref | Yields a Token object created from that rule or fragment rule. Rule ISOLATED_LEXER_RULE_REF. |
$y | Return value, parameter, predefined rule property, or token/rule label. Example: r[int i] returns [int j] : {$i, $j, $start, $stop, $st, $tree} ; |
$x::y | The only way to access the attributes within a dynamic scope. Example: scope Symbols { List names; } r scope {int i;} scope Symbols; : {$r::i=3;} s {$Symbols::names;} ; s : {$r::i; $Symbols::names;} ; |
$x[-1]::y | Previous scope instance (just under the top of the stack). Rule DYNAMIC_NEGATIVE_INDEXED_SCOPE_ATTR. |
$x[-i]::y | Top of stack minus i, where the '-' MUST BE PRESENT. |
$x[i]::y | Absolute index i (0..size-1). Rule DYNAMIC_ABSOLUTE_INDEXED_SCOPE_ATTR. |
$x[0]::y | The absolute 0-indexed element (bottom of the stack). Rule DYNAMIC_ABSOLUTE_INDEXED_SCOPE_ATTR. |
$x.size() | Returns the size of the current stack of the scope. Note: this particular syntax is target-dependent; look at the target page for targets other than Java. |
$r | r is a rule's dynamic scope or a global shared scope. |
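A small sketch tying a few of these symbols together ($t.text is a rule-label attribute, $id.text a token-label property; the rule names are hypothetical):

decl : t=type id=ID ';' { System.out.println($t.text + " " + $id.text); } ;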
The following symbols relate to StringTemplate templates.
Syntax | Description |
---|---|
%foo(a={},b={},...) | Create an instance of template foo, setting attribute arguments. Rule TEMPLATE_INSTANCE. |
%({name-expr})(a={},...) | Indirect template constructor reference. Rule INDIRECT_TEMPLATE_INSTANCE. |
%x.y = z; | Set template attribute y of x (always set, never get, attr). |
%{expr}.y = z; | Set template attribute y of StringTemplate-typed expr to z. Rule SET_EXPR_ATTRIBUTE. |
%{string-expr} | Anonymous template from string expr. Rule TEMPLATE_EXPR. |
Template construction
ANTLR v3 has built-in support for constructing StringTemplate templates. There are two forms: special symbols in actions, and rewrite rules similar to AST construction. I am including a number of rules from the mantra example.
Sometimes you just need a string to become a template:
'void' -> {%{"void"}}
The following tree grammar rule illustrates some of the basic rewrite rules:
primary : ID -> {%{$ID.text}} // create template from token text
          // create template using rule results as template attributes
        | ^('new' typename args=expressionList)
          -> new(type={$typename.st}, args={$args.st})
        | listliteral -> {$listliteral.st} // reuse template built for listliteral
          // create template using token text as template attribute
        | NUM_INT -> int_literal(v={$NUM_INT.text})
        ;
And here are some more complicated examples:
assignment
    :   // special case "a[i] = expr;"
        ^('=' ^(EXPR ^(INDEX a=expression i=expression)) rhs=completeExpression)
        -> indexed_assignment(list={$a.st}, index={$i.st}, rhs={$rhs.st})
    |   ^('=' lvalue completeExpression)
        -> assignment(lhs={$lvalue.st}, rhs={$completeExpression.st})
    |   ^(assign_op lvalue completeExpression)
        -> assignment_with_op(type={$assign_op.start.type.name},
                              op={$assign_op.text},
                              lhs={$lvalue.st},
                              rhs={$completeExpression.st})
    ;
When you need to append multiple strings or templates into another template, use the += operator for a rule's return value (the former use of toTemplates is no longer required). For example, adding variable declarations inside a struct template:
structDeclaration : name=Ident (decls+=typeDecls)+ -> structDecl(name={$name.text}, declList={$decls}) ;
and the template for a struct declaration may be something like this:
structDecl(name,declList) ::= <<
struct <name> {
    <declList; separator="\n">
}
>>
More on string templates can be found here: String Template
Tree construction
There are two mechanisms in v3 for building abstract syntax trees (ASTs): operators and rewrite rules.
Operators
Nodes created for unmodified tokens and trees for unmodified rule references are added to the current subtree as children.
Operator | Description |
---|---|
! | do not include node or subtree (if referencing a rule) in subtree |
^ | make node root of subtree created for entire enclosing rule, even if nested in a subrule |
additiveExpression : multiplicativeExpression ('+'^ multiplicativeExpression)* ;
That is the same as the following, using the rewrite notation from the next section:
additiveExpression
    :   (a=multiplicativeExpression -> $a) // set result
        (   '+' b=multiplicativeExpression
            -> ^('+' $additiveExpression $b) // use previous rule result
        )*
    ;
Rewrite rules
The rewrite syntax is more powerful than the operators. It suffices for most common tree transformations.
While the parser grammar specifies how to recognize input, the rewrites are generational grammars, specifying how to generate output. ANTLR figures out how to map input to output grammar. To create an imaginary node, just mention it, as in the following example (UNIT is a node created from an imaginary token and is used to group the compilation unit chunks):
compilationUnit : packageDefinition? importDefinition* typeDefinition+ -> ^(UNIT packageDefinition? importDefinition* typeDefinition*) ;
ANTLR collects all elements with the same name into a single implicit list:
formalArgs : formalArg (',' formalArg)* -> formalArg+ | ;
If the same rule or token is mentioned twice, you generally must label the elements to distinguish them. If you want to combine multiple elements into a single list, list labels are very handy (though in this case, since they have the same name, ANTLR would automatically combine them):
('implements' i+=typename (',' i+=typename)*)?
Here is the entire rule:
classDefinition[MantraAST mod] : 'class' cname=ID ('extends' sup=typename)? ('implements' i+=typename (',' i+=typename)*)? '{' ( variableDefinition | methodDefinition | ctorDefinition )* '}' -> ^('class' ID {$mod} ^('extends' $sup)? ^('implements' $i+)? variableDefinition* ctorDefinition* methodDefinition* ) ;
Note that using a simple action in a rewrite means: evaluate the expression and use the result as a tree node or subtree. The mod argument is a set of modifiers passed in from an enclosing rule.
Deleting tokens or rules is easy: just don't mention them:
packageDefinition : 'package' classname ';' -> ^('package' classname) ;
If you need to build different trees based upon semantic information, use a semantic predicate:
variableDefinition : modifiers typename ID ('=' completeExpression)? ';' -> {inMethod}? ^(VARIABLE ID modifiers? typename completeExpression?) -> ^(FIELD ID modifiers? typename completeExpression?) ;
where inMethod is set by the method rule.
Often you will need to build a tree node from an input token but with the token type changed:
compoundStatement : lc='{' statement* '}' -> ^(SLIST[$lc] statement*) ;
SLIST by itself is a new node based upon token type SLIST, but it has no line/column information nor text. By using SLIST[$lc], all information except the token type is copied to the new node.
Using a rewrite rule at a non-extreme-right-edge-of-production location is ok, but it still always sets the overall subtree for the enclosing rule.
'if' '(' equalityExpression ')' s1=statement ( 'else' s2=statement -> ^('if' ^(EXPR equalityExpression) $s1 $s2) | -> ^('if' ^(EXPR equalityExpression) $s1) )
You may reference the previous subtree for the enclosing rule using $rulename syntax:
postfixExpression
    :   (primary -> primary) // set return tree
        (   lp='(' args=expressionList ')' -> ^(CALL $postfixExpression $args)
        |   lb='[' ie=expression ']'       -> ^(INDEX $postfixExpression $ie)
        |   dot='.' p=primary              -> ^(FIELDACCESS $postfixExpression $p)
        |   c=':' cl=closure[false]        -> ^(APPLY ^(EXPR $postfixExpression) $cl)
        )*
    ;
Imaginary nodes
Token references in a rewrite that are not found on the left of the -> are imaginary tokens.
d : type ID ';' -> ^(DECL type ID) ; // DECL is imaginary
or
call : lp='(' ID args ')' -> ^(CALL[$lp] ID args) ;
Here, the CALL node has its line/column info set from the '(' token; the CALL node is "derived" from the '('.
Even tokens that do appear to the left of the -> result in new nodes, disassociated from the input tokens, if you put arguments on the references:
a : INT -> INT["99"] ; // node created from adaptor.create(INT, "99")
Tree construction during tree parsing
ANTLR 3.0.1 could not create trees during tree parsing. Version 3.1 introduced the ability to create a new AST from an incoming AST using rewrite rules:
- Each rule returns a new tree.
- An alternative without a rewrite duplicates the incoming tree.
- The tree returned from the start rule is the new tree.
- The new tree created with output=AST in a tree grammar is completely independent of the input tree as all nodes are duplicated (with and without rewrite -> operator).
The rewrites work just like they do for normal parsing:
a : INT ; // duplicate INT node and return
a : ID -> ; // delete ID node from tree
a : INT ID -> ID INT ; // reorder nodes
a : ^(ID INT) -> ^(INT ID) ; // flip order of nodes in tree
a : INT -> INT["99"] ; // make new INT node
a : (^(ID INT))+ -> INT+ ID+ ; // break apart trees into sequences
Predicates can be used to choose between rewrites as well:
a : ^(ID INT) -> {some test}? ^(ID["ick"] INT) -> INT ;
Don't forget the wildcard:
s : ^(ID c=.) -> $c ; // new tree is whatever matched wildcard
Polynomial differentiation example
For translations whose input and output languages are the same, it often makes sense to build a tree and then morph it towards the final output tree, which can then be converted to text. Polynomial differentiation is a great example of this. Recall that:
- d/dx(n) = 0
- d/dx(x) = 1
- d/dx(nx) = n
- d/dx(nx^m) = nmx^(m-1)
- d/dx(foo + bar) = d/dx(foo) + d/dx(bar)
Ok, here's a parser that builds nice trees.
grammar Poly;
options {output=AST;}
tokens { MULT; } // imaginary token

poly: term ('+'^ term)* ;
term: INT ID  -> ^(MULT["*"] INT ID)
    | INT exp -> ^(MULT["*"] INT exp)
    | exp
    | INT
    | ID
    ;
exp : ID '^'^ INT ;

ID  : 'a'..'z'+ ;
INT : '0'..'9'+ ;
WS  : (' '|'\t'|'\r'|'\n')+ {skip();} ;
Then we differentiate:
tree grammar PolyDifferentiator;
options {
    tokenVocab=Poly;
    ASTLabelType=CommonTree;
    output=AST;
//  rewrite=true; // works either in rewrite or normal mode
}

poly:   ^('+' poly poly)
    |   ^(MULT INT ID)      -> INT
    |   ^(MULT c=INT ^('^' ID e=INT))
        {
        String c2 = String.valueOf($c.int*$e.int);
        String e2 = String.valueOf($e.int-1);
        }
                            -> ^(MULT["*"] INT[c2] ^('^' ID INT[e2]))
    |   ^('^' ID e=INT)
        {
        String c2 = String.valueOf($e.int);
        String e2 = String.valueOf($e.int-1);
        }
                            -> ^(MULT["*"] INT[c2] ^('^' ID INT[e2]))
    |   INT                 -> INT["0"]
    |   ID                  -> INT["1"]
    ;
then we simplify (a little anyway):
tree grammar Simplifier;
options {
    tokenVocab=Poly;
    ASTLabelType=CommonTree;
    output=AST;
    backtrack=true;
//  rewrite=true; // works either in rewrite or normal mode
}

/** Match some common patterns that we can reduce via identity
 *  definitions. Since this is only run once, it will not be perfect.
 *  We'd need to run the tree into this until nothing
 *  changed to make it correct.
 */
poly:   ^('+' a=INT b=INT)  -> INT[String.valueOf($a.int+$b.int)]
    |   ^('+' ^('+' a=INT p=poly) b=INT)
                            -> ^('+' $p INT[String.valueOf($a.int+$b.int)])
    |   ^('+' ^('+' p=poly a=INT) b=INT)
                            -> ^('+' $p INT[String.valueOf($a.int+$b.int)])
    |   ^('+' p=poly q=poly)-> {$p.tree.toStringTree().equals("0")}? $q
                            -> {$q.tree.toStringTree().equals("0")}? $p
                            -> ^('+' $p $q)
    |   ^(MULT INT poly)    -> {$INT.int==1}? poly
                            -> ^(MULT INT poly)
    |   ^('^' ID e=INT)     -> {$e.int==1}? ID
                            -> {$e.int==0}? INT["1"]
                            -> ^('^' ID INT)
    |   INT
    |   ID
    ;
Finally we walk the tree to print it back out using simple templates:
tree grammar PolyPrinter;
options {
    tokenVocab=Poly;
    ASTLabelType=CommonTree;
    output=template;
}

poly:   ^('+' a=poly b=poly)  -> template(a={$a.st},b={$b.st}) "<a>+<b>"
    |   ^(MULT a=poly b=poly) -> template(a={$a.st},b={$b.st}) "<a><b>"
    |   ^('^' a=poly b=poly)  -> template(a={$a.st},b={$b.st}) "<a>^<b>"
    |   INT -> {%{$INT.text}}
    |   ID  -> {%{$ID.text}}
    ;
Here is a test rig:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
    public static void main(String[] args) throws Exception {
        CharStream input = null;
        if ( args.length>0 ) {
            input = new ANTLRFileStream(args[0]);
        }
        else {
            input = new ANTLRInputStream(System.in);
        }

        // BUILD AST
        PolyLexer lex = new PolyLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        PolyParser parser = new PolyParser(tokens);
        PolyParser.poly_return r = parser.poly();
        System.out.println("tree="+((Tree)r.tree).toStringTree());

        // DIFFERENTIATE
        CommonTreeNodeStream nodes = new CommonTreeNodeStream((Tree)r.tree);
        nodes.setTokenStream(tokens);
        PolyDifferentiator differ = new PolyDifferentiator(nodes);
        PolyDifferentiator.poly_return r2 = differ.poly();
        System.out.println("d/dx="+((Tree)r2.tree).toStringTree());

        // SIMPLIFY / NORMALIZE
        nodes = new CommonTreeNodeStream((Tree)r2.tree);
        nodes.setTokenStream(tokens);
        Simplifier reducer = new Simplifier(nodes);
        Simplifier.poly_return r3 = reducer.poly();
        System.out.println("simplified="+((Tree)r3.tree).toStringTree());

        // CONVERT BACK TO POLYNOMIAL
        nodes = new CommonTreeNodeStream((Tree)r3.tree);
        nodes.setTokenStream(tokens);
        PolyPrinter printer = new PolyPrinter(nodes);
        PolyPrinter.poly_return r4 = printer.poly();
        System.out.println(r4.st.toString());
    }
}
Running the rig on "2x+3x^5" shows:
tree=(+ (* 2 x) (* 3 (^ x 5))) d/dx=(+ 2 (* 15 (^ x 4))) simplified=(+ 2 (* 15 (^ x 4))) 2+15x^4
Rewriting an existing AST
For efficiency, option rewrite=true does an in-line replacement for rewrite rules so you can avoid making a copy of an entire tree just to tweak a few nodes. For example, if you have a huge expression tree but only want to rewrite ^('+' INT INT) to be a single INT node, it's better not to duplicate the entire huge tree. The rewrite mode behaves exactly the same as non-rewrite mode except that rewrites stitch changes into the incoming tree. Nodes are not duplicated for rules without rewrites.
The result of a rule with a rewrite is the newly created tree. The result of a rule without a rewrite is simply the incoming tree. For chains of rule invocations as in the next example, ANTLR copies rewrites upwards so that the action in rule s prints out the tree created in rule b:
tree grammar TP;
options {output=AST; ASTLabelType=CommonTree; tokenVocab=T; rewrite=true;}
s : a {System.out.println($a.tree.toStringTree());} ;
a : b ;
b : ID INT -> INT ID ;
Heterogeneous tree nodes
By default, with output=AST, ANTLR creates trees of type CommonTree. To create different nodes depending on the incoming token type, you can override create(Token) and errorNode() in a subclass of CommonTreeAdaptor (or implement your own TreeAdaptor), and override dupNode() in your tree class. Unfortunately, this only allows you to change the node type based upon the token type, not the grammatical context. Sometimes you want an ID to become a VarNode and sometimes a MethodNode object. As of v3.1, you can use the node token option to indicate the node type (in both parsers and tree parsers):
decl : 'int'<node=TypeNode> ID<node=VarNode> ';' ;
or equivalently
decl : 'int'<TypeNode> ID<VarNode> ';' ;
because node is assumed if there is only one option and it is not an option assignment. Token references with node options invoke the following constructor during tree construction:
public V(Token t); // NEED SPECIAL CTOR for ID<V> on left of ->
You can specify the node type on any token reference, including literals:
a : ID<V> ';'<V> ;
The "become root" operator ^ is used following the token options:
e : INT '+'<PlusNode>^ INT ;
Labels are available as usual; e.g., x=ID<V> and x+=ID<V>.
Heterogeneous tree nodes are labeled on the right-hand side of the -> rewrite operator as well:
decl : 'int' ID -> ^('int'<TypeNode> ID<VarNode>) ;
You can also specify arguments to node type constructors on the right of the -> rewrite operator. For example, the following two token references:
ID<V>[42,19,30] ID<V>[$ID,99]
invoke the following two constructors of V:
public V(int ttype, int x, int y, int z) public V(int ttype, Token t, int x)
The TreeAdaptor is not called; instead, the constructors are invoked directly. This is much more flexible because the list of arguments can change per type, whereas the TreeAdaptor interface is fixed. Note that parameters are not allowed on token references to the left of the ->:
a : ID<V>[23,21] ; // ILLEGAL
Use imaginary nodes as you normally would, but with the addition of the node type:
block : lc='{' stat+ '}' -> ^(BLOCK<StatementList>[$lc] stat+) ;
Here is a complete simple example:
grammar T;
options {output=AST;}
@members {
static class V extends CommonTree {
    public int x,y,z;
    public V(int ttype, int x, int y, int z) {
        this.x=x; this.y=y; this.z=z;
        token=new CommonToken(ttype,"");
    }
    public V(int ttype, Token t, int x) {
        token=t; this.x=x;
    }
    public String toString() {
        return (token!=null?token.getText():"")+"<V>;"+x+y+z;
    }
}
}
a  : ID -> ID<V>[42,19,30] ID<V>[$ID,99] ;
ID : 'a'..'z'+ ;
WS : (' '|'\n') {$channel=HIDDEN;} ;
Sometimes ANTLR must duplicate nodes to avoid cycles and to provide useful semantics. In the next example, the trees returned from rule type are of type V. There is only one type specification in the input (e.g., "int a,b,c;") but multiple identifiers. To create multiple trees with 'int' at the root, that node must be duplicated by the rewrite rule ^(type ID)+.
grammar T;
options {output=AST;}
a    : type ID (',' ID)* ';' -> ^(type ID)+ ;
type : 'int'<V> ;
ID   : 'a'..'z'+ ;
INT  : '0'..'9'+ ;
WS   : (' '|'\n') {$channel=HIDDEN;} ;
We want 3 trees, one for each identifier:
(int<V> a) (int<V> b) (int<V> c)
We need to override dupNode() in our node class definition as well as two constructors:
class V extends CommonTree {
    public V(Token t) { token=t; }                // for 'int'<V>
    public V(V node)  { super(node); }            // for dupNode
    public Tree dupNode() { return new V(this); } // for dup'ing type
    public String toString() { return token.getText()+"<V>"; }
}
Here is a test rig:
public class Test {
    public static void main(String[] args) throws Exception {
        CharStream input = new ANTLRFileStream(args[0]);
        TLexer lex = new TLexer(input);
        TokenRewriteStream tokens = new TokenRewriteStream(lex);
        TParser parser = new TParser(tokens);
        TParser.a_return r = parser.a();
        if ( r.tree!=null ) {
            System.out.println(((Tree)r.tree).toStringTree());
            ((CommonTree)r.tree).sanityCheckParentAndChildIndexes();
        }
    }
}
Using custom AST node types
To have ANTLR build trees from your own node class, give the parser a TreeAdaptor that creates and duplicates your nodes:
/** An adaptor that tells ANTLR to build CymbolAST nodes */
public static TreeAdaptor cymbolAdaptor = new CommonTreeAdaptor() {
    public Object create(Token token) {
        return new CymbolAST(token);
    }
    public Object dupNode(Object t) {
        if ( t==null ) {
            return null;
        }
        return create(((CymbolAST)t).token);
    }
    public Object errorNode(TokenStream input, Token start, Token stop,
                            RecognitionException e)
    {
        CymbolErrorNode t = new CymbolErrorNode(input, start, stop, e);
        return t;
    }
};
Here's a suitable error node:
/** A node representing erroneous token range in token stream */
public class CymbolErrorNode extends CymbolAST {
    org.antlr.runtime.tree.CommonErrorNode delegate;

    public CymbolErrorNode(TokenStream input, Token start, Token stop,
                           RecognitionException e)
    {
        delegate = new CommonErrorNode(input,start,stop,e);
    }
    public boolean isNil() { return delegate.isNil(); }
    public int getType() { return delegate.getType(); }
    public String getText() { return delegate.getText(); }
    public String toString() { return delegate.toString(); }
}
Error Node Insertion Upon Syntax Error
Prior to v3.1, ANTLR AST-building parsers did not alter the resulting AST upon syntax error. As of v3.1, ANTLR adds an error node, as created by TreeAdaptor.errorNode(...), to represent missing nodes or confusing input sequences. The first token in the error sequence is the token at which the parser first detected an error. The last token in the sequence is the last token consumed during error recovery. ANTLR creates a CommonErrorNode by default, but you can create your own tree adaptor and override this.
Let me demonstrate the new mechanism by example. Referring to the attached SimpleC.g from the v3 examples, here is some good input:
int foo() { for (i=0; i<3; i=i+1) { x=9; } }
That input results in the following tree output:
tree=(FUNC_DEF (FUNC_HDR int foo) (BLOCK (for (= i 0) (< i 3) (= i (+ i 1)) (BLOCK (= x 9)))))
The grammar and tree construction for the FOR loop is as follows:
forStat : 'for' '(' start=assignStat ';' expr ';' next=assignStat ')' block -> ^('for' $start expr $next block) ;
Now, remove the first '(' of the for loop:
int foo() { for i=0; i<3; i=i+1) { x=9; } }
You will see that ANTLR detects an error, but magically inserts the missing token. In this case, the parser was not asked to insert the '(' into the tree so there is no evidence of the error in the output tree:
line 2:6 missing '(' at 'i' tree=(FUNC_DEF (FUNC_HDR int foo) (BLOCK (for (= i 0) (< i 3) (= i (+ i 1)) (BLOCK (= x 9)))))
What about when you have a random extra token such as "22" before the '(':
int foo() { for 22 (i=0; i<3; i=i+1) { x=9; } }
line 2:6 extraneous input '22' expecting '(' tree=(FUNC_DEF (FUNC_HDR int foo) (BLOCK (for (= i 0) (< i 3) (= i (+ i 1)) (BLOCK (= x 9)))))
Again, ANTLR detects the error and is able to ignore the extraneous token to yield a valid tree.
If you forget a token that must go into the output tree, however, you will see an error node. Given a missing identifier at the start of the FOR loop:
int foo() { for (=0; i<3; i=i+1) { x=9; } }
The parser emits:
line 2:7 missing ID at '=' tree=(FUNC_DEF (FUNC_HDR int foo) (BLOCK (for (= <missing ID> 0) (< i 3) (= i (+ i 1)) (BLOCK (= x 9)))))
The toString() method of the error node yields "<missing ID>".
When the parser gets really confused, such as when it gets a NoViableAltException, you will see that it consumes a whole bunch of input and adds it to the tree as an error node (it indicates what tokens it consumes during resynchronization). Input:
); int foo() { for (i=0; i<3; i=i+1) { x=9; } }
yields:
line 1:0 required (...)+ loop did not match anything at input ')' tree=<error: ); int foo() { for (i=0; i<3; i=i+1) { x=9; } }>
Making custom error nodes
Just override errorNode() in TreeAdaptor. The default handling is as follows:
public Object errorNode(TokenStream input, Token start, Token stop,
                        RecognitionException e)
{
    CommonErrorNode t = new CommonErrorNode(input, start, stop, e);
    return t;
}
Make sure that your error node type is a subclass of your node type so that you do not get class cast exceptions.
See the next section for example of how to override.
Turning off error node construction
To turn this off, just override errorNode:
class MyAdaptor extends CommonTreeAdaptor {
    public Object errorNode(TokenStream input, Token start, Token stop,
                            RecognitionException e)
    {
        return null;
    }
}
and then set
parser.setTreeAdaptor(new MyAdaptor());
What makes a language problem hard?
Given a source-to-target mapping, how can you characterize the difficulty of the translation?
- Is the set of all input fixed? If you have a fixed set of files to convert, your job is much easier because the set of language construct combinations is fixed. For example, building a general Pascal to Java translator is much harder than building a translator for a set of 50 existing Pascal files.
- Forward or external references? I.e., multiple passes needed? Pascal has a "forward" reference to handle intra-file procedure references, but references to procedures in other files via the USES clauses etc... require special handling.
- Is input order of sentences close to output order? Are there multiple files to generate from a single input file or vice versa?
- Context sensitive lexer? You can't decide what vocabulary symbol to match unless you know what kind of sentence you are parsing.
- Are delimiters non-fixed for things like strings and comments? That makes it tough to build an efficient lexer.
- Is the language big, with lots of statements?
- Are source constructs really similar, like declarations vs. expressions in C++?
- Column sensitive input? E.g., are newlines significant like lines in a log file and does the position of an item change its meaning?
- Case sensitivity problems, like Fortran?
- Do you need good error recovery? Good reporting?
- Well defined language or no manual; hacked for ages like gnucc by non-language designers? Is your language VisualBasic-like?
- How fast does your translator have to be? It is often the case that building lots of translator phases simplifies your problem, but it can slow down the translation.
- Does your input have comments as you do in programming languages that can occur anywhere in the input and need to go into the output in a sane location?
- How much semantic information do you need to do the translation? For example, do you need to simply know that something is a type name or do you need to know that it is, say, an array whose indices are a set like (day,week,month) and contains records? Sometimes syntax alone is enough to do translation.
- Equivalent syntaxes? In C there are many different ways to dereference pointers. You can normalize the language to a standard representation, but you might lose the original representation. The choice usually hinges on whether the output will be human-edited or not. Designing the right tree structure has to incorporate decisions like this.
- Jurgen Pfundt points out: the considered language might be small, but the mapping may be targeted at the conversion of huge files, and this is a real challenge. An input file with a size of several megabytes restricts the usage of tree parsers or any other kind of memory-consuming features. The transformation should be done in one single pass due to performance requirements, and extremely good and comfortable error reporting and error recovery is a must.
Integration with development environments
Table of Contents
VisualStudio
C# Projects
For C# projects, you can integrate ANTLR with Visual Studio 2005/2008 by pasting the following snippet of XML near the end of your .csproj file. The <Target> tag must appear before <Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets"/> for the build to run successfully. Adjust the Include and OutputFiles values to fit your project.
<ItemGroup>
  <Antlr3 Include="SimpleCalc.g">
    <OutputFiles>SimpleCalcLexer.cs;SimpleCalcParser.cs</OutputFiles>
  </Antlr3>
  <Antlr3 Include="BigCalc.g">
    <OutputFiles>BigCalcLexer.cs;BigCalcParser.cs</OutputFiles>
  </Antlr3>
</ItemGroup>
<Target Name="GenerateAntlrCode" Inputs="@(Antlr3)" Outputs="%(Antlr3.OutputFiles)">
  <Exec Command="java org.antlr.Tool -message-format vs2005 @(Antlr3)" Outputs="%(Antlr3.OutputFiles)"/>
</Target>
<PropertyGroup>
  <BuildDependsOn>GenerateAntlrCode;$(BuildDependsOn)</BuildDependsOn>
</PropertyGroup>
To finish off the integration, look higher in the same file and find the XML block that contains AssemblyInfo.cs:
<ItemGroup> <Compile Include="Program.cs" /> <Compile Include="Properties\AssemblyInfo.cs" /> </ItemGroup>
and add the additional Compile options below to have your .cs output files added to the project as dependent on your .g files. Again, adjust the input and output names to fit your project.
<ItemGroup>
  <Compile Include="Program.cs" />
  <Compile Include="Properties\AssemblyInfo.cs" />
  <Compile Include="SimpleCalcLexer.cs">
    <AutoGen>True</AutoGen>
    <DesignTime>True</DesignTime>
    <DependentUpon>SimpleCalc.g</DependentUpon>
  </Compile>
  <Compile Include="SimpleCalcParser.cs">
    <AutoGen>True</AutoGen>
    <DesignTime>True</DesignTime>
    <DependentUpon>SimpleCalc.g</DependentUpon>
  </Compile>
  <Compile Include="BigCalcParser.cs">
    <AutoGen>True</AutoGen>
    <DesignTime>True</DesignTime>
    <DependentUpon>BigCalc.g</DependentUpon>
  </Compile>
  <Compile Include="BigCalcLexer.cs">
    <AutoGen>True</AutoGen>
    <DesignTime>True</DesignTime>
    <DependentUpon>BigCalc.g</DependentUpon>
  </Compile>
</ItemGroup>
Finally, do not forget to add the InitialTargets attribute to the Project node.
<Project DefaultTargets="Build" InitialTargets="GenerateAntlrCode" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
With ANTLR 3.0, there are some minor integration issues. Sometimes error messages have an empty "location" string, so Visual Studio will not detect the error. Also, it is possible for ANTLR to generate the output file successfully even though your grammar has errors in it. When this happens, the grammar will not be recompiled until it is edited again (since the output is more recent than the grammar).
C/C++ .rules Files for Visual Studio
The C runtime distribution (use the 3.1 distribution, which you may need to get from the latest interim build at the time of writing) comes with a set of .rules files, which you can add to your Visual Studio configuration. When you add a .g file to your Visual Studio project, it will ask you which rule file you wish to use. Note that you must tell Visual Studio whether this is a lexer only, parser only, parser+lexer, or tree grammar (you can change it later if you get this wrong).
The rule files are called:
antlr3lexer.rules
antlr3lexerandparser.rules
antlr3parser.rules
antlr3treeparser.rules
Right-clicking on a .g file in the Solution Explorer and selecting properties will allow you to configure any of the antlr command line options to suit your needs. The defaults are usually fine, but you may wish to configure the output and lib directories to conform to the usual layouts of directories that Visual Studio assumes, and -Xconversiontimeout is useful for more complicated grammars. The base directory will be the .vcproj directory, so your output may or may not be in this directory, depending on whether you store your .g files in the same directory as the .vcproj file.
The examples solution (C.sln), which is part of the downloadable examples tar/zip on the main ANTLR downloads page, uses this technique to build the ANTLR grammars, if you are looking for some examples. Look under the C subdirectory for the C target examples.
N.B.
These .rules files only work for the C target.
Eclipse
Eclipse 3.3+ for Antlr 3.x
AntlrDT is a standard Eclipse plugin implementing an Antlr 3.1+ specific grammar editor, outline, and builder. It also includes a StringTemplate group file editor and outline view.
See http://www.certiv.net/projects/plugins/antlrdt.html
ANTLR IDE. An eclipse plugin for ANTLRv3 grammars.
Features
- Support for ANTLR 3.0.x/3.1.x
- Integrated ANTLR/Java launcher and debugger (beta). Note: ANTLR breakpoints are not supported yet
- ANTLR Built-in Interpreter, Java runner and debugger
- Railroad diagrams
- Custom targets
- Automatic (Ctrl+S) or manual (Ctrl+Shift+G) code generation
- Problem markers for errors and warnings in grammar files
- Advanced text editor, code selection (F3) and code completion (Ctrl+Space)
- Simple syntax highlighting for target language (action code)
- Outline and quick outline (Ctrl + O) views for options, tokens, scopes, actions and rules
- Search rules references
- Mark generated resources as derived
More information? Please visit http://antlrv3ide.sourceforge.net/
IntelliJ
In version 7:
Navigate to File->Settings->IDE Settings->Plugins and install the "ANTLRWorks" plugin.
NetBeans
A plugin that provides some support for editing ANTLR v3 grammar files is available from the NetBeans Plugin Portal. The author notes that the plugin supports the following while editing: coloring, code folding, code completion, hyperlinks, mark occurrences, and the Navigator.
The plugin does not provide direct support for generating Java source files, nor for compiling those files; however, NetBeans uses an Ant-based build system, so support for these operations can be added to individual projects. This functionality is added by editing the build.xml file for the project (available from the "Files" tab).
Adding Build and Clean Support to an Individual Project
This is one possible method of adding support for ANTLR v3 to an individual project. Those who are familiar with Ant build scripts and the NetBeans build chain can modify this to suit their own needs.
Prerequisites
- NetBeans (this solution was tested on version 6.5)
- Antlr v3 Ant Task
- Ant-contrib Ant Tasks (tested on versions 0.6 and 1.0b3, available from the project's download page)
Install and Add Support to Project
First, download the ANTLR v3 Ant task and copy the task's antlr3.jar file into the NetBeansInstallDir/java2/ant/lib directory, where NetBeansInstallDir is the directory where NetBeans is installed (e.g. C:\Program Files\NetBeans 6.5 on a Windows system).
Next, download the Ant-contrib tasks and copy the ant-contrib-1.0b3.jar to the same directory as above.
Next, start NetBeans and open the project that you'd like to add ANTLR support to. Click on the "Files" tab, and find the build.xml file. Double click to open it. Be careful not to change anything at the top of this file, especially the contents of the <project> and <import> tags. Scroll down past the comments, and find the closing </project> tag. Add the following above the closing </project> tag, but below the closing comment tag (-->). In the first three lines below, replace AntlrInstallDir with the location of your ANTLR installation (e.g., mine is C:\java\antlr-3.1.2).
<property name="antlr.libdir" location="AntlrInstallDir/lib" /> <property name="antlr.tooldir" location="AntlrInstallDir/lib" /> <property name="antlr.runtimedir" location="AntlrInstallDir/lib" /> <patternset id="antlr.libs"> <include name="stringtemplate-3.1.jar" /> <include name="antlr277.jar" /> </patternset> <patternset id="antlr.tool"> <include name="antlr-3.1.2.jar" /> </patternset> <patternset id="antlr.runtime"> <include name="antlr-runtime-3.1.2.jar" /> </patternset> <path id="antlr.path"> <fileset dir="${antlr.tooldir}" casesensitive="yes"> <patternset refid="antlr.tool" /> </fileset> <fileset dir="${antlr.runtimedir}" casesensitive="yes"> <patternset refid="antlr.runtime" /> </fileset> <fileset dir="${antlr.libdir}" casesensitive="yes"> <patternset refid="antlr.libs" /> </fileset> </path> <target name="-pre-init"> <taskdef resource="net/sf/antcontrib/antlib.xml"/> </target> <target name="-post-clean"> <fileset id="antlr.grammars" dir="${src.dir}" includes="**/*.g"/> <pathconvert property="antlr.clean.files" pathsep=',' refid="antlr.grammars"> <compositemapper> <globmapper from="${basedir}${file.separator}${src.dir}${file.separator}*.g" to="*.tokens"/> <globmapper from="${basedir}${file.separator}${src.dir}${file.separator}*.g" to="*Parser.java"/> <globmapper from="${basedir}${file.separator}${src.dir}${file.separator}*.g" to="*Lexer.java"/> <chainedmapper> <globmapper from="${basedir}${file.separator}${src.dir}${file.separator}*.g" to="*.g"/> <compositemapper> <regexpmapper from="(([^/]*/)*).*\.g" to="\1__Test__.java" handledirsep="true"/> <regexpmapper from="(([^/]*/)*).*\.g" to="\1__Test___input.txt" handledirsep="true"/> </compositemapper> </chainedmapper> </compositemapper> </pathconvert> <pathconvert property="antlr.clean.dirs" pathsep=',' refid="antlr.grammars"> <chainedmapper> <globmapper from="${basedir}${file.separator}${src.dir}${file.separator}*.g" to="*.g"/> <compositemapper> <regexpmapper from="(([^/]*/)*).*\.g" to="\1classes/**/*" handledirsep="true"/> <regexpmapper from="(([^/]*/)*).*\.g" to="\1classes" handledirsep="true"/> </compositemapper> </chainedmapper> </pathconvert> <if> <not> <equals arg1="${antlr.clean.files}" arg2=""/> </not> <then> <echo level="info">Cleaning ANTLR- and ANTLRWorks-generated files (if any exist):${line.separator}</echo> <delete quiet="true" verbose="true"> <FileSet dir="${src.dir}" includes="${antlr.clean.files}" excludes="${src.dir}"/> </delete> </then> </if> <if> <not> <equals arg1="${antlr.clean.dirs}" arg2=""/> </not> <then> <echo level="info">Cleaning ANTLRWorks-generated directories (if any exist):${line.separator}</echo> <delete quiet="true" verbose="true" includeemptydirs="true"> <FileSet dir="${src.dir}" includes="${antlr.clean.dirs}" excludes="${src.dir}"/> </delete> </then> </if> </target> <target name="-pre-compile-single"> <basename property="javac.includes.base" file="${javac.includes}"/> <if> <equals arg1="${javac.includes.base}" arg2="*"/> <then> <for param="antlr.target"> <path> <fileset dir="${src.dir}" includes="${javac.includes}.g"/> </path> <sequential> <antlr:antlr3 xmlns:antlr="antlib:org/apache/tools/ant/antlr" target="@{antlr.target}"> <classpath> <path refid="antlr.path" /> </classpath> </antlr:antlr3> </sequential> </for> </then> </if> </target> <target name="-pre-compile"> <for param="antlr.target"> <path> <fileset dir="${src.dir}" includes="**/*.g"/> </path> <sequential> <antlr:antlr3 xmlns:antlr="antlib:org/apache/tools/ant/antlr" target="@{antlr.target}"> <classpath> <path refid="antlr.path" /> 
</classpath> </antlr:antlr3> </sequential> </for> </target>
Usage
The above integrates with the regular Java build/clean cycle in NetBeans. Execute the IDE's build operation, and the grammar file(s) will be passed through the ANTLR tool to generate the necessary Java code prior to the compile step. You can access the build operation either from the "Run" menu, by pressing <F11>, or by right-clicking on the project in the "Projects" tab and selecting "Build" from the context menu. You can also right-click a package and choose "Compile Package" to generate and compile for a single package. Context menu support is not available for the grammar file itself, unfortunately.
The IDE's clean operation will remove all generated .java and .tokens files, and it will also remove files generated from an ANTLRWorks debug session. The latter allows you to use both NetBeans and ANTLRWorks together on a grammar file and, when ready, clean and build from NetBeans. If your grammar file is named grammarFilename.g, then the following files and subdirectories will be deleted (if they exist) from the directory that contains the grammar file, when you invoke the clean operation:
- grammarFilename.tokens
- grammarFilenameLexer.java
- grammarFilenameParser.java
- __Test__.java (created from an ANTLRWorks debug session)
- __Test___input.txt (created from an ANTLRWorks debug session)
- the classes directory found below the directory that grammarFilename.g is in (created from an ANTLRWorks debug session)
The clean operation is accessed by pressing <Shift>-<F11> (for a clean and build in one step), selecting "Clean and Build" from the "Run" menu, or "Clean" (or "Clean and Build") from the project's context menu. You can control what does and does not get deleted by modifying the -post-clean target in the above example (requires knowledge of Ant build script syntax).
Xcode
will follow...