ANTLR 3.1.2 Release Notes

ANTLR v3.1.2

February 22, 2009

Terence Parr
ANTLR project lead and supreme dictator for life
University of San Francisco

ANTLR v3.1.2 is a bug fix release (with one new tool, Strip).

New tool: Strip

To strip a grammar down to just the grammar components, run:

$ java org.antlr.tool.Strip T.g

If T.g is:

grammar T;

options {k=1; output=AST;}

@header {kill}

@lexer::header {...}

@members {
akdsfljklas
}

z 
scope A;
	:	q=a[34] ids+=ID
	;

a[int i] returns [float f]
options {k=1;}
@init {rmoeve}
 : {sfkljlsf} B^ {on end}
 | '3' -> '3' {end}

   {aaa}
 ;

b
scope {
	String name;
}
 : B
    	-> {$arg!=null&&op!=null}?	^($op RULE_REF $arg)
    	-> 				^(RULE_REF $arg)
 ;

c : {true}? B | {false}?=> C ;

fragment A : '0' {skip();} ; // leave lexer actions
B : ~A
'z' ;

you'll get the following output:

grammar T;

options {k=1; output=AST;}

z 

	:	a ID
	;

a 
options {k=1;}

 :  B 
 | '3' 
 ;

b

 : B
    	
    	
 ;

c : /*{true}?*/ B | /*{false}?=>*/ C ;

fragment A : '0' {skip();} ; // leave lexer actions
B : ~A
'z' ;

Changes

  • Added org.antlr.tool.Strip (reads from file arg or stdin, emits to stdout) to strip actions from a grammar.
  • Added CommonTree.setUnknownTokenBoundaries(). Sometimes we build trees in a grammar and some of the token boundaries are not set properly. This only matters if you want to print out the original text associated with a subtree. Check out this rule:
    postfixExpression
        :   primary ('.'^ ID)*
        ;
    
    For a.b.c, we get a '.' that does not have the token boundaries set. ANTLR only sets token boundaries for subtrees returned from a rule. So, the overall '.' operator has the token boundaries set from the 'a' to 'c' tokens, but the lower '.' subtree does not get the boundaries set (they are -1,-1). Calling setUnknownTokenBoundaries() on the returned tree sets the boundaries appropriately according to the children's token boundaries.
  • Fixed addListener() in DebugEventHub.java to call listeners.add(listener);
  • Removed runtime method: mismatch in BaseRecognizer and TreeParser. Seems to be unused. Had to override method recoverFromMismatchedToken() in TreeParser to get rid of single token insertion and deletion for tree parsing because it makes no sense with all of the up-and-down nodes.
  • Changed JIRA port number from 8888 to no port spec (aka port 80) and all refs to it in this file.
  • Changed BaseTree to Tree typecast in getChild(), toStringTree(), and deleteChild() to make them more generic.
  • Added -verbose cmd-line option and turned off standard header and list of read files. Silent now without -verbose.
  • null-ptr protected getParent and a few others.
  • Added new ctor to CommonTreeNodeStream for walking subtrees. Avoids having to make new serialized stream as it can reuse overall node stream buffer.
  • Updated BaseTest to isolate tests better.
  • BaseTreeAdaptor.getType() was hosed; always gave 0. Thanks to Sam Harwell.
  • Added methods to BaseRecognizer:
      public void setBacktrackingLevel(int n) { state.backtracking = n; }
      /** Return whether or not a backtracking attempt failed. */
      public boolean failed() { return state.failed; }
    
  • Tweaked traceIn/Out to say "fail/succeeded"
  • Bug in code gen for tree grammar wildcard list label x+=.
  • Use of backtrack=true anywhere in grammar causes backtracking sensitive code to be generated. Actions are gated etc... Previously, that only happened when a syntactic predicate appeared in a DFA. But, we need to gate actions when backtracking option is set even if no decision is generated to support filtering of trees.
  • Fixed debug event socket protocol to allow spaces in filenames.
  • Added TreeVisitor and TreeVisitorAction to org.antlr.runtime.tree.
  • Added 3 methods to Tree interface (BREAKS BACKWARD COMPATIBILITY):
        /** Is there a node above with token type ttype? */
        public boolean hasAncestor(int ttype);
    
        /** Walk upwards and get first ancestor with this token type. */
        public Tree getAncestor(int ttype);
    
        /** Return a list of all ancestors of this node.  The first node of
         *  list is the root and the last is the parent of this node.
         */
        public List getAncestors();
    
  • Updated unit tests to be correct for \uFFFE->\uFFFF change
  • Made . in tree grammar look like ^(. .*) to analysis, though ^(. foo) is illegal (can't have . at root). Wildcard is subtree or node. Fixed bugs: ANTLR-248, ANTLR-344
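
The setUnknownTokenBoundaries() item above can be illustrated with a small, self-contained sketch. It does not use the ANTLR runtime: Node, fillUnknownBoundaries(), and the token indexes are hypothetical stand-ins chosen for illustration, and the logic is only an assumption about how boundaries might be derived from children, not ANTLR's actual implementation. It models the tree for a.b.c, where the outer '.' already has boundaries 0..4 and the inner '.' starts at (-1,-1).

```java
import java.util.ArrayList;
import java.util.List;

class TokenBoundarySketch {
    /** Hypothetical stand-in for a tree node; not ANTLR's CommonTree. */
    static class Node {
        final String text;
        final int tokenIndex;        // index of this node's own token
        int start = -1, stop = -1;   // -1 means "boundary unknown"
        final List<Node> children = new ArrayList<>();

        Node(String text, int tokenIndex) {
            this.text = text;
            this.tokenIndex = tokenIndex;
        }

        Node add(Node child) { children.add(child); return this; }

        /** Fill in unknown boundaries bottom-up from the children. */
        void fillUnknownBoundaries() {
            if (children.isEmpty()) {
                // A leaf spans exactly its own token.
                if (start < 0 || stop < 0) start = stop = tokenIndex;
                return;
            }
            for (Node c : children) c.fillUnknownBoundaries();
            if (start >= 0 && stop >= 0) return;  // already set; leave alone
            start = children.get(0).start;
            stop = children.get(children.size() - 1).stop;
        }
    }

    /** Builds ^('.' ^('.' a b) c) for the token stream: a . b . c */
    static Node buildExample() {
        Node inner = new Node(".", 1).add(new Node("a", 0)).add(new Node("b", 2));
        Node root = new Node(".", 3).add(inner).add(new Node("c", 4));
        root.start = 0;  // set on the subtree returned from the rule
        root.stop = 4;
        return root;
    }

    public static void main(String[] args) {
        Node root = buildExample();
        root.fillUnknownBoundaries();
        Node inner = root.children.get(0);
        System.out.println(inner.start + ".." + inner.stop);  // prints 0..2
    }
}
```

In this sketch the inner '.' ends up spanning tokens 0..2 (a . b), while the outer '.' keeps the 0..4 range it already had.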

From bug tracking system

C target

  • ANTLR-288 SKIP() does not handle WS quite correctly
  • ANTLR-379 Output=AST caused misnaming of global scopes if both backtracking and memoizing
  • ANTLR-328 setTokenStream resets the tokenNames pointer to NULL
  • ANTLR-325 User claims that wildcards do not work for subtrees
  • ANTLR-365 Check that EOF tokens etc have their string factories initialized properly
  • ANTLR-363 C runtime is inconsistent with Java with respect to when text of HIDDEN tokens is printed
  • ANTLR-362 Check the strings factories are always freed correctly in all cases
  • ANTLR-367 Composition tree grammars do not compile
  • ANTLR-349 ANTLR3_MARKER is not used in all cases in the templates
  • ANTLR-350 Must revert to using memmove when manipulating ANTLR3_STRINGS
  • ANTLR-361 Ensure that ANTLR3_BOOLEAN is defined correctly under MingW
  • ANTLR-354 See if we can drop the explicit use of 0.0D in initializer tables
  • ANTLR-351 Ensure output stream consistency with the output of messages when generating C
  • ANTLR-333 Check AST tree construction
  • ANTLR-352 Check that base tree adaptor getType does not always give 0 like in Java
  • ANTLR-358 Check token rewrites using $start
  • ANTLR-287 Ensure that license is attached to all source code files and that generated code has an appropriate license
  • ANTLR-282 Ensure NULL guards have been taken off all $refs
  • ANTLR-357 Check use of references like $start.pos
  • ANTLR-360 Review the correctness of the fillbuffer method
  • ANTLR-366 Check the reset() mechanisms all work correctly
  • ANTLR-355 Check memory allocations in backtrack mode

Java target

  • ANTLR-336 White space should be escaped in remote debugger protocol

Python target

Patched in all changes from the Java target that are also applicable to Python. Highlights:

  • Fixed ANTLR-248, ANTLR-344. Made wildcard work properly in tree grammars.
  • Added TreeVisitor class.
  • Fixed x+=. issue with tree grammars
  • Added CommonTree.setUnknownTokenBoundaries()
  • Tweaked traceIn/Out to say "fail/succeeded"
  • Added methods to BaseRecognizer:
    • setBacktrackingLevel(n)
    • failed()
  • Added 3 methods to Tree:
    • hasAncestor(ttype)
    • getAncestor(ttype)
    • listAncestors()

C# target

Updated to 3.1.2. Also includes the following improvements:

  • Possible BREAKING CHANGE: Obsoleted IIntStream's Size() method in favor of the Count property. Unless you derive from IIntStream or one of its inheriting interfaces directly, this change will only show up as warnings while compiling. Otherwise, replacing the older binaries should cause no problems.
  • Added <decisionNumber>-suffix to distinguish between exception variable names. This prevents compiler errors due to the scope rules.
  • Moved the pragmas that suppress warnings from the namespace inclusion template into the general template. Thus the pragmas are included every time, not only when using a namespace.
  • Added the symbol ANTLR_DEBUG for debugging parsers. It lets you check programmatically whether the debug parser or the normal version is in use.

General

  • ANTLR-344 wildcard in tree grammar with output=AST doesn't dup subtree, dups only node
  • ANTLR-291 filter output to use \n or \r\n in codegen according to locale
  • ANTLR-248 wildcard is single node in tree grammar analysis but node or tree at runtime

Incompatibilities

Added 3 methods to Tree interface; see above in changes. If you have defined a tree class that doesn't derive from BaseTree, you'll get a compiler error.
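
If you maintain such a tree class, the three new methods can be written in terms of the parent pointer. The following is a minimal sketch, not code from the ANTLR runtime: MyNode, addChild(), and the token type constants are hypothetical, and the real interface declares getAncestor() as returning Tree and getAncestors() as returning a raw List. Returning an empty list for a node with no parent is an assumption of this sketch; the notes above do not say what a root node should return.

```java
import java.util.ArrayList;
import java.util.List;

class AncestorSketch {
    // Hypothetical token types, for illustration only.
    static final int BLOCK = 1, STAT = 2, EXPR = 3, ID = 4;

    /** Hypothetical custom node that does not derive from BaseTree. */
    static class MyNode {
        final int type;
        MyNode parent;

        MyNode(int type) { this.type = type; }

        /** Links child to this node and returns the child. */
        MyNode addChild(MyNode child) {
            child.parent = this;
            return child;
        }

        /** Is there a node above with token type ttype? */
        public boolean hasAncestor(int ttype) {
            return getAncestor(ttype) != null;
        }

        /** Walk upwards and get first ancestor with this token type. */
        public MyNode getAncestor(int ttype) {
            for (MyNode t = parent; t != null; t = t.parent) {
                if (t.type == ttype) return t;
            }
            return null;
        }

        /** All ancestors: the root comes first, the parent last. */
        public List<MyNode> getAncestors() {
            List<MyNode> ancestors = new ArrayList<>();
            for (MyNode t = parent; t != null; t = t.parent) {
                ancestors.add(0, t);  // insert at front so the root ends up first
            }
            return ancestors;
        }
    }

    public static void main(String[] args) {
        MyNode block = new MyNode(BLOCK);
        MyNode stat = block.addChild(new MyNode(STAT));
        MyNode id = stat.addChild(new MyNode(ID));
        System.out.println(id.hasAncestor(BLOCK));      // prints true
        System.out.println(id.hasAncestor(EXPR));       // prints false
        System.out.println(id.getAncestors().size());   // prints 2
    }
}
```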