Suppose that you have a simple grammar for reading an HTML document.  I'm sure that this has "never" been done before, so don't yawn and blink too much. 

No Format

grammar HtmlDoc;
options { output=AST; }
tokens {
	DOC='doc';
	TITLE='title';
	BODY='body';
}

html_doc
	: '<html>' html_header html_body '</html>' -> ^('doc' html_header html_body);

html_header
	: '<title>' TEXT '</title>' -> ^('title' TEXT) ;

html_body
	: '<body>' TEXT '</body>' -> ^('body' TEXT)	;

// One or more non-'<' characters; with '*' the rule could match the empty string.
TEXT : (~('<'))+;
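
To make the rewrite rules a little more concrete, here is a minimal sketch in plain Java of the tree shape they describe. It does not use ANTLR at all: `ShapeNode`, `RewriteShapeSketch`, and the sample text "Hello" and "World" are made up for illustration, and the hand-written `toStringTree` only imitates the LISP-like notation that is commonly used to write trees down.

```java
import java.util.ArrayList;
import java.util.List;

class RewriteShapeSketch {
    // Mirrors: html_header -> ^('title' TEXT)
    static ShapeNode header(String titleText) {
        return new ShapeNode("title", new ShapeNode(titleText));
    }

    // Mirrors: html_body -> ^('body' TEXT)
    static ShapeNode body(String bodyText) {
        return new ShapeNode("body", new ShapeNode(bodyText));
    }

    // Mirrors: html_doc -> ^('doc' html_header html_body)
    static ShapeNode doc(String titleText, String bodyText) {
        return new ShapeNode("doc", header(titleText), body(bodyText));
    }

    public static void main(String[] args) {
        // Assumed input: <html><title>Hello</title><body>World</body></html>
        System.out.println(doc("Hello", "World").toStringTree());
        // prints (doc (title Hello) (body World))
    }
}

// Hypothetical stand-in for an AST node; NOT the ANTLR runtime class.
class ShapeNode {
    final String text;
    final List<ShapeNode> children = new ArrayList<>();

    ShapeNode(String text, ShapeNode... kids) {
        this.text = text;
        for (ShapeNode k : kids) children.add(k);
    }

    // Render the subtree as (root child1 child2 ...); a leaf is just its text.
    String toStringTree() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("(").append(text);
        for (ShapeNode c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}
```

The point of the sketch is only the nesting: the tag literals from the input disappear, and each rule contributes one root node with its pieces as children.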

...

Writing the interface to get the ANTLR output is a little more complicated than it was in earlier versions, but the result is more powerful. The basic glue to get things going has changed as well:

No Format

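import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.TokenRewriteStream;
...
// Read the input file into a character stream (the constructor can throw IOException).
ANTLRFileStream fs = new ANTLRFileStream("test.html");
// The generated lexer turns the characters into tokens.
HtmlDocLexer lex = new HtmlDocLexer(fs);
// Buffer the tokens for the parser.
TokenRewriteStream tokens = new TokenRewriteStream(lex);
// The generated parser. Depending on the ANTLR 3 release, it may be named HtmlDocParser instead.
HtmlDoc grammar = new HtmlDoc(tokens);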

...

The main gist of getting data back is through callbacks, or rather calls through a class you provide. This section starts off with a simpler form that helps you get the idea. First you need to create a tree adaptor (for example, a CommonTreeAdaptor). This class interfaces with the rewrite code that keeps track of the data and its location in the tree (remember the arrays of children and children of children mentioned above?).

No Format

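import org.antlr.runtime.Token;
import org.antlr.runtime.tree.*;

// The parser calls create() whenever it needs a tree node for a token.
static final TreeAdaptor adaptor = new CommonTreeAdaptor() {
	public Object create(Token payload) {
		return new CommonTree(payload);
	}
};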

...

The magic code now needs to be hooked into the grammar, so that the grammar knows what to use to create the tree. The first statement below does the hookup. The next one engages the grammar and lexer to parse the file that was provided to them earlier.

No Format

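// Tell the parser which adaptor to use when it builds tree nodes.
grammar.setTreeAdaptor(adaptor);
// Invoke the start rule; it can throw RecognitionException.
HtmlDoc.html_doc_return ret = grammar.html_doc();
CommonTree tree = (CommonTree)ret.getTree();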

...

If you recall, the tree is organized in arrays of children and children can have arrays of children. Effectively, then, you can write a simple recursive procedure to walk the tree:

No Format

public void printTree(CommonTree t, int indent) {
	if ( t == null ) return;
	// Build the indentation prefix: three spaces per nesting level.
	StringBuilder sb = new StringBuilder();
	for ( int i = 0; i < indent; i++ )
		sb.append("   ");
	// Print this node, then recurse into each child one level deeper,
	// so that children always appear indented under their parent.
	System.out.println(sb + t.toString());
	for ( int i = 0; i < t.getChildCount(); i++ )
		printTree((CommonTree)t.getChild(i), indent + 1);
}
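
The same recursion can be tried out without the ANTLR runtime. The following is a small sketch under that assumption: `WalkNode` and `WalkSketch` are hypothetical stand-ins (not ANTLR classes), the tree is built by hand, and the walk prints one line per node, indented by depth.

```java
import java.util.ArrayList;
import java.util.List;

class WalkSketch {
    // Append one line per node, with three spaces of indentation per level.
    static void render(WalkNode t, int indent, StringBuilder out) {
        if (t == null) return;
        for (int i = 0; i < indent; i++) out.append("   ");
        out.append(t.text).append('\n');
        // Children are one level deeper than their parent.
        for (WalkNode child : t.children) render(child, indent + 1, out);
    }

    // Convenience wrapper: render a whole tree starting at depth 0.
    static String render(WalkNode root) {
        StringBuilder out = new StringBuilder();
        render(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        WalkNode title = new WalkNode("title").add(new WalkNode("Hello"));
        WalkNode body = new WalkNode("body").add(new WalkNode("World"));
        WalkNode doc = new WalkNode("doc").add(title).add(body);
        System.out.print(render(doc));
    }
}

// Hypothetical stand-in for a tree node: some text plus a list of children.
class WalkNode {
    final String text;
    final List<WalkNode> children = new ArrayList<>();

    WalkNode(String text) { this.text = text; }

    // Returns this node so that calls can be chained while building a tree.
    WalkNode add(WalkNode child) { children.add(child); return this; }
}
```

Collecting the output in a `StringBuilder` rather than printing directly is just a design choice that makes the walk easy to check in a test.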

...

The magic that was mentioned earlier really has an extended purpose and is not meant to make your life more, um, challenging. Suppose that you wanted to use the tree as a medium to process information. The tree, and thus the individual nodes, would then need to be flexible enough to carry your annotations. You can do this by extending the CommonTree class:

No Format

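import org.antlr.runtime.tree.*;
import org.antlr.runtime.Token;

// A custom AST node: CommonTree plus whatever extra fields you need.
public class HtmlAST extends CommonTree {
	// Example annotation; here it just copies the token text.
	public String text;

	public HtmlAST(Token t) {
		super(t);
		// The token can be null (for example, for a nil root node).
		text = (t != null ? t.getText() : "[]");
	}

	@Override
	public String toString() {
		// Prints the custom field followed by CommonTree's own text,
		// so the token text shows up twice for ordinary nodes.
		return text + super.toString();
	}
}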

In this case 'text' simply echoes what the parent class (CommonTree) already has. But you can see how this would be useful, because it lets you add your own custom fields. The hookup is even simpler:

No Format

static final TreeAdaptor adaptor = new CommonTreeAdaptor() {
	public Object create(Token payload) {
		return new HtmlAST(payload);
	}
};
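
Why this works can be shown with a small sketch that does not depend on ANTLR. Everything in it is a hypothetical stand-in: `NodeFactory` plays the role of the adaptor's create callback, `build` plays the role of the generated parser code, and `AnnotatedNode` with its `depth` field plays the role of a custom node class. The idea is that the tree-building code never names a concrete node class, so swapping the factory is enough to change what kind of node ends up in the tree.

```java
import java.util.ArrayList;
import java.util.List;

class AdaptorSketch {
    // Hypothetical stand-in for the adaptor's create(Token) callback.
    interface NodeFactory { BaseNode create(String tokenText); }

    // Stand-in for generated parser code: it only asks the factory for
    // nodes and never mentions a concrete node class.
    static BaseNode build(NodeFactory factory, String root, String... leaves) {
        BaseNode r = factory.create(root);
        for (String leaf : leaves) r.children.add(factory.create(leaf));
        return r;
    }

    // Example pass that fills in the custom field: record each node's depth.
    static void annotateDepth(BaseNode n, int depth) {
        if (n instanceof AnnotatedNode) ((AnnotatedNode) n).depth = depth;
        for (BaseNode c : n.children) annotateDepth(c, depth + 1);
    }

    public static void main(String[] args) {
        // Swapping BaseNode::new for AnnotatedNode::new is the whole "hookup".
        BaseNode tree = build(AnnotatedNode::new, "title", "Hello");
        annotateDepth(tree, 0);
        System.out.println(tree + " " + tree.children.get(0));
        // prints title@0 Hello@1
    }
}

// Plain node: text plus children.
class BaseNode {
    final String text;
    final List<BaseNode> children = new ArrayList<>();

    BaseNode(String text) { this.text = text; }

    @Override public String toString() { return text; }
}

// Custom node carrying one extra annotation.
class AnnotatedNode extends BaseNode {
    int depth = -1;   // unset until a pass fills it in

    AnnotatedNode(String text) { super(text); }

    @Override public String toString() { return text + "@" + depth; }
}
```

With `BaseNode::new` passed in instead, the same `build` call would produce plain nodes and `annotateDepth` would leave them untouched.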

...