Exploring Concept of TreeWizard

I've been pretty sick this last week so I decided to work on something that was fun. After talking with Don Bradshaw at http://www.quipoz.com in Sydney and after trying to convert ANTLR v3's grammar (written in v2) into v3, I decided to build TreeWizard. ANTLR has a nice facility for creating trees from input symbols like:

decl : 'int' ID ';' ; -> ^(DECL 'int' ID)

But, sometimes outside of the grammar file (or in severe cases in a grammar action) you need to build a tree in raw code. Further, there are many cases, as Andy Tripp will tell you, that building a tree grammar is a overkill just to examine a few subtrees. You might want to simply print out all of the declarations defined in the tree. You don't care about the tree structure except for the declaration itself. A grammar for the full tree is a lot of unnecessary work.

TreeWizard is my first attempt at a utility class that lets you create and navigate trees without relying on an ANTLR grammar. You will find it attached with about 30 unit tests. This is only my development branch so we are free to update, change, and otherwise mangle the definition. I just want to make sure that we have something out there that people can try out to make sure I'm going in the right direction. See the unit tests for more examples than I give here. Also note that I think I changed CommonTree's getText to not call toString(), but call getText():

public String getText() {
	return token.getText();
}

Creating trees

Creating a tree is a simple matter of supplying a pattern for the tree using the token names from an ANTLR grammar:

TreeAdaptor adaptor = new CommonTreeAdaptor();
String[] tokens = myparser.getTokenNames();
TreeWizard wiz = new TreeWizard(adaptor, tokens);
CommonTree t = (CommonTree)wiz.create("(A B C)");

where the tokens from your parser look something like:

public static final String[] tokenNames = new String[] {
    "<invalid>", "<EOR>", "<DOWN>", "<UP>", "A", "B", "C"
};

Once you create a TreeWizard, you can keep calling create as it will know what kind of nodes to make, how to hook them up, and what the token types are for the token names. The wizard really only has write-once data in it...just a convenience so you don't have to keep passing stuff to each of the methods.

Oh, if you want to set the text of a node, use the argument syntax with no quotes:

CommonTree t = (CommonTree)wiz.create("(A B[foo] C[bar])");

Tree Equality

If you want to compare two trees, you'd have to write your own method to do so right now. The equals() method in Java is not necessarily the right way to implement either because that is probably just matching to nodes not trees. The TreeWizard provides a method to check the equality two trees (one static and one instance method):

/** Compare t1 and t2; return true if token types/text, structure match exactly.
 *  The trees are examined in their entirety so that (A B) does not match
 *  (A B C) nor (A (B C)). 
public static boolean equals(Object t1, Object t2, TreeAdaptor adaptor);

/** Compare type, structure, and text of two trees, assuming adaptor
 *  in this instance of a TreeWizard.
 */
public boolean equals(Object t1, Object t2);

If you would only like to match the structure of a tree, but not the text associated with each node (so that, say, any ID node will match), use the parse method:

boolean matched = wiz.parse(t, "(A B C)");

Wildcards work too:

CommonTree t = (CommonTree)wiz.create("(A B C)");
boolean matched = wiz.parse(t, "(A . .)");

Sometimes you want some of the nodes to match the text exactly, but most of the time match just the structure. You can do that too:

CommonTree t = (CommonTree)wiz.create("(A B[foo] C)");
boolean matches = wiz.parse(t, "(A B[foo] C)");

Getting pointers to subtree nodes

Matching trees is great, but most of the time you want to get a pointer to some of the children not just determine if the tree structure matches. To accomplish this, label the elements in the tree pattern and pass in a hash table that the parse() method can fill in:

CommonTree t = (CommonTree)wiz.create("(A B C)");
Map labels = new HashMap();
boolean valid = wiz.parse(t, "(%a:A %b:B %c:C)", labels);

Then labels.get("a") points to the root etc... Cool, eh?

Indexing trees

Sometimes you want a list of all nodes with a certain type like VARDECL or METHOD:

List nodes = wiz.find(t, wiz.getTokenType("VARDECL"));

Sometimes you want all lists for each token type:

// from the unit tests
CommonTree t = (CommonTree)wiz.create("(A B (A C B) B D D)");
Map m = wiz.index(t);
String found = m.toString();
String expecting = "{8=[D, D], 6=[B, B, B], 7=[C], 5=[A, A]}";

Visitors

In order to visit all nodes a particular type for all subtree that match a particular pattern, use the visit() method. For each node or subtree that matches, TreeWizard invokes the method and the following interface:

public interface ContextVisitor {
	public void visit(Object t, Object parent, int childIndex, Map labels);
}

Notice that it passes you a lot of interesting contact information including "who's your daddy?" and "what child am I?" The labels printer is not set unless you're matching against a tree pattern (i.e., won't work for matching against a token type).

Here is some code that visits every B node and puts them into a list, which is of course how I implemented the find() that returns a list.

CommonTree t = (CommonTree)wiz.create("(A B (A C B) B D D)");
final List elements = new ArrayList();
wiz.visit(t, wiz.getTokenType("B"),
	   new TreeWizard.Visitor() {
			public void visit(Object t) {
				elements.add(t);
			}
	   });

where Visitor is just a convenience class that makes it easier for you to implement if all you want to do is visit without context information:

public static abstract class Visitor implements ContextVisitor {
	public void visit(Object t, Object parent, int childIndex, Map labels) {
		visit(t);
	}
	public abstract void visit(Object t);
}

Now, let's say that you want to match a bunch of subtrees for declarations and pick out the identifier in each. No problem:

wiz.visit(t, "(VARDECL %type:. %id:ID)",
	   new TreeWizard.ContextVisitor() {
		   public void visit(Object t, Object parent, int childIndex, Map labels) {
		       // labels.get("type") is first child
		       // labels.get("id") is 2nd child
		   }
	   });