Exploring Concept of TreeWizard

I've been pretty sick this last week so I decided to work on something that was fun. After talking with Don Bradshaw at http://www.quipoz.com in Sydney and after trying to convert ANTLR v3's grammar (written in v2) into v3, I decided to build TreeWizard. ANTLR has a nice facility for creating trees from input symbols like:

decl : 'int' ID ';' ; -> ^(DECL 'int' ID)

But, sometimes outside of the grammar file (or in severe cases in a grammar action) you need to build a tree in raw code. Further, there are many cases, as Andy Tripp will tell you, that building a tree grammar is a overkill just to examine a few subtrees. You might want to simply print out all of the declarations defined in the tree. You don't care about the tree structure except for the declaration itself. A grammar for the full tree is a lot of unnecessary work.

TreeWizard is my first attempt at a utility class that lets you create and navigate trees without relying on an ANTLR grammar. You will find it attached with about 30 unit tests. This is only my development branch so we are free to update, change, and otherwise mangle the definition. I just want to make sure that we have something out there that people can try out to make sure I'm going in the right direction.

Creating trees

Creating a tree is a simple matter of supplying a pattern for the tree using the token names from an ANTLR grammar:

TreeAdaptor adaptor = new CommonTreeAdaptor();
String[] tokens = myparser.getTokenNames();
TreeWizard wiz = new TreeWizard(adaptor, tokens);
CommonTree t = (CommonTree)wiz.create("(A B C)");

where the tokens from your parser look something like:

public static final String[] tokenNames = new String[] {
    "<invalid>", "<EOR>", "<DOWN>", "<UP>", "A", "B", "C"
};

Once you create a TreeWizard, you can keep calling create as it will know what kind of nodes to make, how to hook them up, and what the token types are for the token names. The wizard really only has write-once data in it...just a convenience so you don't have to keep passing stuff to each of the methods.

Tree Equality

If you want to compare to trees, you'd have to write your own method to do so right now. The equals() method in Java is not necessarily the right way to implement either because that is probably just matching to nodes not trees. The TreeWizard provides a method to check the equality two trees (one static and one instance method):

/** Compare t1 and t2; return true if token types/text, structure match exactly.
 *  The trees are examined in their entirety so that (A B) does not match
 *  (A B C) nor (A (B C)). 
public static boolean equals(Object t1, Object t2, TreeAdaptor adaptor);

/** Compare type, structure, and text of two trees, assuming adaptor
 *  in this instance of a TreeWizard.
 */
public boolean equals(Object t1, Object t2);

Indexing trees

Sometimes you want a list of all nodes with a certain type like VARDECL or METHOD:

List nodes = wiz.find(t, wiz.getTokenType("VARDECL"));

Sometimes you want all lists for each token type:

// from the unit tests
CommonTree t = (CommonTree)wiz.create("(A B (A C B) B D D)");
Map m = wiz.index(t);
String found = m.toString();
String expecting = "{8=[D, D], 6=[B, B, B], 7=[C], 5=[A, A]}";

Creating trees

Tree Equality

Indexing trees

Visitors

Tree pattern matching