Tree pattern matching

New in ANTLR 3.2

Lots of examples and more explanation in Language Implementation Patterns. The code is freely available at the book website. Dir code/walking/patterns has a tree rewriting example using tree patterns.

Instead of specifying an entire tree grammar, a tree pattern matcher lets us focus on just those subtrees we care about. That's useful because different phases of a language application care about different parts of the tree. A tree pattern matcher also decouples the order in which we apply tree patterns from the tree patterns themselves. Unlike embedded walkers, visitors, and tree grammars, tree patterns don't specify how to walk the tree. ANTLR's new tree pattern matching engine dictates the tree traversal strategy. In its simplest form, a pattern matcher repeatedly tries to match patterns against subtrees. When it finds a match, the pattern matcher triggers an action or tree rewrite.

How to use tree pattern matching

Here is the interface to the tree filters. First, use the filter=true option in a tree grammar:

tree grammar MyPatterns;
options {
    tokenVocab=MyParser;
    ASTLabelType=CommonTree; // we're using CommonTree nodes
    filter=true;             // tree pattern matching mode
    backtrack=true;          // allow backtracking if it's needed
}

Run ANTLR on MyPatterns.g to get MyPatterns.java. Your test rig attaches the tree filter to a node stream just like a regular tree grammar. The only difference is that we call downup() instead of calling the start rule:

CommonAST t = ... tree created by MyParser ...;
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
MyPatterns p = new MyPatterns(nodes);
p.downup(t, true); // walk down then up, applying rules that match

In the grammar, define rules to indicate what patterns to check on the way down and what patterns to check on the way up. For example, here is what it might look like if you're doing symbol table management:

topdown
    :   enterBlock
    |   enterMethod
    |   enterStruct
    ;               
bottomup
    :   exitBlock
    |   exitMethod
    |   exitStruct
    ;
enterBlock
    :   BLOCK {currentScope = new LocalScope(currentScope);} // push scope
    ;
exitBlock
    :   BLOCK
        {
        //System.out.println("locals: "+currentScope);
        currentScope = currentScope.getEnclosingScope();     // pop scope
        }
    ;

How it works

Using TreeVisitor, we do a depth first walk of the tree, executing an action in the preorder position. To get a bottom-up visitor, we execute an action in the postorder position. If you look in org.antlr.runtime.tree.TreeFilter, you'll see:

public void downup(Object t) {
    TreeVisitor v = new TreeVisitor(new CommonTreeAdaptor());
    TreeVisitorAction actions = new TreeVisitorAction() {
      public Object pre(Object t)  { applyOnce(t, topdown_fptr); return t; }
      public Object post(Object t) { applyOnce(t, bottomup_fptr); return t; }
    };
    v.visit(t, actions);
}       

The applyOnce() method is invoked by the visitor for each node; it tries to match the topdown rule on the way down (node discovery) and rule bottomup on the way back up (node finishing). Those topdown_fptr type pointers are really pointers to rules that you must override:

// methods the downup strategy uses to do the up and down rules.
// to override, just define tree grammar rule topdown and turn on
// filter=true.
public Object topdown() throws RecognitionException { return null; }
public Object bottomup() throws RecognitionException { return null; }

Most of the time, a rule will not match the current subtree as we walk. To avoid issuing syntax errors and attempting error recovery, it bumps up the backtracking level (See the applyOnce() method). Upon failure, the invoked rule immediately returns. This method boils down to ``call a rule to match a tree, executing any embedded actions or rewrite rules.''

To do rewrites (see TreeRewriter), we usually need to apply rewrites multiple times on way up to catch all opportunities to rewrite:

public Object downup(Object t, boolean showTransformations) {
    this.showTransformations = showTransformations;
    TreeVisitor v = new TreeVisitor(new CommonTreeAdaptor());
    TreeVisitorAction actions = new TreeVisitorAction() {
        public Object pre(Object t)  { return applyOnce(t, topdown_fptr); }
        public Object post(Object t) { return applyRepeatedly(t, bottomup_ftpr); }
    };  
    t = v.visit(t, actions);
    return t;
}   

If you don't like the strategy that I've implemented by default, you can trivially make your own by cutting and pasting from downup().

Tree patterns

First, remember that we are not specify entire tree grammar. We are primarily looking for subtrees of interest. Tree grammars specify not only the patterns but how to visit the tree. With patterns, we use the dot (wildcard) a lot. The following pattern matches an add subtree anywhere in the tree:

add : ^('+' x=. y=.) {do something with $x and $y} ;

Sometimes we do want to specify what the operands are. For example, the following pattern looks for x+x type editions where the operands are the same integers. It uses a predicate to make the patterns fail if they aren't the same:

xPlusX : ^('+' i=INT j=INT {$i.int==$j.int}?) ;

When doing symbol table management, you will use rules to match the various definition subtrees. For example, here is a subtree pattern that matches a method declaration:

enterMethod
    :   ^(METHOD_DECL type=. ID .*) // match method subtree with 0-or-more args
        {
        System.out.println("line "+$ID.getLine()+": def method "+$ID.text);
        $type.scope = currentScope;
        MethodSymbol ms = new MethodSymbol($ID.text,null,currentScope);
        ms.def = $ID;            // track AST location of def's ID
        $ID.symbol = ms;         // track in AST
        currentScope.define(ms); // def method in globals
        currentScope = ms;       // set current scope to method scope
        }
    ;

We do that rule on the way down to create a new scope. On the way back up though we only need to notice that it's a method declaration. we don't need to match any of the children because we don't need to do anything with that information:

exitMethod
    :   METHOD_DECL
        {
        System.out.println("args: "+currentScope);
        currentScope = currentScope.getEnclosingScope();    // pop arg scope
        }
    ;

Tree rewriting

Think of the tree rewriting patterns as tree pattern to tree pattern mappings. For example, here is how we would do some Boolean simplifications:

e : ^('!' 'true') -> 'false' // !true -> false 
  | ^('!' 'false') -> 'true' // !false -> true 
  | ^('&&' 'true' x=.) -> $x // true && x -> x 
  | ^('&&' x=. 'true') -> $x // x && true -> x 
;

With predicates, we can check for multiply by zeroes:

zeroX : ^('*' a=INT b=INT {$a.int==0}?) -> $a ; // 0*x -> 0 
xZero : ^('*' a=INT b=INT {$b.int==0}?) -> $b ; // x*0 -> 0

The predicates prevent the alternatives from matching unless one of the integers has a 0 value.