Matching parse tree patterns, paths

[See tree-patterns branch in parrt/antlr4]

ANTLR 4 introduced a visitor and listener mechanism that lets you implement DOM visiting or SAX-analogous event processing of tree nodes. This works great. For example, if all you care about our looking at Java method declarations, grab the Java.g4 file and then override methodDeclaration in JavaBaseListener. From there, a ParseTreeWalker can trigger calls to your overridden method for you as it walks the tree. Easy things are easy.

This mechanism works more or less on a node-level basis. In other words, for every method declaration subtree root, your methodDeclaration would get called. There are many situations where we care only about either: (1) method declarations within a particular context (e.g., nested within another method) or methods with specific structure or specific types (e.g., "void <ID>() { }"). It might also be the case that we want to group operations by a set of patterns in the tree rather than spreading them across listener event methods. Finally, sometimes we want just a list of all assignments anywhere in the tree. It's much easier to say ``go find me all "... = ... ;" subtrees'' rather than creating a class just to get a listener method for rule assignment and then passing the listener to the parse tree walker.

The other important idea here is that, since we're talking about parse trees not abstract syntax trees, we can use concrete patterns instead of tree syntax. For example, we can say "x = 0;" instead of "(= x 0)" where the ';' was probably stripped before it went into the AST.

Patterns

The most obvious way to specify a pattern is with a string with holes like a template. For example, to match all assignments with a single identifier on the left, we could do this:

"<ID> = <expr> ;"

where ID is a token identifier within the corresponding and expr is a rule. We might also want a wildcard that matches any node or subtree:

"<ID> = <...> ;"

Nah, let's leave for later when there's an obvious test case.

If we want a specific identifier or even an entire expression, we can use the literals:

"<ID> = 0 ;" // match any identifier assigned to 0

"super();" // match only a function call with no arguments with name super

We need a function like findall() that would return an iterator containing all subtrees that match the pattern starting at the root of the tree specified. That means we could do for loops easily to process patterns without relying on listeners but with the same flexibility:

SyntaxPattern pattern = new SyntaxPattern("<x:ID> = <e:expr>;", parser.getATN());
for (ParseTreeMatch ass : pattern.findall(tree)) {
    System.out.println(ass.getText()); // print entire assignment
    System.out.println(ass.get("x").getText()); // print id
}

Where ass.get("x") returns a parse tree node. We also need just a simple tree match() that captures values from the subelements:

SyntaxPattern pattern = new SyntaxPattern("<x:ID> = <e:expr>;", parser.getATN());
ParseTreeMatch ass : pattern.match(tree); // tree is an assignment subtree
System.out.println(ass.getText()); // print entire assignment
System.out.println(ass.get("x").getText()); // print id

If the elements are unique, you don't need the labels:

SyntaxPattern pattern = new SyntaxPattern("<ID> = <expr>;", atn);
ParseTreeMatch ass = pattern.match(tree); // tree is an assignment subtree
System.out.println(ass.get("ID").getText()); // print id

When there are multiple values, you call getAll("expr"). This also works for labels like getAll("x").

Restricting context

If you only want to match subtrees in a particular context, you can combine with XPath four parse trees.

SyntaxPattern pattern = new SyntaxPattern("<x:ID> = <e:expr>;", parser.getATN());
for (ParseTree t : XPath.findAll(tree, "/class/assignment, parser) { // find all assignments at class level
	ParseTreeMatch ass = pattern.match(t);
	if ( ass!=null ) {
	    System.out.println(ass.getText()); // print entire assignment
    	System.out.println(ass.get("x").getText()); // print id
	}
}

You can add whatever other tests you want once you know that you have the right trees syntactically in the right context.

XPath for parse trees

This functionality has now been integrated into the master branch of antlr4.

for (ParseTree t : XPath.findAll(tree, xpath, parser) ) { ... }

 For examples, see TestXPath.java