Blog from November, 2008

TerenceP posted on Nov 30, 2008

So, let's do some rewriting using the pattern matching filter=true mode. Again, the VecMath.g parser will build trees but we'll avoid building an entire tree grammar. We'll focus on some patterns we want to rewrite.

Here's the grammar to build trees.

grammar VecMath;
options {output=AST;} // we want to create ASTs
tokens {
    MULT='*';
    SHIFT;    // needed during simplification
    VEC;      // define imaginary VEC for vector literal
}

prog:   stat+ ;                         // build list of stat trees
stat:   ID '=' expr  -> ^('=' ID expr)  // '=' is operator subtree root
    |   'print' expr -> ^('print' expr) // 'print' is subtree root
    ;

expr:   multExpr ('+'^ multExpr)* ;     // '+' is root node

multExpr
    :   primary (('*'^|'.'^) primary)*  // '*', '.' are roots
    ;

primary
    :   INT
    |   ID
    |   '[' expr (',' expr)* ']' -> ^(VEC expr+)
    |   '(' expr ')'             -> expr
    ;

ID  :   'a'..'z'+ ;
INT :   '0'..'9'+ ;
WS  :   (' '|'\r'|'\n')+ {skip();} ;

And now, let's distribute scalar-vector multiplies. 4*[1,2] -> [4*1,4*2].

tree grammar Simplify;
options {
    tokenVocab=VecMath;      // use tokens from VecMath.g
    ASTLabelType=CommonTree; // we're using CommonTree nodes
    output=AST;              // build ASTs from input AST
    filter=true;             // tree pattern matching, rewrited mode
}

topdown
    :   ^('*' INT ^(VEC (e+=.)+)) -> ^(VEC ^('*' INT $e)+)
    ;

bottomup
    :  ^('*' a=. b=INT {$b.int==0}?) -> $b // x*0 -> 0
    ;

Give it a shot:

        VecMathLexer lex = new VecMathLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        VecMathParser g = new VecMathParser(tokens);
        RuleReturnScope r = g.prog();   // launch parser by calling start rule
        CommonTree t = (CommonTree)r.getTree();   // get tree result
        System.out.println("Original tree: "+t.toStringTree());

        CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
        Simplify s = new Simplify(nodes);
        t = (CommonTree)s.downup(t);
        System.out.println("Simplified tree: "+t.toStringTree());

Output:

$ cat t1
x = 4 * [0, 5*0, 3]
$ java Test < t1
Original tree: (= x (* 4 (VEC 0 (* 5 0) 3)))
(* 4 (VEC 0 (* 5 0) 3)) -> (VEC (* 4 0) (* 4 (* 5 0)) (* 4 3))
(* 4 0) -> 0
(* 5 0) -> 0
(* 4 0) -> 0
Simplified tree: (= x (VEC 0 0 (* 4 3)))
$

Now for stuff that needs to repeatedly get applied to subtrees.

tree grammar Reduce;
options {
    tokenVocab=VecMath;      // use tokens from VecMath.g
    ASTLabelType=CommonTree; // we're using CommonTree nodes
    output=AST;              // build ASTs from input AST
    filter=true;             // tree pattern matching, rewrited mode
}

/** Rewrite: x+x to be 2*x, 2*x to be x<<1, x<<n<<m to be x<<(n+m) */
bottomup
    :  ^('+' i=INT j=INT {$i.int==$j.int}?) -> ^(MULT["*"] INT["2"] $j)
    |  ^('*' x=INT {$x.int==2}? y=.)        -> ^(SHIFT["<<"] $y INT["1"])
    |  ^(SHIFT ^(SHIFT e=. n=INT) m=INT)
       -> ^(SHIFT["<<"] $e INT[String.valueOf($n.int+$m.int)])
    ;

Test code:

        CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
        Reduce red = new Reduce(nodes);
        t = (CommonTree)red.downup(t);

Running:

$ cat t2
x = 2*(3+3)
$ java Test2 < t2
Original tree: (= x (* 2 (+ 3 3)))
(+ 3 3) -> (* 2 3)
(* 2 3) -> (<< 3 1)
(* 2 (<< 3 1)) -> (<< (<< 3 1) 1)
(<< (<< 3 1) 1) -> (<< 3 2)
Simplified tree: (= x (<< 3 2))
$

I'm replacing multiply with shift left by 1 and then combining shifts.

Woohoo! Tree pattern matching, rewriting a reality

TerenceP posted on Nov 29, 2008

Can't resist showing off new filter mode for tree grammars (this is working in my dev branch). Imagine I built some trees with Cymbal.g and want to define symbols and push/pop scopes. Previously you had to give full tree grammar even though we'd only have actions in a few spots and don't care about structure (we trust tree builder). By doing tree pattern matching, we get to focus only on those subtrees we care about.

We are also separating the patterns from the strategy to apply them (many thanks to Eelco Visser and guidance I got from his Stratego/XT tool. I should also mention that Jurge Vinju of ASF+SDF fame explained a lot of his rewrite technology this summer when I visited him at CWI in Amsterdam...btw, ever wonder why Holland has so many good coders? Weird. A new Dutch USF grad student, Jasper Roel, has lots of potential). I have defined a default down-then-up strategy that applies matching rules once on way down and repeatedly on way up. It invokes rules topdown and bottomup if defined. You can create your own easily with a bit of code. Then just create a tree parser (Def here) and ask it to execute the downup strategy:

CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
Def def = new Def(nodes);
def.downup(t);

Here are the patterns:

tree grammar Def;

options {
  tokenVocab = Cymbal;
  ASTLabelType = CommonTree;
  filter = true;
}

topdown
    :   enterClassScope
    |   enterBlock
    |   fieldDeclaration
    |   methodDeclaration
    |   formalParameters
    |   localDeclaration
    ;

bottomup
    :   exitClassScope
    |   exitBlock
    ;

// S C O P E S

enterClassScope
    :   ^('class' ID (^('extends' sup=ID))? .)
        {System.out.println("enter class");}
    ;

exitClassScope : 'class' {System.out.println("exit class");} ;

enterBlock
    :   BLOCK {System.out.println("enter block");}
    ;
    
exitBlock
    :   BLOCK {System.out.println("exit block");}
    ;

// S Y M B O L S

fieldDeclaration
    :   ^(FIELD_DECL type=. ID .?) {System.out.println("field "+$ID);}
    ;
    
localDeclaration
    :   ^(LOCAL_DECL type=. ID .?) {System.out.println("local "+$ID);}
    ;

methodDeclaration
    :   ^(METHOD_DECL type=. ID .+) {System.out.println("def method "+$ID);}
    ;

formalParameters
    :   (^(PARAMETER type=. ID) {System.out.println("def arg "+$ID);})+ 
    ;

Ain't that cool?

Here is the downup strategy where we are only matching not rewriting:

    public void downup(Object t) {
        TreeVisitor v = new TreeVisitor(new CommonTreeAdaptor());
        TreeVisitorAction actions = new TreeVisitorAction() {
            public Object pre(Object t)  { applyOnce(t, topdown_fptr); return t; }
            public Object post(Object t) { applyOnce(t, bottomup_fptr); return t; }
        };
        v.visit(t, actions);
    }

Then for tree grammars with filter=true, output=AST:

    public Object downup(Object t) {
        TreeVisitor v = new TreeVisitor(new CommonTreeAdaptor());
        TreeVisitorAction actions = new TreeVisitorAction() {
            public Object pre(Object t)  { return applyOnce(t, topdown_fptr); }
            public Object post(Object t) { return applyRepeatedly(t, bottomup_ftpr); }
        };
        t = v.visit(t, actions);
        return t;
    }

We applyRepeatedly on way up and pass around result trees in case we need to walk them.

org.antlr.runtime.tree.TreeFilter and TreeRewriter are superclass of generated tree parser and derive from TreeParser.

I define two rules you can implement but you could define any strategy and any rule dependencies you wanted:

    // methods the downup strategy uses to do the up and down rules.
    // to override, just define tree grammar rule topdown and turn on
    // filter=true.
    public Object topdown() throws RecognitionException { return null; }
    public Object bottomup() throws RecognitionException { return null; }

Implementing tree pattern matching

TerenceP posted on Nov 22, 2008

Writing down random thoughts as I have them.

Imagine rules with strategy name, order name, and apply name:

a downup down once
b downup up once
c downup up once
d downup up repeat

strategy has visit order and pattern application rule

default is

	strategy: downup, order={down, up}, apply={once,repeat}
	up, order=up, apply=repeat
	reverse, order={pre,post}, apply={once,repeat,rewriteSubtrees}

collect all rules with same strategy into rule

downup_down_once_patterns
	:	a
	;
downup_up_once_patterns
	:	b
	|	c
	;
downup_up_repeat
	:	d
	;

gen applyOnce() for each generated rule

	applyOnce_downup_down_once_patterns(t)
	applyOnce_downup_up_once_patterns(t)
	applyOnce_downup_up_repeat(t)

This creates node stream for t from existing buffer if possible and then
calls the pattern grouping rule. Easy if we don't do replaces. Replace on way up: we have to walk new tree. On way down, we also have to walk new tree.

We're rewriting tree, perhaps we should update node stream buffer as we go using a rewrite engine approach. Hmm...messes with node indexes.

Can we walk trees without creating buffer of entire tree? Perhaps we can make an on-demand buffer that doesn't go into subtrees unless the parser does. The wildcard would tell it to skip. Only problem is lookahead. Maybe we let LT(i) do the on-demand loading. We could add a skipOverSubtree() method to node stream so wildcard could avoid descending.

Patter ^( VAR . ID ) yields:

match(input,VAR,FOLLOW_VAR_in_a13); 
match(input, Token.DOWN, null); 
matchAny(input); 
match(input,ID,FOLLOW_ID_in_a17); 
match(input, Token.UP, null);

On-demand serialization of a tree in java (no continuations) would be a pain. Would have to maintain a stack. Might be better for deep tree anyway. skip in node stream would skip over DOWN...UP sequence or just one token if not DOWN.