just had some cool ideas about a semantic rule specification language...i should write that up too... (also think about multi-threaded rewriting; no side-effects required
Some language translation problems can be described with a few rewrite rules that are predicated upon their context and potentially an arbitrary Boolean expression. For example, consider a few identity transformations such as
...
Sometime I think I will spend a couple of weeks and try to build one myself ANTLR-style. The first thing I realize is that concrete syntax, while beautiful, is not always powerful enough to do what we want (I suppose it could be an option). For example, what if you want to match a series of identifiers and transform them to something else? You need some kind of repetition operator inside the strings of concrete syntax, but why invent new syntax. We have grammars to specify languages. Grammars both recognize and generate language as you see in ANTLR parser grammar to treat tree grammar fragment rewrite rules. So, I propose something more along the lines of grammar to grammar transformations. Elements on the left are available as streams of data to the template on the right-hand side:
...
In this case, whatever was matched for the type
rule would be replicated for each identifier match on the input stream. Input tokens int i,j
would become output token stream int i; int j;
. This token stream could then be repeatedly processed by the rules until nothing had changed, signifying the end of translation. Naturally, one could design rewrite rules that flip back and forth between two patterns causing an infinite loop. This is a well-known problem and with rewrite systems. I'm satisfied to send the simply call that a bug in your rewrite rules, though I'm sure I could come up with something nice to let you know precisely which rules were the problem.
...
No Format |
---|
[localVar | field]: type ID -> blort |
In general, the static stack context language is regular and rewrite rule candidates could be found for a given stack using a DFA looking from right to left on the stack. In other words, If the stack top has rule var, then the DFA would rule out any context do not end in var or "...".
...
What is currentFunc, though? Normally that is a parser class instance variable set to the currently matched function by an action in the parser grammar. For constructs that can nest such as classes in some languages, the "current class" pointer needs to be treated like a stack. ANTLR has dynamically scoped very rules for this purpose; perhaps we could do something similar.
No Format |
---|
method:
scope { FunctionSymbol currentFunc; }
blah blah blah
|
The result of a pattern match could be an action or perhaps even just a context with no pattern:
No Format |
---|
method: {System.out.println("heh, I just entered a method")} // always exec in method context type ID {currentFunc = currentScope.resolve($ID.text);} // exec this but referredrefer to ID |
We might also need to be able to specify a predicate on token references:
No Format |
---|
expr:
ID<"null"> -> "nil" // test for null keyword as ID
^('*' INT<{is power of 2}> expr) -> (lg2={$INT.int}) "<expr> << <lg2>"
|
Rewrites for tree grammars
There is no reason we could not do rewrite patterns on trees. The left would always be a tree pattern but the right could be a tree pattern or template.
No Format |
---|
expr:
^('+' expr expr) -> add(blah blah blah) // rewrite with template ref
INT -> "0" // differentiate any integer to constant "0"
|
To rewrite to another tree:
No Format |
---|
expr:
^('+' INT<"0"> expr) -> expr
^('*' INT<"1"> expr) -> expr
|