just had some cool ideas about a semantic rule specification language...i should write that up too... (also think about multi-threaded rewriting; no side-effects required
Some language translation problems can be described with a few rewrite rules that are predicated upon their context and potentially an arbitrary Boolean expression. For example, consider a few identity transformations such as
...
Sometime I think I will spend a couple of weeks and try to build one myself ANTLR-style. The first thing I realize is that concrete syntax, while beautiful, is not always powerful enough to do what we want (I suppose it could be an option). For example, what if you want to match a series of identifiers and transform them to something else? You need some kind of repetition operator inside the strings of concrete syntax, but why invent new syntax. We have grammars to specify languages. Grammars both recognize and generate language as you see in ANTLR parser grammar to treat tree grammar fragment rewrite rules. So, I propose something more along the lines of grammar to grammar transformations. Elements on the left are available as streams of data to the template on the right-hand side:
...
In this case, whatever was matched for the type
rule would be replicated for each identifier match on the input stream. Input tokens int i,j
would become output token stream int i; int j;
. This token stream could then be repeatedly processed by the rules until nothing had changed, signifying the end of translation. Naturally, one could design rewrite rules that flip back and forth between two patterns causing an infinite loop. This is a well-known problem and with rewrite systems. I'm satisfied to send the simply call that a bug in your rewrite rules, though I'm sure I could come up with something nice to let you know precisely which rules were the problem.
...
No Format |
---|
[localVar | field]: type ID -> blort |
In general, the static stack context language is regular and rewrite rule candidates could be found for a given stack using a DFA looking from right to left on the stack. In other words, If the stack top has rule var, then the DFA would rule out any context do not end in var or "...".
Note that the rewrite matches only have to match starting at the left edge of the specified context rule. They do not have to match an exact alternative within that rule nor a complete alternative. You might only want to alter the first or second token:
No Format |
---|
classDef:
'class' ID -> "class _<ID>" // prefix with '_'
|
Arbitrary semantic content
Sometimes grammatical context is not enough. For example, you might want to change an identifier reference in an expression depending on symbol table information:
No Format |
---|
expr:
{is local or arg}? ID -> "locals.<ID>"
|
What if the right-hand side needs to reference something other than an element matched on the left-hand side? In this case, we need to allow actions to set attributes.
No Format |
---|
expr:
{is local or arg}? ID -> (fname={currentFunc.name}) "<fname>_locals.<ID>"
|
What is currentFunc, though? Normally that is a parser class instance variable set to the currently matched function by an action in the parser grammar. For constructs that can nest such as classes in some languages, the "current class" pointer needs to be treated like a stack. ANTLR has dynamically scoped very rules for this purpose; perhaps we could do something similar.
No Format |
---|
method:
scope { FunctionSymbol currentFunc; }
blah blah blah
|
The result of a pattern match could be an action or perhaps even just a context with no pattern:
No Format |
---|
method:
{System.out.println("heh, I just entered a method")} // always exec in method context
type ID {currentFunc = currentScope.resolve($ID.text);} // exec this but refer to ID
|
We might also need to be able to specify a predicate on token references:
No Format |
---|
expr:
ID<"null"> -> "nil" // test for null keyword as ID
^('*' INT<{is power of 2}> expr) -> (lg2={$INT.int}) "<expr> << <lg2>"
|
Rewrites for tree grammars
There is no reason we could not do rewrite patterns on trees. The left would always be a tree pattern but the right could be a tree pattern or template.
No Format |
---|
expr:
^('+' expr expr) -> add(blah blah blah) // rewrite with template ref
INT -> "0" // differentiate any integer to constant "0"
|
To rewrite to another tree:
No Format |
---|
expr:
^('+' INT<"0"> expr) -> expr
^('*' INT<"1"> expr) -> expr
|