...
If the token that marks the start of the island grammar within your input can also have other uses within your language, the basic 'island-grammar' technique can't be used.
For instance, consider this regular expression literal:
Code Block |
---|
r = / b; f = r/m; |
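This assigns the regular expression ' b; f = r' (with the flag 'm') to the variable 'r'. The '/' that opens the literal is also the division operator, so the lexer cannot tell from that token alone that a regular expression starts here.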
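To make the regular-expression reading of that line concrete, here is a minimal sketch in plain Java. The class `RegexpLiteralAmbiguity` and its method `readAsRegexp` are hypothetical names invented for this illustration; they are not part of the grammar or of ANTLR, and the scan deliberately ignores escaped slashes and character classes.

```java
// Hypothetical demo class, invented for illustration only.
class RegexpLiteralAmbiguity {

    /**
     * Reads src as if the first '/' opened a regular expression literal.
     * Returns a two-element array: { body, flags }.
     * Simplification: escaped slashes ("\/") are not handled.
     */
    static String[] readAsRegexp(String src) {
        int open = src.indexOf('/');
        int close = src.indexOf('/', open + 1);
        if (open < 0 || close < 0) {
            throw new IllegalArgumentException("no complete /.../ literal in: " + src);
        }
        // Flags are the run of letters directly after the closing '/'.
        int end = close + 1;
        while (end < src.length() && Character.isLetter(src.charAt(end))) {
            end++;
        }
        return new String[] { src.substring(open + 1, close), src.substring(close + 1, end) };
    }

    public static void main(String[] args) {
        String[] literal = readAsRegexp("r = / b; f = r/m;");
        // prints: body=' b; f = r' flags='m'
        System.out.println("body='" + literal[0] + "' flags='" + literal[1] + "'");
    }
}
```

Note that the second '/' would be an ordinary division in 'f = r/m' if the first one had not opened a literal, which is why the decision has to be made with knowledge of the surrounding syntax.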
...
Building a parser that uses this TokenStream then looks like this:
Code Block |
---|
ANTLRReaderStream charstream = new ANTLRReaderStream(in);
AS3ParserLexer lexer = new AS3ParserLexer(charstream);
LinkedListTokenSource linker = new LinkedListTokenSource(lexer);
LinkedListTokenStream tokenStream = new LinkedListTokenStream(linker);
AS3Parser parser = new AS3Parser(tokenStream);
|
...
We extend our grammar definition to deal with this kind of literal, and add the output of the island grammar to the AST being built:
Code Block |
---|
constant
    : regexpLiteral
    | HEX_LITERAL
    | DECIMAL_LITERAL
    | OCTAL_LITERAL
    | FLOAT_LITERAL
    | STRING_LITERAL
    | TRUE
    | FALSE
    ;
regexpLiteral
@init { LinkedListTree re = null; }
    : '/' { re=handleRegexp(); } -> ^( {re} )
    ;
|
...
Code Block |
---|
private AS3ParserLexer lexer;
private CharStream cs;
public void setInput(AS3ParserLexer lexer, CharStream cs) {
    this.lexer = lexer;
    this.cs = cs;
}
|
Now we should have the pieces we need to implement the handleRegexp() method:
Code Block |
---|
private LinkedListTree handleRegexp() throws RecognitionException {
    // hand the rest of the input, from the current position onwards, to the island parser
    String tail = cs.substring(cs.index(), cs.size()-1);
    RegexpParser parser;
    try {
        parser = createRegexpParser(new StringReader(tail), stream);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    LinkedListTree ast = ASTUtils.tree(parser.regexp());
    // now, restore the input state of the outer grammar to something sane,
    tail = parser.getInputTail();
    try {
        cs = new ANTLRReaderStream(new StringReader(tail));
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    lexer.setCharStream(cs);
    LinkedListTokenSource source = (LinkedListTokenSource)stream.getTokenSource();
    stream.setTokenSource(source); // drop any remaining 'Regexp state' from stream
    source.setDelegate(lexer); // custom method of LinkedListTokenSource
    return ast;
}
|
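Stripped of the ANTLR plumbing, the hand-off in handleRegexp() comes down to three steps: copy the unread tail of the input, let the island parser consume what it needs, then rebuild the outer input from whatever is left. The following is only a sketch of that idea in plain Java. `IslandHandoffSketch`, `IslandResult` and `parseRegexpIsland` are hypothetical stand-ins invented here, not the real RegexpParser or lexer classes, and the stand-in "parser" merely scans to the closing '/'.

```java
// Hypothetical stand-ins for the ANTLR-based classes; for illustration only.
class IslandHandoffSketch {

    /** What the island parser matched, plus the input it left unconsumed. */
    static final class IslandResult {
        final String matched;
        final String tail;
        IslandResult(String matched, String tail) {
            this.matched = matched;
            this.tail = tail;
        }
    }

    /** Stand-in island parser: consumes up to the closing '/' and any flag letters. */
    static IslandResult parseRegexpIsland(String tail) {
        int close = tail.indexOf('/');
        if (close < 0) {
            throw new IllegalArgumentException("unterminated regexp literal");
        }
        int end = close + 1;
        while (end < tail.length() && Character.isLetter(tail.charAt(end))) {
            end++;
        }
        return new IslandResult(tail.substring(0, end), tail.substring(end));
    }

    private String input; // characters available to the outer grammar
    private int index;    // outer grammar's read position within input

    IslandHandoffSketch(String input, int index) {
        this.input = input;
        this.index = index;
    }

    /** Called once the outer grammar has consumed the opening '/'. */
    String handleRegexp() {
        String tail = input.substring(index);          // first copy: the unread tail
        IslandResult result = parseRegexpIsland(tail);
        input = result.tail;                           // second copy: new outer input
        index = 0;                                     // outer grammar restarts at the leftover
        return result.matched;
    }

    String remaining() {
        return input.substring(index);
    }

    public static void main(String[] args) {
        // Position 5 is just past the opening '/'.
        IslandHandoffSketch outer = new IslandHandoffSketch("r = / b; f = r/m; g = 1;", 5);
        System.out.println("island: '" + outer.handleRegexp() + "'");
        System.out.println("outer resumes at: '" + outer.remaining() + "'");
    }
}
```

The two substring calls correspond to the two points where the real implementation builds a new string and a new character stream, which is where the copying cost comes from.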
Conclusion
This does seem to be a pretty complex way of doing things, but it also seems to work. Repeated string-copying in this implementation probably also means that it isn't the speediest solution.
I am currently in the process of trying to use this technique to process E4X XML literals in my target language (see outer grammar as3 AS3.g3 and island grammar E4X.g).