Versions Compared

Comment: I changed the case of the referenced filename in version control

...

If the token that marks the start of the island grammar within your input can also have other uses within your language, the basic 'island-grammar' technique can't be used.
For instance, look at an example of a regular expression literal:

Code Block
r = / b; f = r/m;

This assigns the regular expression ' b; f = r' (with the flag 'm') to the variable 'r'. It is not a pair of divisions, even though '/' is also the division operator.
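To see why the lexer cannot settle this on its own, here is a minimal sketch of a context-free tokenizer. The class NaiveSlashLexer and its token pattern are hypothetical and are not part of the AS3 grammar; they only illustrate what happens when '/' is always treated as an ordinary operator token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a lexer with no knowledge of parser context.
final class NaiveSlashLexer {
    // An identifier, or any single non-whitespace character as a symbol token.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z_0-9]*|\\S");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}
```

Calling NaiveSlashLexer.tokenize("r = / b; f = r/m;") yields the eleven tokens r, =, /, b, ;, f, =, r, /, m and ; so the body of the literal has been chopped into unrelated tokens before the parser gets any say. Only the parser knows that an operand is expected after '=', which is why the decision has to be made there.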

...

Building a parser that uses this TokenStream then looks like this:

Code Block

       
ANTLRReaderStream charstream = new ANTLRReaderStream(in);
AS3ParserLexer lexer = new AS3ParserLexer(charstream);
LinkedListTokenSource linker = new LinkedListTokenSource(lexer);
LinkedListTokenStream tokenStream = new LinkedListTokenStream(linker);
AS3Parser parser = new AS3Parser(tokenStream);

...

We extend our grammar definition to deal with this kind of literal, and add the output of the island grammar to the AST being built:

Code Block
constant
    :    regexpLiteral
    |    HEX_LITERAL
    |    DECIMAL_LITERAL
    |    OCTAL_LITERAL
    |    FLOAT_LITERAL
    |    STRING_LITERAL
    |    TRUE
    |    FALSE
    ;

regexpLiteral
    @init { LinkedListTree re = null; }
    :    '/' { re=handleRegexp(); }  ->  ^( {re} )
    ;

...

Code Block
private AS3ParserLexer lexer;
private CharStream cs;

public void setInput(AS3ParserLexer lexer, CharStream cs) {
	this.lexer = lexer;
	this.cs = cs;
}

Now we should have the pieces we need to implement the handleRegexp() method:

Code Block
private LinkedListTree handleRegexp() throws RecognitionException {
	// copy everything the outer lexer has not yet consumed
	String tail = cs.substring(cs.index(), cs.size()-1);
	RegexpParser parser;
	try {
		parser = createRegexpParser(new StringReader(tail), stream);
	} catch (IOException e) {
		throw new RuntimeException(e);
	}
	LinkedListTree ast = ASTUtils.tree(parser.regexp());
	// now, restore the input state of the outer grammar to something sane:
	// whatever the island parser left unread becomes the outer lexer's new input
	tail = parser.getInputTail();
	try {
		cs = new ANTLRReaderStream(new StringReader(tail));
	} catch (IOException e) {
		throw new RuntimeException(e);
	}
	lexer.setCharStream(cs);
	LinkedListTokenSource source = (LinkedListTokenSource)stream.getTokenSource();
	stream.setTokenSource(source);  // drop any remaining 'Regexp state' from stream
	source.setDelegate(lexer);  // custom method of LinkedListTokenSource
	return ast;
}
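The essential move in handleRegexp() is a hand-off of the unread input: the island parser is given the tail, consumes what it understands, and reports what is left so the outer lexer can resume. The following is a minimal sketch of just that idea without ANTLR. The class TailHandoffSketch, the IslandResult type and the hand-written scanner are all hypothetical stand-ins for RegexpParser and getInputTail(), and the sketch assumes the tail starts immediately after the opening '/'.

```java
// Hypothetical stand-in for the island parser; not the ANTLR-generated RegexpParser.
final class TailHandoffSketch {

    /** What the island scanner recognised, plus the input it left unread. */
    static final class IslandResult {
        final String body;
        final String flags;
        final String tail;

        IslandResult(String body, String flags, String tail) {
            this.body = body;
            this.flags = flags;
            this.tail = tail;
        }
    }

    /**
     * Scans "body/flags" from the start of input (the opening '/' is assumed
     * to have been matched by the outer grammar already). Simplification:
     * backslash escapes are honoured, character classes such as [/] are not.
     */
    static IslandResult scanRegexp(String input) {
        StringBuilder body = new StringBuilder();
        int i = 0;
        while (i < input.length() && input.charAt(i) != '/') {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // keep the escape and the escaped character together
                body.append(c).append(input.charAt(i + 1));
                i += 2;
            } else {
                body.append(c);
                i++;
            }
        }
        if (i >= input.length()) {
            throw new IllegalArgumentException("unterminated regexp literal");
        }
        i++; // skip the closing '/'
        int flagsStart = i;
        while (i < input.length() && Character.isLetter(input.charAt(i))) {
            i++;
        }
        // everything from i onwards belongs to the outer grammar again
        return new IslandResult(body.toString(),
                input.substring(flagsStart, i), input.substring(i));
    }
}
```

For the input " b; f = r/m; x = 1;" the sketch reports the body " b; f = r", the flags "m" and the tail "; x = 1;". That tail plays the role of the string from which the real method builds a fresh ANTLRReaderStream for the outer lexer.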

Conclusion

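This does seem to be a pretty complex way of doing things, but it also seems to work. The repeated string copying in this implementation (the unread tail is copied out of the character stream and wrapped in a new stream for every literal) probably also means that it isn't the speediest solution.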

I am currently in the process of trying to use this technique to process E4X XML literals in my target language (see outer grammar AS3.g3 and island grammar E4X.g).