How do I implement include files?

Stacked Input Streams - "include" files

Java

Here is sample Java code for implementing an 'include' directive. Thanks to Terence Parr for important notes and Tim Clark for improvements.

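// Stack must be imported; merge this into your existing @lexer::header section if you have one
@lexer::header {
    import java.util.Stack;
}

@lexer::members {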
    class SaveStruct {
      SaveStruct(CharStream input){
        this.input = input;
        this.marker = input.mark();
      }
      public CharStream input;
      public int marker;
     }

     Stack<SaveStruct> includes = new Stack<SaveStruct>();

     // Override nextToken() to handle the EOF of an included file.
     public Token nextToken(){
       Token token = super.nextToken();

       // Compare the token type rather than the Token.EOF_TOKEN instance,
       // since some ANTLR 3 runtime versions return a newly created EOF token.
       if(token.getType()==Token.EOF && !includes.empty()){
         // We've got EOF and have a non-empty stack.
         SaveStruct ss = includes.pop();
         setCharStream(ss.input);
         input.rewind(ss.marker);
         // Call this.nextToken() rather than super.nextToken() to handle exits from nested includes.
         // This matters when the 'include' token is the last one in the previous stream
         // (with super, the lexer 'crashes' by returning the EOF token).
         token = this.nextToken();
       }

       // Skip the first token after switching to another input.
       // Use this rather than super here too, as there may be nested include files.
       if(((CommonToken)token).getStartIndex() < 0)
         token = this.nextToken();

       return token;
     }
 }

// and lexer rule
 INCLUDE
     : 'include' (WS)? f=STRING {
       String name = f.getText();
       name = name.substring(1,name.length()-1);
       try {
        // save current lexer's state
         SaveStruct ss = new SaveStruct(input);
         includes.push(ss);

        // switch on new input stream
         setCharStream(new ANTLRFileStream(name));
         reset();

       } catch(Exception fnf) { throw new Error("Cannot open file " + name); }
     }
     ;
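
The save/switch/restore idea behind the lexer above can be sketched without ANTLR at all. The following is a minimal, self-contained illustration, not the ANTLR API: the class `StackedInputDemo`, the `Frame` type, the `tokenize` method, and the in-memory map standing in for the file system are all hypothetical names chosen for this example. Words play the role of tokens, and `include NAME` pushes the current source on a stack and continues with the named one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StackedInputDemo {
    /** A source plus the read position in it (plays the role of SaveStruct). */
    static final class Frame {
        final String[] words;
        int pos = 0;
        Frame(String text) {
            String t = text.trim();
            this.words = t.isEmpty() ? new String[0] : t.split("\\s+");
        }
    }

    /** Returns all words of startFile, with every "include NAME" expanded in place. */
    static List<String> tokenize(String startFile, Map<String, String> files) {
        Deque<Frame> includes = new ArrayDeque<>();
        Frame current = new Frame(open(startFile, files));
        List<String> out = new ArrayList<>();
        while (true) {
            if (current.pos >= current.words.length) {
                // End of the current source: resume the saved one, if any.
                if (includes.isEmpty()) break;
                current = includes.pop();
                continue;
            }
            String word = current.words[current.pos++];
            if (word.equals("include") && current.pos < current.words.length) {
                String name = current.words[current.pos++];
                String text = open(name, files);
                includes.push(current);      // save the current source and position
                current = new Frame(text);   // switch to the included source
            } else {
                out.add(word);
            }
        }
        return out;
    }

    private static String open(String name, Map<String, String> files) {
        String text = files.get(name);
        if (text == null) throw new IllegalArgumentException("Cannot open file " + name);
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> files = new HashMap<>();
        files.put("main", "a include inner b");
        files.put("inner", "x include leaf y");
        files.put("leaf", "z");
        System.out.println(tokenize("main", files)); // prints [a, x, z, y, b]
    }
}
```

Because the end-of-source check runs in a loop, an `include` that is the last word of its source still unwinds correctly through several nested levels, which is the same situation the `this.nextToken()` calls in the lexer handle.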

How to process stacked input streams in C Target

Because this is a fairly common request and doing such things in C is always a little more time consuming, I built this into the standard ANTLR3 C runtime library (though you can always override the functions, of course, if they do not do quite what you want).

There are two macros:

PUSHSTREAM(s)

POPSTREAM()

The PUSHSTREAM(s) macro is generally the only thing you need and is available in the lexer only (for what should be obvious reasons, though I did wonder about supporting multiple parsing streams.. hmm). See the C parser example in the examples tgz/zip available at the Fisheye repository. PUSHSTREAM saves the current input stream and replaces it with the supplied stream 's'. It does NOT reset the lexer or input stream, though you can do this in your lexer rule if required.

In brief, the parameter (s) to PUSHSTREAM is of type pANTLR3_INPUT_STREAM, which you create yourself in a lexer rule. You are responsible for closing the stream when everything (including the tokens created from the stream) has finished with it, so you should track it somewhere. However, when the stream is exhausted, the standard nextToken() implementation will automatically switch back to the previously saved stream (if any), until all streams are exhausted.

Include Files in the Parser (Java target)

The method shown above implements include files at the lexer level. This can lead to uses not thought of, such as the following (which is legal in C, and would be legal if implemented as above):

main.c:
   if (
   #include "halfcondition.c"
   i > 0) { ... }

halfcondition.c:
   i < N &&

If one wants to prohibit such uses, it is very easy to implement include files at the parser level. Upon encountering an include statement, one starts off a new lexer and parser, and "hangs" the resulting AST "into" the current one, i.e. uses it as the return value of the include rule.
This example is for a C-like language and uses a grammar 'gram' with starting rule 'program'; change this according to your needs. It assumes your token stream is 'CommonTokenStream', and your tree type is 'CommonTree', which is the default.

include_filename :
  ('a'..'z' | 'A'..'Z' | '.' | '_')+
;

include_statement
@init { CommonTree includetree = null; }
 :
  'include' include_filename ';' {
    try {
      CharStream inputstream = null;
      inputstream = new ANTLRFileStream($include_filename.text);
      gramLexer innerlexer = new gramLexer(inputstream);
      gramParser innerparser = new gramParser(new CommonTokenStream(innerlexer));
      includetree = (CommonTree)(innerparser.program().getTree());
    } catch (Exception fnf) {
      ;
    }
  }
  -> ^('include' include_filename ^({includetree}))
;

This will create an 'include' node in the AST which has as its first child the name of the included file, and as its second child the AST resulting from parsing the included file. The code works recursively, i.e. the included file can have its own include statements and so on. In production code, real exception handling should be done.
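
To make the resulting tree shape concrete, here is a minimal sketch in plain Java that mimics the recursion without ANTLR. Everything in it is an assumption made for illustration: the class `IncludeTreeDemo`, the `Node` type, the `parseProgram` method, the toy language (statements separated by ';'), and the in-memory map used in place of real files. Each include statement becomes an 'include' node whose first child is the file name and whose second child is the tree of the included file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IncludeTreeDemo {
    /** A tiny tree node, printed in the usual (label child child ...) form. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        Node add(Node child) { children.add(child); return this; }
        @Override public String toString() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    /** Parses one file; an include statement triggers a nested, independent parse. */
    static Node parseProgram(String fileName, Map<String, String> files) {
        String text = files.get(fileName);
        if (text == null) throw new IllegalArgumentException("Cannot open file " + fileName);
        Node program = new Node("program");
        for (String stmt : text.split(";")) {
            stmt = stmt.trim();
            if (stmt.isEmpty()) continue;
            if (stmt.startsWith("include ")) {
                String name = stmt.substring("include ".length()).trim();
                // Hang the tree of the included file below the include node.
                Node include = new Node("include")
                        .add(new Node(name))
                        .add(parseProgram(name, files));
                program.add(include);
            } else {
                program.add(new Node(stmt));
            }
        }
        return program;
    }

    public static void main(String[] args) {
        Map<String, String> files = new HashMap<>();
        files.put("main.c", "x; include inner.c; y;");
        files.put("inner.c", "z;");
        // prints (program x (include inner.c (program z)) y)
        System.out.println(parseProgram("main.c", files));
    }
}
```

Since every included file is parsed on its own from its start rule, it has to be a complete program by itself, so a fragment such as half of a condition cannot be spliced in the way it can with lexer-level includes.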

Thanks to Harald Müller for suggestion and comments!