FAQ - C Target


How can Antlr Parser actions know the file names and line numbers from a C preprocessed file?

http://antlr.markmail.org/search/?q=How+can+Antlr+Parser+actions+know+the+file+names+and+line+numbers+from+a+C+preprocessed+file%3F#query:How%20can%20Antlr%20Parser%20actions%20know%20the%20file%20names%20and%20line%20numbers%20from%20a%20C%20preprocessed%20file%3F+page:1+mid:xf3ap74bgjnhwfz4+state:results

TerryArnold

One way is to follow this path:

  • Derive your own token from CommonToken and add a file-number field to it;
  • Get the lexer to produce those tokens and the parser to accept them;
  • Build a file table in the lexer and refer to it in error messages;
  • Keep track of the current file in the lexer in case you need error messages from the lexer;
  • Keep track of the current line number in the inferred file by setting it in your PPLINE rule and then incrementing it in your NEWLINE rule;
  • Set the line and file number at the end of each rule (or override nextToken() to set this automatically).

Start down that path and you will see the best way for your requirements.

In the C target, tokens already have user fields for storing such additional information (user1, user2, user3 and a custom pointer in ANTLR3_COMMON_TOKEN), so you can do the same thing without deriving a new token type.
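The bookkeeping in the list above can be sketched in plain C. This is a minimal, self-contained sketch rather than runtime code: it assumes gcc-style line markers of the form `# 42 "foo.h"`, and `FileTable`, `handleLineMarker` and `handleNewline` are hypothetical names for what your PPLINE and NEWLINE rule actions would call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FILES 64

/* Hypothetical file table kept by the lexer: tokens would carry a small
 * file number (e.g. in a user field) instead of a full file name. */
typedef struct
{
    char *names[MAX_FILES];
    int   count;
    int   currentFile;   /* index into names[] */
    int   currentLine;   /* line number in the inferred (original) file */
} FileTable;

/* Return the index of name, adding a copy of it if it is new. */
static int fileTableIntern(FileTable *ft, const char *name)
{
    for (int i = 0; i < ft->count; i++)
        if (strcmp(ft->names[i], name) == 0)
            return i;

    assert(ft->count < MAX_FILES);
    size_t len = strlen(name);
    ft->names[ft->count] = malloc(len + 1);
    assert(ft->names[ft->count] != NULL);
    memcpy(ft->names[ft->count], name, len + 1);
    return ft->count++;
}

/* What a PPLINE action might do with marker text such as: # 42 "foo.h"
 * The number is the line of the NEXT source line, so store one less and
 * let the NEWLINE that ends the marker line bump it.
 * Returns 0 if the text is not a line marker. */
static int handleLineMarker(FileTable *ft, const char *text)
{
    int  line;
    char name[256];

    if (sscanf(text, "# %d \"%255[^\"]\"", &line, name) != 2)
        return 0;
    ft->currentFile = fileTableIntern(ft, name);
    ft->currentLine = line - 1;
    return 1;
}

/* What a NEWLINE action might do. */
static void handleNewline(FileTable *ft)
{
    ft->currentLine++;
}
```

Storing a file index rather than a string keeps each token small; the name is only looked up when an error message is printed.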

AST

What is the ASTLabelType to be specified for tree grammar in C target?

http://antlr.markmail.org/search/?q=Problem+with+AST+type+in+tree+grammar+in+C+target#query:Problem%20with%20AST%20type%20in%20tree%20grammar%20in%20C%20target+page:1+mid:clysmfsuwfn623ku+state:results

KevinCummings

Use the following:

options
{
	ASTLabelType = pANTLR3_BASE_TREE;
	language = C;
}

How to skip a subtree with the C target of ANTLR v3.4?

http://antlr.markmail.org/search/?q=%5BC%5D+Skip+sub-tree+nodes+from+AST%3F#query:%5BC%5D%20Skip%20sub-tree%20nodes%20from%20AST%3F+page:1+mid:p5lzplrfhpkaxejj+state:results

Requirement: Implement an if/then/else interpreter. Parse only either the "then" or the "else" statement skipping the other one and going to the end of the if statement after having handled it.

GonzagueReydet

Sample grammar snippet, using a custom function that walks the subtree, counting the UP and DOWN tokens and ignoring all other tokens:

// ^(IF expression statement (ELSE statement)?)
if_
    @declarations { ANTLR3_MARKER thenIdx, elseIdx = 0; }
  : ^(IF expression
      {
        thenIdx = INDEX();
        ignoreSubTree(ctx);
        if (LA(1) == ELSE) {
            MATCHT(ELSE, NULL);
            elseIdx = INDEX();
            ignoreSubTree(ctx);
        }
      })
    { /* My code that rewinds to either the then or the else block */ }
  ;

The ignoreSubTree() function is implemented as follows:

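// psshell_tree is the context type of the asker's generated tree parser
static void ignoreSubTree(psshell_tree ctx)
{
    ANTLR3_INT32  depth = 0;
    ANTLR3_UINT32 type;

    MATCHANYT();                   // consume the subtree root
    if (HASEXCEPTION() || LA(1) != DOWN) {
        return;                    // error, or a single-node subtree
    }
    do {
        type = LA(1);
        MATCHANYT();               // consume DOWN, UP or any other node
        if (HASEXCEPTION()) {
            return;
        }
        if (type == DOWN) {
            depth++;
        } else if (type == UP) {
            depth--;
        }
    } while (depth > 0);           // stop after the UP that closes the root
}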
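The DOWN/UP counting can be tried outside ANTLR. This standalone sketch works on a flattened node stream held in an int array, which is how a tree node stream presents a tree to a tree parser (root, DOWN, children, UP). The token values are assumptions chosen to mirror ANTLR 3's DOWN = 2 and UP = 3, and `skipSubTree` is a hypothetical helper, not a runtime API.

```c
#include <assert.h>

/* DOWN/UP mirror ANTLR 3's values; the other token types are made up. */
enum { DOWN = 2, UP = 3, IF = 10, ELSE = 11, EXPR = 12, STAT = 13, ID = 14 };

/* Given a flattened node stream and the index of a subtree root, return
 * the index of the first node after that subtree. A leaf is one node; a
 * tree is: root DOWN ... UP, with nested DOWN/UP pairs balanced. */
static int skipSubTree(const int *nodes, int n, int i)
{
    int depth = 0;

    assert(i >= 0 && i < n);
    i++;                                /* consume the root */
    if (i >= n || nodes[i] != DOWN)
        return i;                       /* leaf: nothing more to skip */
    do {
        if (nodes[i] == DOWN)
            depth++;
        else if (nodes[i] == UP)
            depth--;
        i++;
    } while (depth > 0 && i < n);
    return i;
}
```

For `^(IF EXPR ^(STAT ID) ELSE ...)`, skipping the then-branch from its STAT root lands exactly on the ELSE node, which is where the rule above wants to continue.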

How to work around a bug in AST walking when implementing control flow in versions before ANTLR 3.2.1?

http://antlr.markmail.org/search/?q=Bug+in+AST+walking+%2C+implementing+control+flow#query:Bug%20in%20AST%20walking%20%2C%20implementing%20control%20flow+page:1+mid:yhtis4lcwhumlg5o+state:results

Sample:

In an AST walker grammar following this example: http://www.antlr.org/wiki/display/CS652/Tree-based+interpreters

ifstat
    :   ^('if' c=expr s=. e=.?) // ^('if' expr stat stat?)

In the generated code, the variables e and s are not declared (yet they are assigned values), and the compiler complains about that; when they are declared by hand (under the declaration of variable c), the code compiles fine.

MohamedYousef

Until the bug is fixed (3.2.1 and later), follow these guidelines:

1. Use something like the following. You would not actually have the s=. and e=.; just write code to consume what you need:

ifstat
    :   ^('if' c=expr s=. e=.)
        {
            pANTLR3_BASE_TREE  s = (pANTLR3_BASE_TREE)LT(1);
            pANTLR3_BASE_TREE  e = (pANTLR3_BASE_TREE)LT(1);

            // do whatever with return values or custom nodes in e & s
        }
    ;

2. For index and related properties, use the method calls rather than the fields directly, but if you know you will never override the structure types then you only have to worry if the names of the fields are changed.

You probably need:

pANTLR3_COMMON_TREE s = (pANTLR3_COMMON_TREE)(((pANTLR3_BASE_TREE)LT(1))->super);

3. Nodes are valid between tree walks and rewrites so long as you do not free the node streams until you are completely done. You can dup a node outside the factory and then it will persist, but you need to free the memory.

Create your own 'class'/structure and populate it from the information in the node. You will probably need to reset the node stream etc too.
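A minimal sketch of such a structure, assuming you copy the fields out while the tree and its streams are still alive (for example from the node's getType, getText and getLine functions; check the exact names in antlr3basetree.h for your runtime version). `MyNode`, `myNodeNew` and `myNodeFree` are hypothetical; the point is that the text is copied, so nothing refers back into runtime-owned memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical application-owned snapshot of an ANTLR tree node. It stays
 * valid after the node streams and factories have been freed. */
typedef struct MyNode
{
    unsigned int type;
    unsigned int line;
    unsigned int charPosition;
    char        *text;          /* owned copy, may be NULL */
} MyNode;

static MyNode *myNodeNew(unsigned int type, unsigned int line,
                         unsigned int charPosition, const char *text)
{
    MyNode *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->type         = type;
    node->line         = line;
    node->charPosition = charPosition;
    node->text         = NULL;
    if (text != NULL) {
        size_t len = strlen(text);
        node->text = malloc(len + 1);
        if (node->text == NULL) {
            free(node);
            return NULL;
        }
        memcpy(node->text, text, len + 1);
    }
    return node;
}

/* You own these nodes, so you must free them yourself. */
static void myNodeFree(MyNode *node)
{
    if (node != NULL) {
        free(node->text);
        free(node);
    }
}
```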

How to navigate the AST built using the ANTLR3 C target (how to access token information stored in the tree nodes: ANTLR3_BASE_TREE or ANTLR3_COMMON_TREE)?

http://antlr.markmail.org/search/?q=Navigate+the+C+target+AST#query:Navigate%20the%20C%20target%20AST+page:1+mid:bpchlmrisewkhgme+state:results

RichardConnon

Use the super pointer in the ANTLR3_BASE_TREE and cast it to pANTLR3_COMMON_TREE. getToken returns the base token, which also has a super pointer that you cast to a COMMON_TOKEN.

See makeDot in the base tree adaptor for some examples. Study that function in the source (rather than only calling it to generate output) and adapt it to your needs. It is invoked like this:

nodes = antlr3CommonTreeNodeStreamNewTree(psrReturn.tree, ANTLR3_SIZE_HINT);
dotSpec = nodes->adaptor->makeDot(nodes->adaptor, psrReturn.tree);

Also, look at the code for the adaptor.

You can only get tokens for nodes that have them of course.
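Navigation is usually a recursive walk that asks each node for its child count and children. This self-contained sketch mimics that shape with a hypothetical `Node` type so it runs without the runtime; with the real API you would call the tree's function pointers instead (e.g. `t->getChildCount(t)` and `t->getChild(t, i)`), and the runtime's own toStringTree produces a similar LISP-style string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tree node: type, text and children. */
typedef struct Node
{
    int                       type;
    const char               *text;
    unsigned int              childCount;
    const struct Node *const *children;
} Node;

/* Append the tree to buf (which must already hold a C string) in
 * LISP-like form, e.g. (+ 1 (* 2 3)). Returns the number of nodes visited.
 * Output is silently truncated if buf is too small. */
static unsigned int walk(const Node *t, char *buf, size_t size)
{
    unsigned int visited = 1;
    size_t       used    = strlen(buf);

    if (t->childCount == 0) {
        snprintf(buf + used, size - used, "%s", t->text);
        return visited;
    }
    snprintf(buf + used, size - used, "(%s", t->text);
    for (unsigned int i = 0; i < t->childCount; i++) {
        used = strlen(buf);
        snprintf(buf + used, size - used, " ");
        visited += walk(t->children[i], buf, size);   /* recurse */
    }
    used = strlen(buf);
    snprintf(buf + used, size - used, ")");
    return visited;
}
```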

How to create my own custom AST instead of the pANTLR3_BASE_TREE provided in the C runtime?

http://antlr.markmail.org/search/?q=custom+AST+with+re-write+rules+in+C+runtime#query:custom%20AST%20with%20re-write%20rules%20in%20C%20runtime+page:1+mid:p5n6pxnoxwrufncs+state:results

(I've been looking at antlr3basetree.h. Is the idea that we use this base tree, but set the "u" pointer to our own data structures?)

RobertSoule

That's by far the easiest way, which is why I did that. Mostly, that's all people need. However, you can also write your own tree adaptor that just passes around (void *). That is not a task for the inexperienced C programmer, though. If you copy the adaptor code and fill in your own functions, that will work. It is much easier to just add your nodes to the u pointer and let the existing code do all the work.


I'd like to use my own custom AST and write my own tree adapter. One thing that I haven't been able to figure out is how to substitute my adapter for the default adapter when the parser code is generated.

The adapter is declared in the parser header file as:

pANTLR3_BASE_TREE_ADAPTOR adaptor;

and then used with the macro definition:

#define ADAPTOR ctx->adaptor

Is there an option I can specify to have ANTLR generate a parser that refers to my adapter, as in:

pMY_CUSTOM_TREE_ADAPTOR adaptor;

and the associated functions like:

ANTLR3_TREE_ADAPTORNew(..) ?

Answer:

Once you have created the parser, just install a pointer to your own adaptor. You can also do this in @apifuncs I think. As in:

ADAPTOR = pointerToMyAdaptor;

Your adaptor structure needs to be the same structure that the common adaptor uses, and it must provide the same function set.

To be honest, even in Java I have found it easier just to call my own functions directly from actions (i.e. to use actions instead of rewrite rules) than to create an adaptor, though this is mainly because of the type of tree that is needed in the case of errors and so on. This does end up tying your grammar to specific generated code.

The adaptor approach will work, but will possibly be more work. You may also need to be careful with the rewrite stream functions.
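Why the replacement must keep the same layout is easiest to see in a cut-down model. The sketch below is hypothetical throughout (`Adaptor`, `MyTreeNode`, `ParserCtx`, `buildNode`); the real ANTLR3_BASE_TREE_ADAPTOR has many more slots, all of which your version would have to fill. Generated code only ever calls through the installed pointer, so swapping the pointer is enough once every slot is valid.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily cut-down adaptor: like the C runtime's tree
 * adaptor it is a struct of function pointers. */
typedef struct Adaptor
{
    void *(*create)(struct Adaptor *self, int tokenType);
    int   (*getType)(struct Adaptor *self, void *node);
    int   nodesCreated;                 /* demo bookkeeping only */
} Adaptor;

/* Hypothetical custom node type. */
typedef struct { int type; int myExtraInfo; } MyTreeNode;

static void *myCreate(Adaptor *self, int tokenType)
{
    MyTreeNode *node = malloc(sizeof *node);
    if (node != NULL) {
        node->type        = tokenType;
        node->myExtraInfo = 42;
        self->nodesCreated++;
    }
    return node;
}

static int myGetType(Adaptor *self, void *node)
{
    (void)self;
    return ((MyTreeNode *)node)->type;
}

/* Stand-in for a generated parser context: generated code calls through
 * this pointer (compare #define ADAPTOR ctx->adaptor). */
typedef struct { Adaptor *adaptor; } ParserCtx;

static void *buildNode(ParserCtx *ctx, int tokenType)
{
    return ctx->adaptor->create(ctx->adaptor, tokenType);
}
```

Installing it is then a single assignment after the parser is created, as in `ctx.adaptor = &myAdaptor;`.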

Code Generation

How to generate C and Java from a single ANTLR grammar file that contains actions? Say the grammar looks something like this:

http://antlr.markmail.org/search/?q=Can+I+target+C+and+Java+from+one+grammar+file#query:Can%20I%20target%20C%20and%20Java%20from%20one%20grammar%20file+page:1+mid:s6gs543v2iylqacv+state:results

selectStatement[int initRule]
//ifdef JAVA
@init 	{if(initRule) sse.pushCall(sse.SELECTSTAT);}
//elifdef CPP
@init 	{if(initRule) sse->pushCall(sse->SELECTSTAT);}
//endif
	:
	q = queryExpression[true]
	;

AndyGrove

Using a preprocessor for the grammar avoids having to maintain two versions of it.
The following preprocessor for ANTLR grammars was contributed by Andy Grove of CodeFutures Corporation. It recognizes //ifdef, //elifdef and //endif markers at the start of a line:

#!/usr/bin/ruby
# Preprocessor for ANTLR grammar files with multiple language targets
# Written by Andy Grove on 23-Jan-2009

def preprocess(filename, userTarget)
   f = File.open(filename)
   include = true
   currentTarget = "*"
   f.each_line {|line|
     if line[0,7] == '//ifdef'
       currentTarget = line[7,line.length].strip
     elsif line[0,9] == '//elifdef'
       currentTarget = line[9,line.length].strip
     elsif line[0,7] == '//endif'
       currentTarget = "*"
     else
       if currentTarget=="*" || currentTarget==userTarget
         puts line
       end
     end
   }
   f.close
end

begin
   if ARGV.length < 2
       puts "Usage: preprocess filename target"
   else
       preprocess(ARGV[0], ARGV[1])
   end
end

How to compile g3pl file to generate C code for 64 bit target?

http://antlr.markmail.org/search/?q=generating+C+code+from+g3pl+file+for+64+bit+linux#query:generating%20C%20code%20from%20g3pl%20file%20for%2064%20bit%20linux+page:1+mid:drj2kck6xm6jon46+state:results

KrishnaVenuturimilli

  1. Rename it to a .g file. The special extensions existed only because older versions of Visual Studio needed them to work out what the output files were; that isn't relevant any more.
  2. The generated C is the same on all platforms and there is nothing special to do to generate 64 bit, 32 bit, Linux, Win32, Solaris etc. In fact it is designed so that you can generate the C on one platform and compile it on any.

So, just like any other grammar file, run the ANTLR tool on it and it will give you a .c and a .h file. The generated files are both 32- and 64-bit compatible. The C runtime docs point you at ./configure --help, where you will see the flag that builds the libraries in 64-bit mode: pass --enable-64bit when you build the runtime.

How to make the generated *.c-file to be named *.cpp?

http://antlr.markmail.org/search/?q=C-Runtime%2C+*.c-file+should+be+*.cpp#query:C-Runtime%2C%20*.c-file%20should%20be%20*.cpp+page:1+mid:cekewbnm6lhk5x5n+state:results

UdoWeik

  1. Change the CTarget.java and rebuild the tool.
  2. But it is much easier to just add the "Compile as C++" flag to the compiler:
    1. MS: /TP
    2. gcc: -x c++
  3. Or you could trivially add a makefile rule to rename them after the antlr tool is run.

How to get rid of type conversion errors of the form shown below when compiling generated code with the ANTLR C runtime (version 3.1.1)?

MyLexer.c:1634: error: invalid conversion from 'int' to 'const ANTLR3_INT32*'

http://antlr.markmail.org/search/?q=Error+compiling+generated+C+code+%28possibly+32%2F64+bit+conflict%3F%29#query:Error%20compiling%20generated%20C%20code%20(possibly%2032%2F64%20bit%20conflict%3F)+page:1+mid:oqweqilhaocb7r7u+state:results

AndyGrove

  • If compiling as C++, run 'make install' for the runtime and point the include path at the installed headers (for example -I/usr/local/include).
  • Rename any of your tokens that clash with standard C/C++ macros. Specifically, rename your NULL and DELETE tokens to KNULL and KDELETE. Fix these and try simpler compiles.

As a design pattern, though, create a helper class and just call that, keeping the code you embed in actions to a minimum for maintenance reasons. Try a simpler compile first, just as C:

gcc -I /usr/local/include -c src/sqlparser/DbsMySQL_CPPLexer.c

Then the same with g++

g++ -I /usr/local/include -c src/sqlparser/DbsMySQL_CPPLexer.c

If you turn on all the possible C++ warnings and errors, you will likely get a lot of them: the code is meant to compile cleanly with -Wall under gcc, but I make no claims for g++. You can of course compile your own C++ code with all the C++ warnings enabled.

How to get rid of error in generated C code where a struct is referenced without being initialized?

http://antlr.markmail.org/search/?q=Error+in+generated+C+code+%28struct+referenced+without+being+initialized%29#query:Error%20in%20generated%20C%20code%20(struct%20referenced%20without%20being%20initialized)+page:1+mid:nw3fwpj5wx2kssbu+state:results

Sample with the problem:

grammar:

e = expression 
 ( a1 = alias1 )? 
 { sse.addSelectItem($e.text, $a1.text); }

In generated parser, a variable "a1" is declared thus:

 DbsMySQL_CPPParser_alias1_return a1;

Under certain conditions the variable gets initialized by calling a method:

a1=alias1(ctx);

However, when parsing input, the variable is not getting initialized but it is getting referenced in the following expression, causing a seg fault:

(a1.start != NULL ? STRSTREAM->toStringTT(STRSTREAM, a1.start, a1.stop) : NULL)

Because a1 is never initialized, a1.start holds an indeterminate (garbage) value rather than NULL. Adding a1.start = NULL after a1 is declared fixes the seg fault. What is the proper fix?

AndyGrove

Restructure the rule so that each action only references labels that have actually been matched:

 

| e = expression
        ( a1 = alias1
             { sse.addSelectItem($e.text, $a1.text); }
        |     { sse.addSelectItem($e.text, NULL); }
       )
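Why the asker's `a1.start = NULL` workaround also stopped the crash can be seen in a standalone model of the generated check: the ternary only protects you if start really is NULL when the optional subrule was skipped. The types and `labelText` below are simplified, hypothetical stand-ins, not generated code; the restructured rule in the answer is the cleaner fix, because each action then only touches labels that were actually matched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: a token, and a rule return structure whose
 * start/stop point at tokens, like a generated *_alias1_return type. */
typedef struct { const char *text; } Token;
typedef struct { Token *start; Token *stop; } alias1_return;

/* Simplified version of what $a1.text expands to: only dereference
 * start when it is non-NULL. */
static const char *labelText(const alias1_return *r)
{
    return r->start != NULL ? r->start->text : NULL;
}
```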

 

How do I write custom code in the parser that needs to access a data structure written in C++ (basically a class)?

http://antlr.markmail.org/search/?q=how+to+generate+C%2B%2B+file#query:how%20to%20generate%20C%2B%2B%20file+page:1+mid:oftjxevdxgdwjlmw+state:results

For example:

s  	→ CHAR '=' e
e  	→ t y 
y 	→ '+' t y 
	→ 
t 	→ p x
x 	→ '*' t 
	→ 
p	→ '('e')' 
	→ NUMBER 

where CHAR and NUMBER are tokens.

fragment
LETTER 	:	 ('a'..'z' | 'A'..'Z')
	;

CHAR	:	LETTER (LETTER | DIGIT | '_')+
	;

fragment
DIGIT	    : '0'..'9'
            ;

NUMBER	    : (DIGIT)+ '.' (DIGIT)+ | (DIGIT)+ 
            ;

Let's say the input is i=5.

Now, in s → CHAR '=' e, when this is executed for i, I need to declare i as an int, depending on the right-hand side. I will get the data type by calling a member function of some class. Is it possible to do this with ANTLR?

J.R Karthikeyan

Download the example tar from the download page. It shows you how to do this.


Compiling ANTLR-generated C code in VS 2005 gives this error:

Error 1 fatal error C1010: unexpected end of file while looking for precompiled header. Did you forget to add '#include "stdafx.h"' to your source? c:\documents and settings\kjambura\my documents\visual studio 2005\projects\wrapper\wrapper\checkforcompileparser.c 468

Answer:

The problem was fixed as follows. The compiler errors appear in VS 2005 because the default options applied to a VS C++ project include precompiled header support.

The easiest way around this is to tell VS that your ANTLR-generated files don't use precompiled headers. To do that, select the ANTLR source files in the Solution Explorer, right-click and select Properties. Then select the C/C++ > Precompiled Headers property page and, in the "Create/Use Precompiled Header" property, select the option that says something like "Not Using Precompiled Headers".

But Visual Studio 2005 is no longer directly supported.

The clean solution: Visual Studio 2008 is available in a free version (and probably 2010), so you should really use that. The issue is that the VS 2005 compiler does not support a few ANSI constructs used in the runtime. This means that you must compile the runtime in 2008 and just link with it in your 2005 project. But unless you configure the include paths for your project and so on, it will not compile anyway. That's why you should download VS 2008 and the example projects, then use the example projects to show you how to configure your own project.

Why does C target code generation of ANTLR 3.4 not set all the rule variables to NULL?

http://antlr.markmail.org/search/?q=C+Target+code+generation+of+ANTLR+3.4+does+not+set+ALL+the+rule+variables+to+NULL#query:C%20Target%20code%20generation%20of%20ANTLR%203.4%20does%20not%20set%20ALL%20the%20rule%20variables%20to%20NULL+page:1+mid:ohlc6e4mqdd7mfwt+state:results

AdrianPop

It is by design, to prevent trying to assign NULL to things that are not pointers.

Why am I getting error "set complement is empty"?

http://antlr.markmail.org/search/?q=Empty+complement+set%3F#query:Empty%20complement%20set%3F+page:1+mid:d3uzu3clpu3urmai+state:results

The following grammar does not generate target code. It says "error(139): skipper.g: set complement is empty".

grammar skipper;

options
{
     language = C;
}

skipper		
     @init {
	  int braceCount = 1;
     }
     : (
     '('
     {
	  braceCount ++;
     }
     | ')'
     {
	  braceCount --;
	  if(braceCount == 0)
	  {
		LTOKEN = EOF_TOKEN;
	  }
     }
     | ~('('|')')
     ) *
     ;

What's wrong with it?

Anton Bychkov

Answer 1 (thread):

Try adding the lexer rules:

LParen : '(';
RParen : ')';

(ANTLRWorks reports the same error on line 28, which is:

    | ~(LParen|RParen)

There is also a strange thing in the rule view: it looks like ANTLR does not see LParen and RParen in the twiddle operator.)

 

No tokens other than '(' and ')' are defined, so ~(LParen|RParen) is empty. Try adding a "fall through" DOT rule to your lexer rules:

skipper
       @init {
               int braceCount = 1;
       }
       : (
       LParen
       {
               braceCount ++;
       }
       | RParen
       {
               braceCount --;
               if(braceCount == 0)
               {
                       LTOKEN = EOF_TOKEN;
               }
       }
       | Other
       ) *
       ;


LParen : '(' ;
RParen : ')' ;
Other  :  .  ;

Or like this:

LParen : '(';
RParen : ')';
Other  : ~('(' | ')');
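To check the counting logic of the skipper rule on its own, here it is over a plain C string, with LParen, RParen and Other mapped to '(', ')' and any other character. `skipToClose` is a hypothetical name; like the rule, it starts with braceCount at 1 (just after an opening parenthesis) and stops when the count reaches zero.

```c
#include <assert.h>

/* Starting just after an opening '(' (braceCount begins at 1), return
 * the index of the ')' that closes it, or -1 if the input ends first. */
static long skipToClose(const char *s)
{
    int braceCount = 1;

    for (long i = 0; s[i] != '\0'; i++) {
        if (s[i] == '(')
            braceCount++;                       /* LParen */
        else if (s[i] == ')' && --braceCount == 0)
            return i;                           /* RParen closing the block */
        /* anything else is an Other token: ignore it */
    }
    return -1;
}
```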

Answer 2: 

You cannot use set complements in parser rules. That is for lexer rules only. In the next release, ANTLR will tell you about this. But don't use 'literals' while you are learning as it is too easy to get confused as to what they mean in terms of lexer vs parser.

Debugging and Error Checking

How to debug grammars for the C-Runtime using ANTLRWorks?

http://antlr.markmail.org/search/?q=C-Runtime+and+ANTLRWorks+-+debugging+grammars#query:C-Runtime%20and%20ANTLRWorks%20-%20debugging%20grammars+page:1+mid:m4s4d24baujzod4v+state:results

UdoWeik

  • Generate the code with the -debug option
  • Compile as normal
  • When you run the parser, it will appear to hang (it is waiting for the debugger to connect)
  • Load the grammar file in ANTLRWorks
  • Use 'Debug remote' and connect to localhost

Debugging a parser written for the C target:

http://antlr.markmail.org/search/?q=Remote+Debug+C+Target+parser#query:Remote%20Debug%20C%20Target%20parser+page:1+mid:gt4ewht2mii5jyoy+state:results

Problem:

Since I cannot debug it in ANTLRWorks directly, I wrote a small test app that includes the parser and parses one line:

http://pastebin.com/WzXrvTxr

The parser and lexer have been compiled with the "-debug" option. I can connect to the port in ANTLRWorks, but as soon as I click "step forward" in the debugger, my program finishes and the debugger doesn't display anything.

This is the console output I get:

[...]
unknown debug event: location 796 5
unknown debug event: enterRule PLSQL3c.g select_expression
unknown debug event: exitSubRule 153
unknown debug event: location 646 5
unknown debug event: enterRule PLSQL3c.g select_statement
unknown debug event: location 640 5
unknown debug event: enterRule PLSQL3c.g select_command
unknown debug event: location 583 5
unknown debug event: enterRule PLSQL3c.g to_modify_data
unknown debug event: location 574 5
unknown debug event: enterRule PLSQL3c.g sql_command
unknown debug event: location 569 5
unknown debug event: enterRule PLSQL3c.g sql_statement
unknown debug event: location 147 5
unknown debug event: enterRule PLSQL3c.g statement
unknown debug event: location 126 14
unknown debug event: enterSubRule 20
unknown debug event: enterDecision 20
unknown debug event: LT 1 9 4 0 1 22 ";
unknown debug event: LT 2 0 -1 0 1 -1 "<EOF>
unknown debug event: exitDecision 20
unknown debug event: exitSubRule 20
unknown debug event: location 126 32
unknown debug event: enterSubRule 21
unknown debug event: enterDecision 21
unknown debug event: LT 1 9 4 0 1 22 ";
unknown debug event: exitDecision 21
unknown debug event: enterAlt 1
unknown debug event: location 126 32
unknown debug event: LT 1 9 4 0 1 22 ";
unknown debug event: consumeToken 9 4 0 1 22 ";
unknown debug event: exitSubRule 21
unknown debug event: location 126 38
unknown debug event: LT 1 0 -1 0 1 -1 "<EOF>
unknown debug event: consumeToken 0 -1 0 1 -1 "<EOF>
unknown debug event: location 127 2
unknown debug event: enterRule PLSQL3c.g parse_statements
java.net.SocketException: Connection reset
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.readLine(BufferedReader.java:299)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at org.antlr.runtime.debug.RemoteDebugEventSocketListener.eventHandler(RemoteDebugEventSocketListener.java:178)
at org.antlr.runtime.debug.RemoteDebugEventSocketListener.run(RemoteDebugEventSocketListener.java:472)
at java.lang.Thread.run(Thread.java:619)

Andi Clemens

Guidelines:

At that release, you are better off using the native debugger. I think that ANTLRWorks and the C runtime got out of sync. The current development branch works if you want to use that, but of course it isn't tested very much.

How to handle HASEXCEPTION / HASFAILURE in the following case:

http://antlr.markmail.org/search/?q=Handling+HASEXCEPTION+%2F+HASFAILURE#query:Handling%20HASEXCEPTION%20%2F%20HASFAILURE+page:1+mid:jbfloqzpmgqewrfk+state:results

Because the C version of ANTLR does not support @after actions, I have implemented the equivalent in my grammar by placing a block of code at the end of my rule. For example:

whereClause
@init { sse.pushCall(sse.WHERECLAUSE); }
      :
      (
      WHERE
	 c = searchCondition
      )
      { sse.popCall(); } // acts as the @after action
	;

I noticed that this @after action was not running for some of my input data, so I added debug logging to the generated parser code and found that one of the calls to the HASEXCEPTION macro is returning true and the parser is finishing at that point.

How am I supposed to detect if an exception like this has occurred and how do I find out what is causing the exception?

AndyGrove

The calling rule will call the error message routines, so check your stack there. But really you should parse, build a tree and then do this in the tree walk, and you won't have the issue. Read the C API docs for displayRecognitionError.
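If you do keep such bookkeeping in the parser rather than moving it to a tree walk, the usual C idiom for "always run the after code" is a single exit point. This standalone sketch is hypothetical throughout: a simple int stack stands in for sse.pushCall/popCall, and the `inputIsValid` flag stands in for the point where generated code would leave the rule because HASEXCEPTION() is true.

```c
#include <assert.h>

/* Hypothetical call stack standing in for sse.pushCall()/popCall(). */
typedef struct { int calls[32]; int depth; } CallStack;

static void pushCall(CallStack *s, int id) { assert(s->depth < 32); s->calls[s->depth++] = id; }
static void popCall(CallStack *s)          { assert(s->depth > 0);  s->depth--; }

enum { WHERECLAUSE = 1 };

/* A rule body with an early-failure path. The cleanup label is the single
 * exit point, so the "@after" work runs whether or not the rule failed. */
static int whereClause(CallStack *s, int inputIsValid)
{
    int ok = 0;

    pushCall(s, WHERECLAUSE);          /* @init */
    if (!inputIsValid)
        goto cleanup;                  /* exception: leave the rule early */
    /* ... match WHERE and searchCondition ... */
    ok = 1;
cleanup:
    popCall(s);                        /* @after-style action, always runs */
    return ok;
}
```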

How to write a parser that checks for correct syntax but can also parse erroneous input, to provide autocompletion in another program?

http://antlr.markmail.org/search/?q=Parsing+erroneous+input#query:Parsing%20erroneous%20input+page:1+mid:tpsczv2vroarvn2i+state:results

AndreasHeck

You have to be careful how you implement your grammar rules such that you can recover sensibly from errors. Generally you build a tree or partial tree then analyze that. You may also need to specifically code for some potential missing elements, but again you have to be careful not to introduce ambiguities that break the normal grammar.

For hints on how to code rules that recover well from errors (especially in loops), see:

http://www.antlr.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery

Why can't the stream name be printed when an error occurs (ANTLR C)?

http://antlr.markmail.org/search/?q=Why+stream+name+can%27t+be+printed+out+when+error+occurs%28ANTLR+C%29%3F#query:Why%20stream%20name%20can%27t%20be%20printed%20out%20when%20error%20occurs(ANTLR%20C)%3F+page:1+mid:p3c4a5ft2x5lw7sg+state:results

I set the fileName for ANTLR3_INPUT_STREAM, and the file name is printed when an error occurs. But when the tree parser is involved, the name isn't always printed (sometimes unknown-source is printed instead).

Mu Qiao

There can be multiple streams, and you should really be using the file name in the file stream, which you can get from the token. Originally there was no way to get back to the file stream from a token, and now there is. The example error code uses what was available when I wrote it, and it also avoids dealing with string streams that (at the time) might not have had a name; now they always do, of course.

How to debug a crash in dupNode in antlr3commontree.c in libantlr3c 3.1.3?

http://antlr.markmail.org/search/?q=C+runtime-+crash+in+dupNode%3B+UP%2FDOWN%2Fetc.+missing+factory%3F#query:C%20runtime-%20crash%20in%20dupNode%3B%20UP%2FDOWN%2Fetc.%20missing%20factory%3F+page:1+mid:3n3h62sgbopmxgg5+state:results

(dupNode is being called on a common tree node which has a null factory.

Edited stack trace from GDB:

dupNode (tree=0x97debe0)
at antlr3commontree.c:402
getMissingSymbol (recognizer=0x97ec8f8, istream=0x97ea300, e=0x97ecba8,
expectedTokenType=23, follow=0x81beca0)
at antlr3treeparser.c:227
recoverFromMismatchedToken (recognizer=0x97ec8f8, ttype=23,
follow=0x81beca0)
at antlr3baserecognizer.c:1530
match (recognizer=0x97ec8f8, ttype=23, follow=0x81beca0)
at antlr3baserecognizer.c:478
...

I believe the node it's trying to duplicate is one of stream->UP/DOWN/EOF_NODE/INVALID_NODE initialised in antlr3CommonTreeNodeStreamNew (in antlr3commontreenodestream.c). It looks like those nodes never have their factory set.)

NedGill

This is in the tree parser's error message routines. That means your tree parser is incorrect, or at least the input it is receiving has a mismatch. Unfortunately, I think that the default error message routine doesn't handle duping the node for error display that well when it is trying to duplicate from LT(-1) and LT(-1) is an UP/DOWN token; this is of course a slight bug. However, you are really expected to install your own error message display routines of course, so your fix for the moment is to make a copy of the tree parser displayRecognitionError, have it check that the token being duped to create the missing symbol has a tree factory and if it does not, then use LT(-2) etc (check for start of stream of course) and install it before invoking the tree parser.
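The fallback described above (try LT(-1), then LT(-2) and so on, stopping at the start of the stream) can be sketched independently of the runtime. Here a plain array stands in for the node stream and a `hasFactory` flag stands in for checking the node's factory pointer; `Node` and `lastDupableNode` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree node: the UP/DOWN/EOF navigation nodes
 * created by the node stream have no factory, real nodes do. */
typedef struct { int type; int hasFactory; } Node;

/* Starting at LT(-1) (the node just before index), walk backwards to the
 * nearest node that can safely be duplicated. Returns NULL if the start of
 * the stream is reached without finding one. */
static const Node *lastDupableNode(const Node *nodes, size_t index)
{
    while (index > 0) {
        const Node *candidate = &nodes[--index];
        if (candidate->hasFactory)
            return candidate;
    }
    return NULL;
}
```

In a copied displayRecognitionError, the node found this way (if any) would be the one passed on to create the missing symbol.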

Now, you need to debug your AST and the walker of course. The best way to do that is to produce a .png graphic of the tree that you have. From a prior post:

(First install Graphviz from your distro or www.graphviz.org.)

To use it from C, you just do this:

 pANTLR3_STRING dotSpec;
 dotSpec = nodes->adaptor->makeDot(nodes->adaptor, psrReturn.tree);

Where nodes is the pANTLR3_COMMON_TREE_NODE_STREAM and psrReturn is the return type of the rule you invoke on your parser.

You can then fwrite the spec to a text file:

dotSpec = dotSpec->toUTF8(dotSpec); // Only needed if your input was not 8-bit characters
fwrite((const void *) dotSpec->chars, 1, (size_t) dotSpec->len, dotFILE);

Then turn this into a neat png graphic like this (dotFname is the base file name including the trailing dot, e.g. "ast."):

sprintf(command, "dot -Tpng -o%spng %sdot", dotFname, dotFname);
system(command);

You can then use this png to debug your AST and when it looks correct, make sure that your tree parser is expecting that input. One of the two is incorrect in your case and obviously tree parsers should not encounter incorrect input.
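Put together, the write-and-render step might look like this self-contained sketch. It takes the DOT text as a plain string (in the real code, dotSpec->chars and dotSpec->len), uses snprintf instead of sprintf so the command buffer cannot overflow, and keeps the convention of the snippet above that the base name ends in '.'. `writeDotFile` and `buildDotCommand` are hypothetical names, and running the command requires Graphviz's `dot` on the PATH.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write len bytes of DOT text to "<base>dot". Returns 0 on success. */
static int writeDotFile(const char *base, const char *spec, size_t len)
{
    char  path[512];
    FILE *f;
    int   n = snprintf(path, sizeof path, "%sdot", base);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    if (fwrite(spec, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Build the Graphviz command that turns "<base>dot" into "<base>png".
 * Returns 0 on success, -1 if the buffer is too small. */
static int buildDotCommand(char *command, size_t size, const char *base)
{
    int n = snprintf(command, size, "dot -Tpng -o%spng %sdot", base, base);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Typical use: if both calls return 0, pass the command to system().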

How to ignore parsing errors in ANTLR for C target?

http://antlr.markmail.org/search/?q=Tell+ANTLR+to+ignore+parsing+errors%3F#query:Tell%20ANTLR%20to%20ignore%20parsing%20errors%3F+page:1+mid:tewn3inxc7bwt7wr+state:results

(I just want to parse known statements with my grammar; all unknown statements (parsing errors) could be ignored.

Can I tell ANTLR (for the C target) to ignore those error messages and just return FALSE or something like that, so that I can decide whether to take an appropriate action?)

AndiClemens

If you are getting errors, it is because your grammar is incorrect. For example, your token in the parser (which you should move to the lexer anyway; don't use 'LITERAL' in your parser code) is CREATE, but your input is create. Did you tell the runtime to be case insensitive?

Read the API or use antlr.markmail.org to see how to override displayRecognitionError(). You cannot just ignore errors though because somehow you have to recover. You could just make them silent and when the parser returns if the error count is >0 then ignore that source or something.
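A minimal sketch of the "make them silent, then check the count" approach, with hypothetical names throughout: the replacement display routine prints nothing and just counts, and the caller ignores the statement when the count is non-zero. In the real runtime you would install your routine in place of displayRecognitionError, and the runtime itself keeps a count in the recognizer's shared state (state->errorCount) that you can test after the parser returns; check the header for your version.

```c
#include <assert.h>

/* Hypothetical recognizer: just what this sketch needs. */
typedef struct Recognizer
{
    unsigned int errorCount;
    void (*displayError)(struct Recognizer *r, const char *message);
} Recognizer;

/* Replacement display routine: stay quiet, just count. */
static void silentDisplayError(Recognizer *r, const char *message)
{
    (void)message;                      /* deliberately not printed */
    r->errorCount++;
}

/* Stand-in for invoking a parser rule on one statement. */
static void parseStatement(Recognizer *r, int isKnownStatement)
{
    if (!isKnownStatement)
        r->displayError(r, "no viable alternative");
}

/* Returns 1 if the statement parsed cleanly, 0 if it should be ignored. */
static int acceptStatement(Recognizer *r, int isKnownStatement)
{
    r->errorCount = 0;
    parseStatement(r, isKnownStatement);
    return r->errorCount == 0;
}
```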

( What is the difference if I add "CREATE" or similar to the lexer? Is it more reliable in detecting the right tokens? )

When you put things in the parser, you do not have enough control over the tokens, both in terms of what they are named at code generation time (hence error messages are difficult, and producing a tree parser is difficult), and you cannot see the potential ambiguities in your lexer. It just makes things more difficult for no (IMO) advantage.

If you have told the input stream to be case insensitive, then I am afraid the problem is with your grammar. You will have to single-step through the code to find out why.

( Would it be better to have tokens like "CREATE USER" and "CREATE TABLE" in the lexer or doesn't this work anyway because of the whitespace?)

No - don’t make whitespace significant unless the language absolutely makes you do so.

What you have to do is left-factor:

create
   : CREATE
   (
       cr_table
     | cr_user
     | cr_trigger
   )
 ;

cr_table
 : TABLE .....

I know how to install my own custom displayRecognitionError handler, but how can I remember the errors so that I can print them out later?

http://antlr.markmail.org/search/?q=ANTLR3C+displayRecognitionError#query:ANTLR3C%20displayRecognitionError+page:1+mid:taakrcaujhqk7uy4+state:results

(Sure, I could put them in a global variable, but I'd like for multiple instances of my parser to be able to be called concurrently. Is there any way to add a data structure (like a list<string>) to the ANTLR3_BASE_RECOGNIZER class?)

MarkRosen

Yes, you can indeed do this (this comes from using the code myself; I run into the same things), but you do not add it to the base recognizer. For some reason, the doxygen-generated docs are not including the doc pages about this; I will have to find out why.

I generally use an ANTLR3_VECTOR (because it is ordered), but tend to collect all such things (counters, parameters, etc.) in a master structure like this:

@parser::context
{
        pMY_PARSE_SESSION              ps;                     // MY compiler session context for parser
}

Anything in the @lexer::context or @parser::context section is added verbatim into the context struct (referenced via ctx), and you can initialize it in @apifuncs or externally.

The base recognizer has a void * pointer called super, which points to the parser instance (you can see that the default error display routines pick this up). ANTLR3_PARSER also has a super field, but it is not initialized by default, because I cannot know that the instance of your generated parser is what you want it to point at. (I suppose I could assume this and let you override it, but it is probably better to initialize it explicitly for future reference.)

So, in your apifuncs section:

@parser::apifuncs {

  ctx->ps = myNewContextGrooveThang;

  PARSER->super = (void *)ctx;
}

Now, in your display routine, you will get the parser pointer from base recognizer, get the ctx pointer from super, and your custom, thread safe collection will be in your master structure. A few pointer chases, but this provides maximum flexibility:

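void
displayRecognitionError(pANTLR3_BASE_RECOGNIZER recognizer, pANTLR3_UINT8 * tokenNames)
{
   pANTLR3_PARSER      parser;
   pmyParser           generated;
   pMY_PARSE_SESSION   ps;

   parser    = (pANTLR3_PARSER) (recognizer->super);   // base recognizer -> parser
   generated = (pmyParser) (parser->super);            // parser -> generated ctx (set in @apifuncs)
   ps        = generated->ps;                          // ctx -> your master structure

   // Bob's your uncle.....
   // Format the message here and add it to the collection held in ps
}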

I know how to install my own custom displayRecognitionError handler, but for reporting the names of tokens, is there a way that I can show the matching string for simple tokens rather than the name (possibly limited to just those declared with "tokens {...}")?

http://antlr.markmail.org/search/?q=Error+reporting+with+the+C+runtime-+tokenNames#query:Error%20reporting%20with%20the%20C%20runtime-%20tokenNames+page:1+mid:xkuq3qklhaqq4ftq+state:results

NedGill

  • It is the same as the other targets: you need to create a local function that returns (or displays, or adds to the message) the name you want to use for error display. It is basically just a switch statement on the token type, or you could create a local map and initialize it the first time it is required. It is just a bit of a slog really. Java provides a method to override to do this, but in C, you just call your own local function.
  • I think the information I need is in the <Name>.tokens file, though, so possibly I could generate some code from that.
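A minimal sketch of such a local function follows. The token type values are hypothetical (in a real grammar they come from the generated header, which mirrors the <Name>.tokens file), and the sketch uses plain `const char *` names; the runtime passes tokenNames as `pANTLR3_UINT8 *`, so a cast would be needed there. Token types without a literal spelling fall back to the generated name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types; the real values are generated by ANTLR. */
enum { TOK_LPAREN = 4, TOK_RPAREN = 5, TOK_SEMI = 6, TOK_ID = 7 };

/* Return a user-friendly name for a token type, falling back to the
 * generated token name (e.g. "ID") when no literal spelling is known. */
const char *displayTokenName(unsigned int type, const char *const *tokenNames,
                             size_t tokenNameCount)
{
    assert(tokenNames != NULL || tokenNameCount == 0);

    switch (type)
    {
        case TOK_LPAREN: return "'('";
        case TOK_RPAREN: return "')'";
        case TOK_SEMI:   return "';'";
        default:
            if (type < tokenNameCount)
                return tokenNames[type];
            return "<unknown token>";
    }
}
```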

How to get rid of the errors Doxygen reports when it is run over code generated by the ANTLR C target?

http://antlr.markmail.org/search/?q=Doxygen+errors+when+using+the+C+Target+with+ANTLR#query:Doxygen%20errors%20when%20using%20the%20C%20Target%20with%20ANTLR+page:1+mid:ony5e2kk4h6o4bjk+state:results

HeikoFolkerts

Well, to be honest, I started adding Doxygen comments to the generated code, but after looking at what you get from it, I decided that it wasn't really much help. In the next version of ANTLR, doc comments on rules will be passed through to code generation, and that should help.

To change the template you need to either rebuild ANTLR or set your class path up so that it finds your version of C.stg before mine. I suspect though that what you will get is not really that useful. Better to document the grammar than the generated code.

With the following grammar, why does the input string "abc" generate a NoViableAltException, while parsing it with rule 'root' using the C runtime succeeds?

http://antlr.markmail.org/search/?q=ANTLR+%2B+C+Target+Questions#query:ANTLR%20%2B%20C%20Target%20Questions+page:1+mid:hei4jogrrkzzuu2n+state:results

 grammar schema;

 options
 {
      language = C;
 }

 root : letter* ;
 letter : A | B ;
 other : C;

 A   :    'a';
 B   :    'b';
 C   :    'c';

MichaelCoupland

root: letter* EOF;

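There are no exceptions in C, so the top rule can only set flags. Without EOF, root is allowed to stop after matching "ab"; adding EOF forces the parser to look at the trailing "c" and report it.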

--------------

What would I want to look at if I wanted to deduce what portion of the input data the parser had consumed?

Answer:

Well, the EOF forces the parse to progress to the EOF 'token', so the parser will look at everything, perform recovery and resync, and go on until it sees the EOF, when it will finally breathe a sigh of relief. You override the error display (or a function earlier in that stack) and collect the errors in a form that you can then use after parsing. Generally, you don't print errors as they occur but collect them in a log/buffer/collection/kitchen sink and then decide where they should go after the parse is complete (i.e. send them to an IDE, print them to the screen, email them and so on).

Understanding displayRecognitionError

http://antlr.markmail.org/search/?q=problem+in+displayRecognitionError%28%29+in+antlr2baserecognizer.c#query:problem%20in%20displayRecognitionError()%20in%20antlr2baserecognizer.c+page:1+mid:whrtnhb42jqi2w52+state:results

Question 1. Here is what I found after assigning my own error processing function to the ANTLR recognizer's reportError function pointer:

For my grammar definition like:

Query: Rule1 Rule2? Rule3? ; EOF

For a malformed test case that matches Rule1 but whose remainder matches neither Rule2 nor Rule3, debugging shows that the parser then tries to match ";". Of course that does not match either, so it reports an error and tries to recover.

In ANTLR's error reporting (which I reproduced by copying the body of displayRecognitionError() into my own error processing function), the "expecting" variable should hold ";", as that is what the parser tries to match. Instead, expecting is a very large number.

Xie, Linlin

The routines in the runtime are just examples. You are expected to make a copy of them and install your own routines. Note that the expecting member of the exception is not always a valid token number. For instance, you have to check it for being -1 (which, because the field is unsigned, shows up as a very large number), meaning it was expecting EOF, which is probably what it is expecting in your grammar above, right?

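if (ex->expecting == ANTLR3_TOKEN_EOF)
{
    // The recognizer was expecting the end of the input (EOF) here
}
else
{
    // expecting holds a token type, but only for exception types that set it
}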

Also, for some exception types, expecting is not a valid field; the default routines show you this, as not all of them access expecting. Finally, for some exceptions, you don't get a token number but a bit set, from which you must extract the set of tokens that could have matched. This is a possibility here, because your possibilities are start(Rule2) & start(Rule3) & ';'. In the code that processes ANTLR3_MISMATCHED_SET_EXCEPTION, you can see how to deal with the bit set.

So, if your expecting value is not -1, then in all likelihood it is a bit set.
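To turn such a bit set into readable text, walk the bits and print the name of each token type that is set. The sketch below is a stand-in built on assumptions: it uses a plain array of 64-bit words (bit n represents token type n) rather than the runtime's ANTLR3_BITSET, whose own functions you should prefer in real code, and `formatExpectedSet` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Append the names of all token types whose bit is set in `bits`
 * (bit n of word n/64 represents token type n) to `out`, separated by ", ".
 * Stops at the first name that does not fit. Returns the number of
 * names written in full. */
size_t formatExpectedSet(const uint64_t *bits, size_t numWords,
                         const char *const *tokenNames, size_t tokenNameCount,
                         char *out, size_t outSize)
{
    size_t found = 0;
    size_t used  = 0;
    size_t type;

    assert(bits != NULL && out != NULL && outSize > 0);
    out[0] = '\0';

    for (type = 0; type < numWords * 64; type++)
    {
        if ((bits[type / 64] >> (type % 64)) & 1u)
        {
            const char *name = (type < tokenNameCount) ? tokenNames[type] : "?";
            int n = snprintf(out + used, outSize - used, "%s%s",
                             found ? ", " : "", name);
            if (n < 0 || (size_t)n >= outSize - used)
                break;                  /* no room left for this name */
            used += (size_t)n;
            found++;
        }
    }
    return found;
}
```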

Question 2. I find that the exception variable has a nextException pointer, which points to another exception variable.

Answer:

The nextException isn't used by the default error reporting, I just included it in case anyone thought it useful.

Question 3: I would think start(Rule2), start(Rule3) and ; all should be the expected tokens, instead of EOF. Do you think if there is anything antlr can do to improve the error messages to make them more relevant? Or should I improve my grammar to get more appropriate error messages, and how?

Answer:

You have to write your own message display routines that make sense with your grammar. The default ones do check for EOF though. Your issue is that because all the things leading up to EOF are optional, ANTLR assumes that they are just not present:

Say start(rule2) is FOO and start(rule3) is BAR.

Then after rule1 it says:

No FOO is there, so go past Rule2, it isn't present
No BAR is there so go past Rule3, it isn't present

Now, what is the start set that can come next? Only EOF, so match EOF - oh it failed, so the expecting token is -1 for EOF.

However, if you do this:

: rule1
   (  rule2
       (
          rule3 EOF
        | EOF
       )
     | rule3 EOF
     | EOF
   )
;

Now, after rule1 has parsed, the followset will be FOO | BAR|EOF so you will get the error straight away. After rule2 is parsed, followset will be BAR|EOF so you will get the error straight away, after rule3, only EOF is viable.

Also, I can see that when displayRecognitionError() checks the recognizer type, it only considers a parser or a tree parser. Why is the lexer not considered here?

  1. Lexers can only say "Not expecting character 'y' here", and so antlr3lexer.c has its own handler. You should install your own handler, remember?
  2. If your lexer is throwing errors, then it is really broken. It should be coded to cope with anything one way or another, though of course that is sometimes difficult. You need to make sure that your lexer rules can terminate just about anywhere, but report your own descriptive error about any missing pieces. Then you have a final lexer rule:

 

ANY : . { /* log an error about the unknown character being ignored */ SKIP(); } ;

What this does is move all your error handling up to the parser, where you have better context. Similarly, you should move any errors that you can out of the parser and into the tree parser, where once again you have better context. The classic example is trying to encode in the grammar the number of parameters that any particular function can take. Don't do that; accept any number, including 0, then check for validity in your first tree walk.
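To make the "accept any number of arguments, check later" advice concrete, here is a standalone check that an action in your first tree walk might call for each function-call node. The function table, `FunctionSig` and `checkArity` are hypothetical; in a tree grammar action you would pass the callee's name and the number of argument subtrees you matched.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table of known functions and how many arguments each takes. */
typedef struct { const char *name; int arity; } FunctionSig;

static const FunctionSig knownFunctions[] = {
    { "abs", 1 }, { "max", 2 }, { "now", 0 },
};

/* Returns 1 if the call is valid; otherwise writes a message into `msg`
 * and returns 0. */
int checkArity(const char *name, int argCount, char *msg, size_t msgSize)
{
    size_t i;

    assert(name != NULL && msg != NULL);
    for (i = 0; i < sizeof knownFunctions / sizeof knownFunctions[0]; i++)
    {
        if (strcmp(knownFunctions[i].name, name) == 0)
        {
            if (knownFunctions[i].arity == argCount)
                return 1;
            snprintf(msg, msgSize, "function '%s' takes %d argument(s), %d given",
                     name, knownFunctions[i].arity, argCount);
            return 0;
        }
    }
    snprintf(msg, msgSize, "unknown function '%s'", name);
    return 0;
}
```

Because the parser accepted the call anyway, the error can name the function and both counts, which is more useful than a syntax error.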

I can see that a lexer error is reported as a NoViableAlt exception, but there is still a lexer error report from ANTLR. Where can I find the lexer error reporting code? Or how can I intercept lexer errors like I do with the parser error report?

Intercept them the same way: install your own displayRecognitionError, but make it say "Internal compiler error - lexer rules bad - all your base belong to us".

How to solve the problem of parser crashes whenever a scoped attribute is referenced outside of a scope context?

http://antlr.markmail.org/search/?q=%5BC+Target%5D%5B3.1.3%5D+Parser+crashes+when+looking+up+a+scoped+attribute+outside+of+a+scope+context.#query:%5BC%20Target%5D%5B3.1.3%5D%20Parser%20crashes%20when%20looking%20up%20a%20scoped%20attribute%20outside%20of%20a%20scope%20context.+page:1+mid:hb4ydj4cd2agjaf7+state:results

[This kind of situation arises when a rule B that references a scoped attribute defined in a rule A may match on its own, without requiring a previous match of A.]

Luca

It is a bug in your grammar which you can solve in one of a number of ways:

  1. The best thing to do would be to move the scope up the rule hierarchy so that you cannot call this rule without a scope in place.
  2. Protect the code yourself.
  3. Use a different rule when the same syntax is parsed outside the scope.

How to access the input char stream from lexer's displayRecognitionError(String[] tokenNames, RecognitionException e) to display the characters preceding and following the error?

GioeleBarabucci

Use e.input from the exception. On a CommonToken you can use getInputStream(); sometimes the input stream for the token may not be the same as the exception's, for instance if you handle include processing.

How was a segmentation fault in isNilNode() from libantlr3c.so solved for the following attached files (Eval.g, Expr.g, input.txt)?

http://antlr.markmail.org/search/?q=Getting+segmentation+fault+in%09isNilNode%28%29+from+libatnlr3c.so#query:Getting%20segmentation%20fault%20in%09isNilNode()%20from%20libatnlr3c.so+page:1+mid:oxej2g4jf7xpsljc+state:results

[Test.cpp file attachment is missing; other attachments in markmail]

To see the error:

  1. Save all files in the same directory, cd to it and run "chmod +x compile"
  2. Open Expr.g in ANTLRWorks, generate code, then do the same with Eval.g
  3. Run (I'm using GNU C++ compiler): ./compile
  4. Run: ./Test input

L. Rahyen

The error was caused by translating the Java "new CommonTreeNodeStream(...)" into "antlr3CommonTreeNodeStreamNew" instead of "antlr3CommonTreeNodeStreamNewTree".

After processing has finished, why is rec->state->errors_count still 0? It has already detected an error.

http://antlr.markmail.org/search/?q=Jim%20Idle#query:Jim%20Idle%20from%20list%3Aorg.antlr.antlr-interest%20from%3A%22Jim%20Idle%22+page:60+mid:555vjm4adfbx6qu7+state:results

Юрушкин Михаил

I presume you mean errorCount? Is this from the lexer? I fixed a bug whereby the lexer would not count errors; the fix is in release 3.2. However, lexers should really be written so that they don't throw recognition errors, but detect anomalies (such as missing terminating quotes and illegal characters) and report them more directly. Finally, don't use the internals directly; use the API calls, such as getNumberOfSyntaxErrors.

How to skip to end of line on error?

http://antlr.markmail.org/search/?q=Jim%20Idle#query:Jim%20Idle%20from%20list%3Aorg.antlr.antlr-interest%20from%3A%22Jim%20Idle%22+page:67+mid:3fpcrzdwr3jxxfvw+state:results

I want to parse a file that consists of bibliographic entries.  Each entry is on one line (so each record ends with \n). If a record does not match, I just want to print an error message, and skip to the end of line and start again with the next record. If I understand chapter 10 correctly, then '\n' should be in the resynchronization set, and the parser will consume tokens until it finds one.

This isn't happening.  Once I get an error, the parser never recovers. I get a bunch of NoViableAlt exceptions.  I'm hoping someone can explain what I'm doing wrong.

Here is a sample input file.  The 1st and 3rd lines are ok, the 2nd line is an error.

 Name. "Title," Periodical, 2005, v41(3,Oct), 217-240.

Name. "Title," Periodical, 2005, v41(3,Oct), Article 2.

Name. "Title," Periodical, 2005, v41(3,Oct), 217-240.

 Here is the grammar:

grammar Periodical;

article_list
    :    (article NL)* article NL?
    ;

article
    :    a=authors PERIOD SPACE QUOTE t=title COMMA QUOTE SPACE j=journal
         COMMA SPACE y=year COMMA SPACE v=volume COMMA SPACE p=pages PERIOD SPACE*
    ;

authors    :    (~QUOTE)+;

title    :    (~QUOTE)+;

journal    :    (LETTER|SPACE|COMMA|DASH)+;

volume    :    (LETTER|DIGIT)+
    |    (LETTER|DIGIT)+ '(' (LETTER|DIGIT|SLASH|COMMA)+ ')'
    ;

year    :    DIGIT DIGIT DIGIT DIGIT;

pages    :    DIGIT+ DASH DIGIT+;

PERIOD    :    '.';
QUOTE    :    '"';
COMMA    :    ',';
SPACE    :    ' ';
DIGIT    :    '0'..'9';
LETTER  :    ('a'..'z')|('A'..'Z');
DASH    :    '-';
SLASH    :    '/';
NL    :    '\r'? '\n';

Rick Schumeyer

Basically, you need to prevent the parsing loop from dropping all the way out of the current rule because it finds an error (in your case, within the article rule). You will also find this much easier if, rather than trying to accommodate files without a terminating NL, you always add an NL to the incoming input; then you will not need the trailing article NL? but can have (article NL)* EOF.

 So, when an error occurs in the article rule, it will drop out of that rule, but may not resync, so you want to force the resync to the NL when the article rule returns. This is pretty simple, but requires quite a bit of 'inside' knowledge of the ANTLR behavior. What you need to do is create a rule with just the epsilon (nothing) alt, and invoke it directly before the article call but more especially directly after it:

articleList
    : reSync  (article reSync NL)* EOF // Assuming that this is where EOF should be
    ;

Next, in your reSync rule, you want to resync to the follow set that will now be on the stack, which is actually the same as the first set of the following rule (because reSync is empty). Here we know that the follow set will only be NL, so you could hard-code that, but this is a generally good technique to know, so let's use it generically. If you don't really understand this, don't worry too much; you can just copy the code and the empty rule and it will work:

 

reSync
@init
{
    syncToFirstSet(); // Consume tokens until LA(1) is in the followset at the top of the followSet stack
}
    : // Deliberately match nothing, but will be invoked anyway
    ;

Then in your superClass (best) or @members, implement the syncToFirstSet method:

    protected void syncToFirstSet ()
    {
        // Compute the followset that is in context wherever we are in the
        // rule chain/stack
        //
        BitSet follow = state.following[state._fsp]; //computeContextSensitiveRuleFOLLOW();
        syncToFirstSet (follow);
    }

    protected void syncToFirstSet (BitSet follow)
    {
        int mark = -1;

        try {
            mark = input.mark();

            // Consume all tokens in the stream until we find a member of the follow
            // set, which means the next production should be guaranteed to be happy.
            //
            while (! follow.member(input.LA(1)) ) {

                if  (input.LA(1) == Token.EOF) {

                    // Looks like we didn't find anything at all that can help us here
                    // so we need to rewind to where we were and let normal error
                    // handling bail out.
                    //
                    input.rewind();
                    mark = -1;
                    return;
                }
                input.consume();
            }
        } catch (Exception e) {

          // Just ignore any errors here, we will just let the recognizer
          // try to resync as normal - something must be very screwed.
          //
        }
        finally {

            // Always release the mark we took
            //
            if  (mark != -1) {
                input.release(mark);
            }
        }
    }

And that's it. Every time you mention reSync in a rule, it will resync the input to a member of the current followSet, which will be the first set of the rule that follows reSync in the current production and you will therefore not drop out of the parsing loop, but reenter your article rule. The first invocation is just in case there is junk before the first article starts (depending on how this rule is invoked, you may need to resync before the articleList rule).

Question 2) I would like to make one change: when an error is encountered, I just want to print something like "problem on line 45" (and then continue with the next line).

Answer:

 I did write this up at:

 http://www.antlr.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery

If you look at the sync Java routine in the article (we are looking at making this generic and adding it to ANTLR), you will see that there is a marker there for printing errors. If you set a flag to say you have printed an error, and print an error only the first time you consume a token, you can get the line number etc. from the token that you consume. You could also add a string parameter to the reSync rule that is a template or format for your message, so you can say what type of construct you were parsing at the time, and pass this to the Java routine.
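For the C target, the resync loop of the Java routine above, combined with the "report only the first consumed token" idea, might look like the standalone sketch below. It works on a hypothetical in-memory token array with a 64-bit follow set (token types assumed to be below 64) instead of the runtime's token stream and ANTLR3_BITSET, so `Tok`, `TOK_EOF` and `syncToFollow` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOK_EOF 63u   /* hypothetical token type used for end of input here */

typedef struct { unsigned type; unsigned line; } Tok;

/* Advance *pos until the current token's type is in `follow` (bit n = type n).
 * *errorLine receives the line of the first token skipped (0 if none), so the
 * caller can print one "problem on line N" message. If EOF is reached first,
 * *pos is restored (like input.rewind()) and 0 is returned so that normal
 * error handling can take over; otherwise 1 is returned. */
int syncToFollow(const Tok *toks, size_t *pos, uint64_t follow,
                 unsigned *errorLine)
{
    size_t mark;

    assert(toks != NULL && pos != NULL && errorLine != NULL);
    mark       = *pos;                  /* like input.mark() */
    *errorLine = 0;

    while (!((follow >> toks[*pos].type) & 1u))
    {
        if (toks[*pos].type == TOK_EOF)
        {
            *pos       = mark;          /* nothing useful found: give up */
            *errorLine = 0;
            return 0;
        }
        if (*errorLine == 0)
            *errorLine = toks[*pos].line;   /* remember the first skipped token */
        (*pos)++;                       /* like input.consume() */
    }
    return 1;
}
```

A caller would then do something like `if (line != 0) printf("problem on line %u\n", line);` after a successful resync.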

Is there a way, when there is a lexer error, for ANTLR to report the error and exit without creating the parser?

http://antlr.markmail.org/search/?q=Jim%20Idle#query:Jim%20Idle%20from%20list%3Aorg.antlr.antlr-interest%20from%3A%22Jim%20Idle%22+page:68+mid:fijthyozfcfrp5fo+state:results

For example, in the calling program:

lex = LexerNew(input);
if (lex == NULL)
{
	fprintf(stderr, "Unable to create the lexer\n");
	exit(ANTLR3_ERR_NOMEM);
}
tokens = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lex));
if (tokens == NULL)
{
	fprintf(stderr, "Unable to create the token stream\n");
	exit(ANTLR3_ERR_NOMEM);
}
// want to exit here only in case of a lexer error

Meena Vinod

Invoke the lexer separately by asking for the first token. This will cause the tokens to be gathered (the lexer will run to completion). Then check if it issued any errors (you should probably override the error reporting mechanism) and stop if it did.

 However, you should code your lexer such that it does not give errors, at least not from the runtime. See lots of past posts about this on this list. Ideally, you want your process to go as far as it can without giving up such that your end user receives as much information as possible about what is wrong before having to run your parser again. See:

 http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs

 for some concrete examples.
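The "run the lexer to completion first, then decide" flow can be sketched without the runtime. Below, a toy lexer (hypothetical; it only knows digits, '+' and spaces) tokenizes the whole input and counts the characters it cannot handle instead of stopping, in the spirit of a final ANY rule. The driver only goes on to parse when that count is zero. With the real runtime, the corresponding steps are forcing the token stream to fill and checking the lexer's error count before creating the parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { T_NUM, T_PLUS } TokType;

/* Toy lexer: tokenizes the whole input up front. Unknown characters are
 * counted and skipped, so lexing always runs to completion.
 * Returns the number of tokens written to `out`. */
size_t lexAll(const char *in, TokType *out, size_t maxTokens, unsigned *errorCount)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && errorCount != NULL);
    *errorCount = 0;

    while (*in != '\0')
    {
        if (isdigit((unsigned char)*in))
        {
            while (isdigit((unsigned char)*in)) in++;
            if (n < maxTokens) out[n++] = T_NUM;
        }
        else if (*in == '+')
        {
            in++;
            if (n < maxTokens) out[n++] = T_PLUS;
        }
        else if (*in == ' ')
        {
            in++;
        }
        else
        {
            (*errorCount)++;   /* report-and-skip, like a final ANY : . rule */
            in++;
        }
    }
    return n;
}

/* Returns 0 if parsing may proceed, 1 if the lexer reported errors. */
int lexThenMaybeParse(const char *in)
{
    TokType  toks[64];
    unsigned errors;
    size_t   count = lexAll(in, toks, 64, &errors);

    if (errors > 0)
        return 1;          /* stop here: never create the parser */
    (void)count;           /* ...create the parser over toks[0 .. count) here... */
    return 0;
}
```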

How to customize error handling for the lexer?

http://antlr.markmail.org/search/?q=Jim%20Idle#query:Jim%20Idle%20from%20list%3Aorg.antlr.antlr-interest%20from%3A%22Jim%20Idle%22+page:75+mid:l6vj2qqcf6v4mkvf+state:results

The Java parser has a general notion of parser exceptions. The C runtime simulates this by checking some attributes in the base recognizer structure via the HASEXCEPTION macro. However, recovered lexer errors are not propagated to the parser, so in this case a HASEXCEPTION check does not return true. Starting from a parser rule invocation, how can one detect whether there was an error in the lexer? Put another way, how can one get a reference to the lexer starting from the invocation of a parser rule? How can error handling for the lexer be customized?

Answer:

That isn't how it works.

First the lexer is called to create all the tokens, then the parser runs. However, you can call the lexer on its own (see the C examples), then just check the error count member of the recognizer before invoking the parser.

However, you should not program a lexer that can raise errors. Always cover all the bases with at least a final rule, ANY : . {error ...};, and left-factor the other rules so that they cannot error out on their own. Then you can leave errors to the parser.

Error handling using parallel instances of a C-target parser

http://antlr.markmail.org/search/?q=Error+handling+using+parallel+instances+of+a+C-target+parser#query:Error%20handling%20using%20parallel%20instances%20of%20a%20C-target%20parser+page:1+mid:ize6pc37d42ammzh+state:results

I am working with a C-target parser, and I have multiple instances of the parser running in parallel.

Now I would like to stop the parser from printing error messages to stderr. Instead, I would like each instance of the parser to collect the error messages in a list of strings, so that the caller can access the complete list of error messages after the parser finished and decide what to do about them.

Johannes Goller

Search antlr.markmail.org for displayRecognitionError. Remember that if you have parallel threads, you will want the error collections to be per-instance, not global members. Therefore, add them as members of the context (via @parser::context) and initialize them in @apifuncs, etc.

http://antlr.markmail.org/search/?q=apifuncs#

http://antlr.markmail.org/search/?q=displayRecognitionError+C

mismatchRecover()

http://markmail.org/message/7ti4neiehgryfgqp#query:+page:1+mid:7ti4neiehgryfgqp+state:results

I am trying to turn off single token insertion and deletion error recovery in my parser (C target). I found the following comment in antlr3baserecognizer.c above the match() function.

/// Match current input symbol against ttype.  Upon error, do one token
/// insertion or deletion if possible.
/// To turn off single token insertion or deletion error
/// recovery, override mismatchRecover() and have it call
/// plain mismatch(), which does not recover.  Then any error
/// in a rule will cause an exception and immediate exit from
/// rule.  Rule would recover by resynchronizing to the set of
/// symbols that can follow rule ref.

This seems fairly straightforward at first glance, but then I discovered that there is no mismatchRecover() function to override. Digging through the code, I suspect that this function was renamed to recoverFromMismatchedToken(), but I cannot simply override it with mismatch() because their prototypes do not match.

void * (*recoverFromMismatchedToken) (struct ANTLR3_BASE_RECOGNIZER_struct * recognizer,
                                      ANTLR3_UINT32 ttype,
                                      pANTLR3_BITSET_LIST follow);

void   (*mismatch)                   (struct ANTLR3_BASE_RECOGNIZER_struct * recognizer,
                                      ANTLR3_UINT32 ttype,
                                      pANTLR3_BITSET_LIST follow);

As you can see, one returns a void *, and the other returns void. What is the correct way to do this?

Justin Murray

It means: install your own version of recoverFromMismatchedToken that basically neither consumes nor inserts a token, but just resets any flags etc.
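The override pattern itself can be sketched with a hypothetical, cut-down recognizer struct whose `recoverFromMismatchedToken` pointer has the same shape as the runtime's (it returns `void *`). Everything here (`MiniRecognizer`, its fields and `noRecoverFromMismatchedToken`) is an assumption for illustration; with the real runtime you would assign your function to the base recognizer's recoverFromMismatchedToken member, for example in @apifuncs, and adjust whichever state flags your runtime version uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the base recognizer: only what the sketch needs. */
typedef struct MiniRecognizer_struct
{
    int      error;           /* set so the calling rule sees a failure and exits */
    unsigned errorCount;
    size_t   tokensConsumed;  /* a single-token deletion would bump this */
    void *(*recoverFromMismatchedToken)(struct MiniRecognizer_struct *rec,
                                        unsigned ttype, void *follow);
} MiniRecognizer;

/* Replacement: neither consume nor insert a token; just record the error so
 * the rule exits and normal resynchronization to the follow set takes over. */
static void *noRecoverFromMismatchedToken(MiniRecognizer *rec, unsigned ttype,
                                          void *follow)
{
    (void)ttype;
    (void)follow;
    rec->errorCount++;
    rec->error = 1;
    return NULL;              /* no conjured-up token */
}

void installNoRecovery(MiniRecognizer *rec)
{
    assert(rec != NULL);
    rec->recoverFromMismatchedToken = noRecoverFromMismatchedToken;
}
```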

Why am I getting memory leaks during error recovery?

http://antlr.markmail.org/search/?q=Jim%20Idle#query:Jim%20Idle%20from%20list%3Aorg.antlr.antlr-interest%20from%3A%22Jim%20Idle%22+page:82+mid:mffbqwr6wyeguijb+state:results

I have the rule:

myRule[returns Type1 res] :
	rule1, rule2, rule3, rule4... ruleN { res = f($rule1, $rule2, ..., $ruleN) }
	;

That works fine. BUT if ruleN raises an exception, the rule1, rule2 ... rule(N-1) subtrees are forgotten (leaked)!

PS: I tried to use the "catch" attribute, but it needs an error code. That is no good, because there may be different errors.

yuru...@rambler.ru

It is because you are trying to do things while you parse - another reason to build a tree and THEN operate on the tree.

Catch does not need a type in the C target, you can just use:

r:
  Ddddddd
;
catch() { }

(assuming ANTLR 3.2).

The other thing you might do is break up your rule list so that exceptions in them do not drop out of the whole rule, which is what happens in all targets unless you structure the rules a little. Break things down into smaller units. The function call overhead (which may not even occur because of inlining) is very small in C.

File name, line number, column number etc.

How to get the string for ID in a grammar?

http://antlr.markmail.org/search/?q=Need+help+getting+char+*+string+from+an+ID#query:Need%20help%20getting%20char%20*%20string%20from%20an%20ID+page:1+mid:cont23ez3iaw43i3+state:results

JeffreyNewman

Use the CommonToken fields to find the start and end of the text that represents the token, and copy/print/etc. directly from the input text.

Download the examples and look at the C parser in there.
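Copying the text yourself can be sketched as follows, assuming the token gives you pointers to its first and last characters in the input, with the last one inclusive. `copyTokenText` is a hypothetical helper; how the start and stop fields are stored (and whether they need a cast to `const char *`) depends on your runtime version and input encoding, so check that before adapting it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the characters from start to stop (inclusive) into a new
 * NUL-terminated string. The caller frees the result.
 * Returns NULL on allocation failure. */
char *copyTokenText(const char *start, const char *stop)
{
    size_t len;
    char  *text;

    assert(start != NULL && stop != NULL && stop >= start);
    len  = (size_t)(stop - start + 1);
    text = malloc(len + 1);
    if (text == NULL)
        return NULL;
    memcpy(text, start, len);
    text[len] = '\0';
    return text;
}
```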

When an error is detected during tree parsing, how can I print the error with information about the original input tokens, rather than about the tree node, which is meaningless to the end user?

http://antlr.markmail.org/search/?q=How+to+get+information+on+the+tokens+that+produced+a+tree+node%3F#query:How%20to%20get%20information%20on%20the%20tokens%20that%20produced%20a%20tree%20node%3F+page:1+mid:4nrizmucdizunmfi+state:results

GonzagueReydet

You get the start and end token from the node, then ask the start token for its information and the end token for its information, and then you have the complete span of a node. Beware of -> ^(NODE1 c u ^(NODE2 x)) as NODE2 won't get the span information when the rewrite is like that.

To get the line and character position in the input file, get the node's token positions (its start and stop token indexes), fetch those tokens, and ask them for the information.

How to get line number and position of characters in input string?

http://antlr.markmail.org/search/?q=C-Runtime%2C+position+of+elements#query:C-Runtime%2C%20position%20of%20elements+page:1+mid:hgqt7ej4b7omtafg+state:results

UdoWeik

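Use the information stored in each token (ANTLR3_COMMON_TOKEN_struct), which records its line and its character position within that line.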

How to get a pANTLR3_INPUT_STREAM from a std::string (or char* variable)?

http://antlr.markmail.org/search/?q=pANTLR3_INPUT_STREAM+from+char*#query:pANTLR3_INPUT_STREAM%20from%20char*+page:1+mid:j6zzqigcfpgm4ifa+state:results

tunc

Functions

  pANTLR3_INPUT_STREAM antlr3NewAsciiStringCopyStream (pANTLR3_UINT8 inString, ANTLR3_UINT32 size, pANTLR3_UINT8 name)
      Create an ASCII string stream as input to ANTLR 3, copying the input string.

  ANTLR3_API pANTLR3_INPUT_STREAM antlr3NewAsciiStringInPlaceStream (pANTLR3_UINT8 inString, ANTLR3_UINT32 size, pANTLR3_UINT8 name)
      Create an in-place ASCII string stream as input to ANTLR 3.

  ANTLR3_API pANTLR3_INPUT_STREAM antlr3NewUCS2StringInPlaceStream (pANTLR3_UINT16 inString, ANTLR3_UINT32 size, pANTLR3_UINT16 name)
      Create an in-place UCS2 string stream as input to ANTLR 3.

http://www.antlr.org/api/C/index.html

How to write a pretty printer for an input language with C as the target, using the following approach:

someRule
returns [pANTLR3_STRING result]
@init {result = factory->newRaw(factory);}
: ^(SOMETOKEN anotherRule thirdRule)
{
  $result->append($result, "Start\n");
  $result->appendS($result, $anotherRule.result);
  factory->destroy(factory, $anotherRule.result);
  $result->appendS($result, $thirdRule.result);
  factory->destroy(factory, $thirdRule.result);
  $result->append($result, "\n\n");
}
;

Problems encountered:

  1. when factory->close(..) is called at the end of my program, I get a double free problem, and I don't see in the API where I can call remove on the string from the factory.
  2. However, more troubling is that when the return of one of the rules like anotherRule is composed of only small literal strings (like "()"), then calling destroy on the result sometimes frees too much memory, so that I cause problems for "thirdRule".
  3. Moreover, this just seems like an awkward way to build up my output string.

RobertSoule

  1. You should not be calling factory->destroy. Just close the factory, and if you use the parser's factory, you don't even need to do that. When you are done, just memcpy (or strdup) the chars pointer from the final string. All the other memory will be discarded for you. If you are just going to write out the result, then fwrite the chars pointer and close as normal; all memory will be freed for you.
  2. factory->close(factory); // Do this only if this is your own factory, which there is no need for really.
  3. With respect to point 2 in the question, you don't use destroy() like this, so next time you try to use it, you have already corrupted the memory, so that causes problems for "thirdRule".
  4. With respect to point 3 in the question, that is because you are assuming that you have to do all the management. You don't do that, you just let the factory take care of it all. When you close, it has tracked all the memory and it frees it all for you. You just use the strings and forget about them as if they were Java objects. Use the factory in the parser (see C examples for poly for instance), and you don't even need to close your own factory.
  5. You do not need to use the string factory stuff. It is really just a convenience. You can copy the input text yourself using the token supplied offsets.

How can I access file, line and column information for tokens used by the tree parser (ANTLR 3.1)?

http://antlr.markmail.org/search/?q=Retrieving+Line+and+Column+information+in+AST#query:Retrieving%20Line%20and%20Column%20information%20in%20AST+page:1+mid:zbs2nbjxdli67fxn+state:results

In the following example, how can I access the location of the 'for' token used in the tree grammar to pass to the AppendOp() function?

Similarly, I'd like to add debug (file+line+column) info for all generated objects.

// in parser
statement
   :    'for' '(' start=expression? ';' cond=expression? ';' next=expression? ')' body=statement
     -> ^('for' $cond $next $body $start )
  ...
  ;

// in tree
statement
    :    ^('for' cond=continuation step=continuation body=continuation expression)    { AppendOp(Operation::ForLoop); }
    ...
    ;

Christian Schladetsch

1. It is the same as the other targets. Get a reference to the node, then the token that the node holds, or the token span that the node represents (so you can use the start position of the start token and the end position of the end token for semantic errors, say). To get the file name, ask the token what its input stream was, then ask the input stream what its input is called.

Just be careful about the imaginary tokens as you can use their start and stop spans, but they were not generated from input, so the nodes themselves are imaginary.

However, you are using 'literals' in your parser and tree parser and this will make it very difficult for you to identify tokens unless you already have a context for them. You are well advised to replace these literals with 'real' tokens defined in the lexer, after which you can switch() on the token type and so on:

FOR: 'for';
SEMI: ';';

: FOR start=expression SEMI  ....etc

With good token names, the grammar will be no less readable. One tip is to pass the tree nodes around within your own routines, rather than the tokens they contain, spans and so on. Finally, you have 3 int fields (user1, user2, user3) and a void * field (userp) available in a token, into which you can place anything you like at any point; this helps because overriding token types and so on is a bit of a pain in C if you don't already know how to do it.

2. For getting the file name, in the C target especially: switching input streams in the lexer is allowed, hence we added the ability for CommonToken to tell you what its input stream was.

Line and column, though, can be taken from a token's 'line' and 'pos' attributes:

^(forTok='for' cond=continuation step=continuation body=continuation expression)
      { line = $forTok.line;
        column = $forTok.pos; }

This may not be enough, though, for error messages or other processing.

How to get line and column numbers from a CommonTreeNodeStream to use it to print error messages?

http://antlr.markmail.org/search/?q=Getting+line+and+column+numbers+from+a+CommonTreeNodeStream#query:Getting%20line%20and%20column%20numbers%20from%20a%20CommonTreeNodeStream+page:1+mid:ni7by4xx746ra42f+state:results

DamienCassou

Say you had:

foo: bar e=expression;

and expression should yield an int but gives a float instead. Pass $e.tree to your message handler and cast it to CommonTree. Through that reference you have access to the documented methods of CommonTree. So:

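public void logMsg(MessageDescriptor m, CommonTree ct, Object... args) {

  // 'tokens' is assumed to be the token stream the tree was built from.
  // The tree records the indexes of the first and last tokens it spans.
  CommonToken st = (CommonToken) tokens.get(ct.getTokenStartIndex());
  CommonToken et = (CommonToken) tokens.get(ct.getTokenStopIndex());

  // Call the standard logger with the line of the first token and the
  // character range from the start of the first token to the end of the last
  logMsg(m, st.getLine(), st.getStartIndex(), et.getStopIndex(), args);
}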

You can then print out something like:

Warning: (6, 33) : Expression should result in an integer value but gives a float:
bar 84/3
    ^^^^

How to find the offset of a token, in terms of the number of characters from the start of the stream?

http://antlr.markmail.org/search/?q=C+target+character+position#query:C%20target%20character%20position+page:1+mid:d662qqih76ijwqja+state:results

AZ

The very first token gives you -1 for the character position in line, I am afraid; I think I need to work around that. Remember, though, that the indexes are pointers into memory (your input), not 0, 1, 2 and so on. The token also remembers the start of the line it is located on.

If the start of the first token is not the start of your data, then perhaps there are comments and newline tokens that are skipped before the first token that the parser sees? If this did not work, there would be a lot of broken parsers out there.

So, use the start pointer to find the token's text, subtract it from the stop pointer and add one to get the length (the stop pointer addresses the token's last character), then print that many characters; this shows you what the token matched. The line start is updated whenever the lexer consumes a '\n', though you can change which character counts as the newline. The line start is useful for error messages when you want to print the line of text in which an error occurs.

The offset of the token is its start pointer minus the start of the input (use the address of the buffer you passed in, databuffer, rather than input->data). Often you will not need the offset at all, because the start pointer already points directly at the token's text in your buffer. Note that the token stream does not return off-channel tokens or SKIP()ed tokens.

Why am I getting incorrect values for CharPositionInLine for tokens on the first line of input?

http://antlr.markmail.org/search/?q=Jim%20Idle#query:Jim%20Idle%20from%20list%3Aorg.antlr.antlr-interest%20from%3A%22Jim%20Idle%22+page:83+mid:rnvmkdec65rddzua+state:results

I'm using ANTLR3 in one of my projects and it seems that I have found a small bug in the ANTLR C runtime (versions 3.2 and 3.1.3).

The ANTLR3 C runtime returns incorrect CharPositionInLine values on the first line of input, and these values end up in the tokens. I wrote a small program to demonstrate this behavior; it is available here: http://devel-www.cyber.cz/files/tmp/antlrc3-bug-pack.zip

The bug makes token stream post-processing a little more complicated than it would ideally be.

Here is the expected output (produced by the Python target). The first column is the character from input.txt; the second is the result of getCharPositionInLine():

--- cut --- cut --- cut ---
1 0
2 2
3 4
\n 5
1 0
2 2
3 4
\n 5
--- cut --- cut --- cut ---

and here is actual output from C target:

--- cut --- cut --- cut ---
1 4294967295
2 1
3 3
\n 4
1 0
2 2
3 4
\n 5
--- cut --- cut --- cut ---

Notice that the first token has an undefined value from getCharPositionInLine() (4294967295 is -1 stored in an unsigned 32-bit integer) and the rest of the first line is shifted by -1.

The workaround is to set ctx->pLexer->input->charPositionInLine to zero after constructing the lexer and before any actual lexing or parsing.

Ales Teska

Yes, you have to special-case it. However, you really only ever come across this while developing ;-)

Memory

How to solve memory usage issues for the following generated C parser?

http://antlr.markmail.org/search/?q=C+runtime+issue#query:C%20runtime%20issue+page:1+mid:jfngdd2ci6h7qrbo+state:results

(Memory usage climbs from 770 MB to 4 GB in a few seconds and the parser never returns.)

The initialization code is as follows: