Currently, syntax errors cause invalid trees and possibly even runtime exceptions when building ASTs. What we really need, I believe, is for rules that encounter syntax errors to return an ERROR node of some sort that records where the error occurred and, with luck, the tokens consumed during recovery. I started an improvement request:
http://www.antlr.org:8888/browse/ANTLR-193
The basic idea is that ERROR nodes are used in place of the ASTs that would normally be produced by rule invocations. For example, the following rule would return a valid AST except for the subtrees associated with rule references that encounter syntax errors:
forDecl : 'for' '(' decl ';' expr ';' expr ')' stat -> ... ;
If there is an error inside decl, the rule would return the tree
^('for' ERROR subtree-expr subtree-expr)
This effectively means that I must turn off the single-token insertion and deletion that occurs automatically within a single rule. If a syntax error occurs, the immediately surrounding rule must terminate and return an error node.
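A minimal sketch of the ERROR-node idea, written as a hand-coded recursive-descent parser rather than ANTLR-generated code. Everything here is hypothetical: the `Tree`, `ErrorNode`, `SyntaxError` and `ForParser` classes, the toy sub-rules (`decl : 'int' ID`, `expr : ID`, `stat : ID ';'`), and the assumption that the elided rewrite keeps all four subtrees. `match()` does no single-token insertion or deletion; a failing rule reference is caught by `invoke()`, which resynchronizes to a single follow token and returns an ERROR node recording the start index and the tokens consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical AST node; prints in ANTLR-style ^(root child ...) notation.
class Tree {
    final String text;
    final List<Tree> children = new ArrayList<>();

    Tree(String text) { this.text = text; }

    Tree add(Tree child) { children.add(child); return this; }

    @Override public String toString() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("^(").append(text);
        for (Tree c : children) sb.append(' ').append(c);
        return sb.append(')').toString();
    }
}

// ERROR node: where the failed rule started and which tokens it consumed.
class ErrorNode extends Tree {
    final int startIndex;
    final List<String> consumed;

    ErrorNode(int startIndex, List<String> consumed) {
        super("ERROR");
        this.startIndex = startIndex;
        this.consumed = consumed;
    }
}

class SyntaxError extends RuntimeException {
    SyntaxError(String msg) { super(msg); }
}

// forDecl : 'for' '(' decl ';' expr ';' expr ')' stat ;
class ForParser {
    private final String[] tokens;
    private int p = 0;

    ForParser(String... tokens) { this.tokens = tokens; }

    private String la() { return p < tokens.length ? tokens[p] : "<EOF>"; }

    // Strict match: no single-token insertion or deletion; any mismatch throws.
    private String match(String expected) {
        if (!la().equals(expected)) throw new SyntaxError("expected " + expected + ", found " + la());
        return tokens[p++];
    }

    private String matchId() {
        String t = la();
        if (!t.matches("[a-z]\\w*") || t.equals("for") || t.equals("int"))
            throw new SyntaxError("expected ID, found " + t);
        p++;
        return t;
    }

    // Invoke a sub-rule; on failure, skip to the follow token and return an ERROR node.
    private Tree invoke(Supplier<Tree> rule, String follow) {
        int start = p;
        try {
            return rule.get();
        } catch (SyntaxError e) {
            List<String> consumed = new ArrayList<>();
            for (int i = start; i < p; i++) consumed.add(tokens[i]);            // matched before the error
            while (p < tokens.length && !tokens[p].equals(follow)) consumed.add(tokens[p++]); // resync
            return new ErrorNode(start, consumed);
        }
    }

    Tree forDecl() {
        Tree root = new Tree(match("for"));
        match("(");
        root.add(invoke(this::decl, ";"));
        match(";");
        root.add(invoke(this::expr, ";"));
        match(";");
        root.add(invoke(this::expr, ")"));
        match(")");
        root.add(invoke(this::stat, "<EOF>")); // "<EOF>" never matches, so resync runs to the end
        return root;
    }

    Tree decl() { return new Tree(match("int")).add(new Tree(matchId())); }
    Tree expr() { return new Tree(matchId()); }
    Tree stat() { Tree t = new Tree(matchId()); match(";"); return t; }
}

public class ErrorNodeDemo {
    public static void main(String[] args) {
        System.out.println(new ForParser("for", "(", "int", "x", ";", "i", ";", "j", ")", "k", ";").forDecl()); // ^(for ^(int x) i j k)
        System.out.println(new ForParser("for", "(", "int", "3", ";", "i", ";", "j", ")", "k", ";").forDecl()); // ^(for ERROR i j k)
    }
}
```

On "for ( int 3 ; i ; j ) k ;" the decl failure costs only that subtree, and the ERROR node records the tokens int and 3. Errors on forDecl's own tokens ('for', '(' and so on) are not caught in this sketch; a fuller version would presumably wrap every rule the same way.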
--------------- from user:
test : 'var' ID ';' -> ^('var' ID);
If the input is "var ;", the token insertion system detects that the
token ID is missing, reports the error, and continues parsing:
ID2=(Token)input.LT(1); // save ID2
match(input,ID,FOLLOW_ID_in_test26);
stream_ID.add(ID2); // ID2 holds a bad reference
ID2 contains a reference to the token ';' and not to the token ID. The
"match" procedure doesn't throw any exception because of the "token
insertion" system.
So the resulting tree is actually ^('var' ';').
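The reported failure can be reproduced outside ANTLR with a toy recognizer. This is a minimal sketch, not ANTLR's runtime: the `Tok` and `VarParser` classes, and the simplified `match()` that recovers by single-token deletion or insertion against a single follow token type, are hypothetical stand-ins for the real inline recovery. The rule body copies the pattern of the generated code quoted above: the label is read from `LT(1)` before `match()` runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token: a type name plus its text.
class Tok {
    final String type, text;
    Tok(String type, String text) { this.type = type; this.text = text; }
    @Override public String toString() { return text; }
}

// Toy recognizer for: test : 'var' ID ';' -> ^('var' ID);
class VarParser {
    private final List<Tok> input;
    private int p = 0;
    final List<String> errors = new ArrayList<>();

    VarParser(Tok... toks) { input = List.of(toks); }

    Tok LT(int k) {
        int i = p + k - 1;
        return i < input.size() ? input.get(i) : new Tok("EOF", "<EOF>");
    }

    // Simplified inline recovery: report the error and carry on instead of throwing.
    void match(String type, String followType) {
        if (LT(1).type.equals(type)) { p++; return; }
        if (LT(2).type.equals(type)) {             // single-token deletion
            errors.add("extraneous input " + LT(1));
            p += 2;                                // drop the extra token, consume the expected one
            return;
        }
        if (LT(1).type.equals(followType)) {       // single-token insertion
            errors.add("missing " + type);         // pretend it was present; consume nothing
            return;
        }
        throw new IllegalStateException("cannot recover at " + LT(1));
    }

    // Same shape as the generated code: the label is saved before match() runs.
    String test() {
        Tok varTok = LT(1);
        match("VAR", "ID");
        Tok ID2 = LT(1);           // save ID2
        match("ID", "SEMI");       // may delete or insert a token without throwing
        match("SEMI", "EOF");
        return "^(" + varTok + " " + ID2 + ")";
    }
}

public class LabelBugDemo {
    public static void main(String[] args) {
        Tok v = new Tok("VAR", "var"), semi = new Tok("SEMI", ";");
        System.out.println(new VarParser(v, semi).test());                                          // ^(var ;)
        System.out.println(new VarParser(v, new Tok("INT", "3"), new Tok("ID", "x"), semi).test()); // ^(var 3)
    }
}
```

Both recoveries report an error and finish the rule normally, yet the saved label points at the wrong token: ';' after an insertion, the deleted '3' after a deletion.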
TJP:
Hmm...
- upon single-token deletion, just delete the extra token, report the error, and continue. Actually, that messes up the labels too: above, ID2 would be '3' on input "var 3 ID;"
- upon insertion of a missing token, call a method so the parser can do something different for ID vs ';'. Still, the label is messed up. Can we add
if ( failed ) <label>=getTokenInsertOrDelete()
to the end of each match? That means lots of code bloat.
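A minimal sketch of the proposed fix-up line, under stated assumptions. Everything here is hypothetical: `FixTok`, `FixupParser`, a `recovered` flag that stands in for the `failed` test in the proposed line, and a `getTokenInsertOrDelete()` that returns the conjured token after an insertion or the token following the deleted one after a deletion. The simplified `match()` recovers against a single follow token type rather than a real FOLLOW set.

```java
import java.util.List;

// Hypothetical token: a type name plus its text.
class FixTok {
    final String type, text;
    FixTok(String type, String text) { this.type = type; this.text = text; }
    @Override public String toString() { return text; }
}

// Toy recognizer for: test : 'var' ID ';' -> ^('var' ID);
class FixupParser {
    private final List<FixTok> input;
    private int p = 0;
    private boolean recovered;       // hypothetical stand-in for the "failed" test
    private FixTok insertOrDelete;   // token match() actually used while repairing the input

    FixupParser(FixTok... toks) { input = List.of(toks); }

    FixTok LT(int k) {
        int i = p + k - 1;
        return i < input.size() ? input.get(i) : new FixTok("EOF", "<EOF>");
    }

    FixTok getTokenInsertOrDelete() { return insertOrDelete; }

    void match(String type, String followType) {
        recovered = false;
        if (LT(1).type.equals(type)) { p++; return; }
        recovered = true;
        if (LT(2).type.equals(type)) {            // deletion: skip LT(1); the real token is LT(2)
            insertOrDelete = LT(2);
            p += 2;
            return;
        }
        if (LT(1).type.equals(followType)) {      // insertion: conjure a token, consume nothing
            insertOrDelete = new FixTok(type, "<missing " + type + ">");
            return;
        }
        throw new IllegalStateException("cannot recover at " + LT(1));
    }

    String test() {
        FixTok varTok = LT(1);
        match("VAR", "ID");
        if (recovered) varTok = getTokenInsertOrDelete();   // every labeled match needs its own fix-up
        FixTok ID2 = LT(1);
        match("ID", "SEMI");
        if (recovered) ID2 = getTokenInsertOrDelete();      // the proposed fix-up line
        match("SEMI", "EOF");
        return "^(" + varTok + " " + ID2 + ")";
    }
}

public class LabelFixupDemo {
    public static void main(String[] args) {
        FixTok v = new FixTok("VAR", "var"), semi = new FixTok("SEMI", ";");
        System.out.println(new FixupParser(v, semi).test());                                                // ^(var <missing ID>)
        System.out.println(new FixupParser(v, new FixTok("INT", "3"), new FixTok("ID", "x"), semi).test()); // ^(var x)
    }
}
```

With the fix-up, "var ;" yields a tree holding the conjured ID and "var 3 x ;" yields ^(var x). The cost shows in test(): every labeled token reference needs its own if-line after its match, which is the code bloat in question.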
http://www.antlr.org/wiki/display/ANTLR3/Error+reporting+and+recovery