Error reporting and recovery
Default Error Reporting Behavior
Errors encountered during lexing and parsing are passed to the displayRecognitionError() method. This method has access to the exception that holds information about the error, and uses it to compose an error string that is then passed to the emitErrorMessage() method. The default behavior of emitErrorMessage() is to print the error string to System.err.
Custom Error Reporting
To change the default error reporting behavior, override either displayRecognitionError() or emitErrorMessage() in the lexer and parser. If you need to change the format of the error message or obtain extra information, such as the error location, then you must override displayRecognitionError(). If you only need to change where errors are reported and are happy to keep the error message itself unchanged, then you can override emitErrorMessage() instead. Here is an example of overriding displayRecognitionError():
    @members {
        public void displayRecognitionError(String[] tokenNames,
                                            RecognitionException e) {
            String hdr = getErrorHeader(e);
            String msg = getErrorMessage(e, tokenNames);
            // Now do something with hdr and msg...
        }
    }
Rather than simply printing errors to a stream, another common approach is to store them in a data structure for use later in the application. An example of this approach is to append errors to a List within the lexer and parser and to provide a public method to allow access to the list:
    @members {
        // List and LinkedList come from java.util and must be imported,
        // for example in the grammar's @header section.
        private List<String> errors = new LinkedList<String>();

        public void displayRecognitionError(String[] tokenNames,
                                            RecognitionException e) {
            String hdr = getErrorHeader(e);
            String msg = getErrorMessage(e, tokenNames);
            errors.add(hdr + " " + msg);
        }

        public List<String> getErrors() {
            return errors;
        }
    }
A cleaner approach to error handling is to delegate reporting to a separate object using a defined interface. This has the advantage that it is easier to use different error handling strategies with the same lexer and parser. It also allows testing of the error reporting functionality in isolation from other components. Here is an example of an interface to handle errors:
    public interface IErrorReporter {
        void reportError(String error);
    }
The lexer and parser can then be modified to receive an object that implements this interface and reports errors to it, for example:
    @members {
        private IErrorReporter errorReporter = null;

        public void setErrorReporter(IErrorReporter errorReporter) {
            this.errorReporter = errorReporter;
        }

        public void emitErrorMessage(String msg) {
            errorReporter.reportError(msg);
        }
    }
And finally you can create a class that implements the error reporting interface and pass an instance of that class to the lexer and parser, for example:
    public class StdErrReporter implements IErrorReporter {
        public void reportError(String error) {
            System.err.println(error);
        }
    }

    ...
    IErrorReporter errorReporter = new StdErrReporter();
    myLexer.setErrorReporter(errorReporter);
    myParser.setErrorReporter(errorReporter);
    ...
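Because the lexer and parser depend only on the interface, a reporter that stores messages can be swapped in without touching the grammar, which also makes the reporting logic easy to test on its own. The following is a minimal sketch and not part of ANTLR: CollectingErrorReporter, hasErrors() and the demo class are hypothetical names, and the interface is repeated (without the public modifier) only so that the example compiles as a single file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Same shape as the interface shown earlier; repeated so the sketch is self-contained.
interface IErrorReporter {
    void reportError(String error);
}

// Hypothetical reporter that records messages instead of printing them.
class CollectingErrorReporter implements IErrorReporter {
    private final List<String> errors = new ArrayList<String>();

    @Override
    public void reportError(String error) {
        errors.add(error);
    }

    // Read-only view so callers cannot modify the recorded errors.
    public List<String> getErrors() {
        return Collections.unmodifiableList(errors);
    }

    public boolean hasErrors() {
        return !errors.isEmpty();
    }
}

class CollectingReporterDemo {
    public static void main(String[] args) {
        CollectingErrorReporter reporter = new CollectingErrorReporter();
        // In a real application the lexer and parser would make this call
        // from their emitErrorMessage() overrides.
        reporter.reportError("line 3:7 extraneous input '3' expecting ';'");
        System.out.println(reporter.getErrors().size() + " error(s) recorded");
    }
}
```

An application could pass one such reporter to both the lexer and the parser and check hasErrors() once parsing has finished.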
The destination for errors generally does not change once lexing and parsing have begun, so a refinement of the above solution is to pass the error reporter object as an extra parameter to the lexer and parser constructors, instead of using a separate setter method.
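The constructor-based variant might look roughly like the sketch below. ReportingRecognizer is a hypothetical stand-in for a generated lexer or parser, used here so that the example compiles without the ANTLR runtime; in a real grammar the extra constructor would also take the input stream and pass it on to the generated constructor.

```java
import java.util.Objects;

// Same shape as the interface shown earlier; repeated so the sketch is self-contained.
interface IErrorReporter {
    void reportError(String error);
}

// Hypothetical stand-in for a generated recognizer (assumption: a real one
// would also receive its input stream in the constructor).
class ReportingRecognizer {
    // final: the reporter is fixed for the lifetime of the recognizer.
    private final IErrorReporter errorReporter;

    ReportingRecognizer(IErrorReporter errorReporter) {
        // Reject a missing reporter at construction time rather than
        // failing later, when the first error is reported.
        this.errorReporter = Objects.requireNonNull(errorReporter, "errorReporter");
    }

    public void emitErrorMessage(String msg) {
        errorReporter.reportError(msg);
    }
}
```

Compared with the setter, this removes the window in which the recognizer exists without a reporter, so emitErrorMessage() never has to deal with a null field.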
Single token insertion and deletion
As of v3.1, you can turn single token insertion/deletion error recovery on and off. It is off by default for tree parsers, because TreeParser overrides recoverFromMismatchedToken() to do nothing but throw an exception. To illustrate, here is some invalid input for a small C-like programming language:
    int foo() {
      int i;
      i = 3 3;
    }
By default, you get errors and tree output:
    line 3:7 extraneous input '3' expecting ';'
    tree=(FUNC_DEF (FUNC_HDR int foo) (BLOCK (VAR_DEF int i) 3))
(Assuming Main.java prints the tree).
Add the following to your grammar to override the default insert/delete behavior:
    @members {
        protected Object recoverFromMismatchedToken(IntStream input,
                                                    int ttype,
                                                    BitSet follow)
            throws RecognitionException
        {
            throw new MismatchedTokenException(ttype, input);
        }
    }
and now you get
    line 3:14 mismatched input '3' expecting ';'
    tree=(FUNC_DEF (FUNC_HDR int foo) (BLOCK (VAR_DEF int i) <mismatched token: [@19,41:41='3',<12>,3:14], resync=i = 3> 3))
ANTLR shows the mismatched token in the tree now; it didn't recover as it did before.