...
When we see 'catch', we need to report an error and then blow all the way out back to the "ruleBlock" rule. That screams out for a specialized recognition an exception object. All we have to do is throw it
...
Code Block |
---|
ruleBlock
: altList
;
catch [ResyncToEndOfRuleBlock e] {}
|
Code Block |
---|
/** Used to throw us out of deeply nested element back to end of a rule's * alt list. Note it's not under RecognitionException. */ public class ResyncToEndOfRuleBlock extends RuntimeException { } |
how do we detect the missing ';'? It's not as simple as looking for a ':' in rule element. In our case here, element has to look beyond 'b' in rule 'a' to see if there is an argument following (a[34], for example). It sees "a catch" instead and throws NoViableAltException. The problem is that the current input symbol is "b", not "catch". What we actually need to know is how far ahead it looked before it decided to fail. That token is the problem token (catch). So, I've updated v3 ANTLR runtime so that TokenStream knows how to ask for the "high water mark" via range(). Here is my handler for element that checks for the unterminated rule case:
Code Block |
---|
element
: ...
;
catch [RecognitionException re] {
int ttype = input.get(input.range()).getType();
// look for anything that really belongs at the start of the rule minus the initial ID
if ( ttype==COLON || ttype==RETURNS || ttype==CATCH || ttype==FINALLY || ttype==AT ) {
RecognitionException missingSemi =
new v4ParserException("unterminated rule (missing ';') detected at '"+
input.LT(1).getText()+" "+input.LT(2).getText()+"'", input);
reportError(missingSemi);
throw new ResyncToEndOfRuleBlock();
}
reportError(re);
recover(input,re);
retval.tree = (GrammarAST)adaptor.errorNode(input, retval.start, input.LT(-1), re);
}
|
Upon the above input, ANTLR v4 gives the tasty error message:
Code Block |
---|
error(17): A.g:2:4: unterminated rule (missing ';') detected at 'b catch' while looking for rule element |
Oh, and it correctly recovers and creates the AST for rule 'b'.
Code Block |
---|
(COMBINED_GRAMMAR A
(RULES <mismatched token: [@17,46:46='b',<63>,4:0], resync=a : b catch Exception e {...}>
(RULE b (BLOCK (ALT B)))
)
)
|
We need ResyncToEndOfRuleBlock to avoid recovering in element. input "b catch" starts with a rule ref, which is a valid element. it consumes until FOLLOW(element),which has rule ref in it so it consumes nothing. elements rule will descend back into element. must get it all the back to rule block level to recover properly by consuming all the exception crap. Ultimately I'd like to resync by parsing backwards from the ':'.
Notes...
"grammar A;;" says
...
Code Block |
---|
/** Same as a normal DFA state except that we don't examine lookahead * for the bypass alternative. It delays error detection but this * is faster, smaller, and more what people expect. For (X)? people * expect "if ( LA(1)==X ) match(X);" and that's it. */ dfaOptionalBlockState(k,edges,eotPredictsAlt,description,stateNumber,semPredState) ::= << int LA<decisionNumber>_<stateNumber> = input.LA(<k>);<\n> <edges; separator="\nelse "> >> |
I'm reversing my opinion. We must not delay error attention. If we Don't see the start of an EBNF constructs, we should generate an error and trigger recovery. In fact, we should try to sync/recover at the start of these constructs and within just like this:
a : sync ( stuff sync )* ;
That way, we can get rid of crap in front of an optional element instead of bypassing the optional element and then getting a really strange error that we are missing whatever followed that optional element.
element in antlr grammar had acyclic k>1 predictor. Upon a : A b : B ; (missing , it gets stuck at "b:" not ":". Hard to add error alt. had to do .* COLON. Actually makes sense. it's (TOKEN_REF|RULE_REF)COLON that says we are missing a ';'.