How can I emit more than a single token per lexer rule?
Here is a rule to match floats:
    NUM_FLOAT
        : DIGITS '.' (DIGITS)? (EXPONENT_PART)? (FLOAT_TYPE_SUFFIX)?
        | '.' DIGITS (EXPONENT_PART)? (FLOAT_TYPE_SUFFIX)?
        | DIGITS EXPONENT_PART FLOAT_TYPE_SUFFIX
        | DIGITS EXPONENT_PART
        | DIGITS FLOAT_TYPE_SUFFIX
        ;
Now suppose you want to add a '..' range operator so that 1..10 makes sense. Without backtracking, ANTLR has trouble distinguishing the 1. that starts a range from the 1. that is a float. The fix is to match '1..' inside NUM_FLOAT and emit two non-float tokens instead:
    NUM_FLOAT
        : d=DIGITS r='..'
          { $d.setType(NUM_INT); emit($d);
            $r.setType(RANGE);   emit($r); }
        | DIGITS '.' (DIGITS)? (EXPONENT_PART)? (FLOAT_TYPE_SUFFIX)?
        | '.' DIGITS (EXPONENT_PART)? (FLOAT_TYPE_SUFFIX)?
        | DIGITS EXPONENT_PART FLOAT_TYPE_SUFFIX
        | DIGITS EXPONENT_PART
        | DIGITS FLOAT_TYPE_SUFFIX
        ;
By default, a Lexer object emits only one token at a time. To emit more, buffer the tokens by overriding a couple of methods:
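    // Imports needed by the members below (combined-grammar syntax).
    @lexer::header {
    import java.util.ArrayList;
    import java.util.List;
    }

    @lexer::members {
    // Tokens emitted by rules but not yet handed to the parser.
    List<Token> tokens = new ArrayList<Token>();

    // Record every emitted token instead of keeping only the last one.
    public void emit(Token token) {
        state.token = token;
        tokens.add(token);
    }

    // Run one more rule, then hand out the oldest buffered token.
    public Token nextToken() {
        super.nextToken();
        if ( tokens.size()==0 ) {
            return Token.EOF_TOKEN;
        }
        return tokens.remove(0);
    }
    }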
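To see the buffering idea in isolation, here is a minimal plain-Java sketch that does not depend on ANTLR. Everything in it is hypothetical and for illustration only: the class BufferedLexerSketch, its helpers, and the "TYPE(text)" string tokens are not part of any ANTLR API. It handles only digits, '.' and '..', and ignores exponents and type suffixes. Unlike the override above, it drains the buffer before matching more input, which is a variation rather than the same control flow.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a lexer; not an ANTLR class.
public class BufferedLexerSketch {
    private final String input;
    private int pos = 0;
    // Tokens emitted by a "rule" but not yet returned, in emission order.
    private final Queue<String> pending = new ArrayDeque<>();

    public BufferedLexerSketch(String input) {
        this.input = input;
    }

    // Plays the role of emit(Token): buffer the token instead of overwriting.
    private void emit(String type, String text) {
        pending.add(type + "(" + text + ")");
    }

    private void skipDigits() {
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            pos++;
        }
    }

    // One "rule" invocation; emits two tokens when it sees digits followed by "..".
    private void matchNumber() {
        int start = pos;
        skipDigits();
        if (pos == start) {
            throw new IllegalStateException("unexpected character at " + pos);
        }
        String digits = input.substring(start, pos);
        if (input.startsWith("..", pos)) {
            pos += 2;
            emit("NUM_INT", digits);
            emit("RANGE", "..");
        } else if (input.startsWith(".", pos)) {
            pos++;          // consume the decimal point
            skipDigits();   // optional fraction digits
            emit("NUM_FLOAT", input.substring(start, pos));
        } else {
            emit("NUM_INT", digits);
        }
    }

    // Match more input only when the buffer is empty, then return the oldest token.
    public String nextToken() {
        if (pending.isEmpty() && pos < input.length()) {
            matchNumber();
        }
        return pending.isEmpty() ? "EOF" : pending.remove();
    }

    // Convenience helper: collect all tokens up to (not including) EOF.
    public static List<String> tokenize(String text) {
        BufferedLexerSketch lexer = new BufferedLexerSketch(text);
        List<String> out = new ArrayList<>();
        String t;
        while (!(t = lexer.nextToken()).equals("EOF")) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1..10")); // [NUM_INT(1), RANGE(..), NUM_INT(10)]
        System.out.println(tokenize("1.5"));   // [NUM_FLOAT(1.5)]
    }
}
```

The point of the queue is that a single rule invocation can leave more than one token behind, and nextToken() hands them out one at a time in the order they were emitted.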