...

I believe this mechanism would provide a more satisfying experience for the lexer grammar developer. It might even be faster than the current recursive-descent lexers in v3, and it is easier to generate and faster to analyze. It should also take much less time to initialize at lexer runtime because the state machine will be smaller. Current v3 lexers are huge: the Java lexer, for example, is 4747 lines! Lexing just isn't a hard enough problem to deserve that much code. ;) Oh, and it should also make it much easier to build incremental lexers, if we decide to implement those.

Gavin Lambert raises another issue:

While it's usually possible to frame the contents of a looping construct in a positive sense (which is what ANTLR currently requires), it'd be nice if there were a language construct that would let you do specific negative matches too (maybe a feature request for v4?).

For a (completely made-up) example, consider a case where you might want to match any identifier except one starting with "foo". In current ANTLR, you'd have to do one of these:

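 // Option 1: a separate rule (not shown) that matches only IDs not starting with "foo"
 FOOLIST: 'foo[' NON_FOO_ID+ ']';
 // Option 2: a gated semantic predicate calling a hand-written lookahead helper
 FOOLIST: 'foo[' ({!next_id_starts_with_foo()}? => ID)+ ']';
 // Option 3: a syntactic predicate listing every prefix that is not "foo"
 FOOLIST: 'foo[' ((~'f' | 'f' ~'o' | 'fo' ~'o') => ID)+ ']';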
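
To see why the third form is clumsy, here is a small Python sketch (not ANTLR; the function names are made up for illustration) that spells out the hand-expanded predicate and compares it with a plain "does not start with foo" test. It assumes each ~ alternative must consume a real character, so the two tests can disagree right at end of input.

```python
from itertools import product

def starts_with_foo(s):
    """The test we would like to negate directly."""
    return s.startswith("foo")

def positive_predicate(s):
    """Hand-expanded mirror of (~'f' | 'f' ~'o' | 'fo' ~'o').

    Assumption: every alternative needs an actual next character,
    so it cannot succeed by running off the end of the input.
    """
    if len(s) >= 1 and s[0] != "f":
        return True
    if len(s) >= 2 and s[0] == "f" and s[1] != "o":
        return True
    if len(s) >= 3 and s[:2] == "fo" and s[2] != "o":
        return True
    return False

def agree_on_length(n, alphabet="fox]"):
    """Check that both tests agree on every string of length n over alphabet."""
    return all(
        positive_predicate("".join(chars)) == (not starts_with_foo("".join(chars)))
        for chars in product(alphabet, repeat=n)
    )
```

With three or more characters of lookahead available the two tests agree, but the expanded form grows by one alternative for every character of the excluded prefix.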

It'd be nice if there were some way to express a negative match via a syntactic predicate, e.g.:

 FOOLIST: 'foo[' (('foo') => ~ | ID)+ ']';

(where '~' in an alternative basically means "break", i.e., match nothing and terminate the innermost loop.)
Or, perhaps better:

 FOOLIST: 'foo[' (('foo') ~=> ID)+ ']';

(where '~=>' means "only take this path if the predicate fails")
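
The intended behaviour of that loop can be sketched outside ANTLR. The Python below only illustrates the proposed semantics; it is not how ANTLR would implement them. The name match_foolist and the ID pattern are hypothetical, and the sketch assumes IDs are separated by spaces, which the made-up rule above does not specify.

```python
import re

# Hypothetical stand-in for the ID lexer rule.
ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def match_foolist(text):
    """Return the IDs matched by  'foo[' (('foo') ~=> ID)+ ']'  or None if text does not match."""
    if not text.startswith("foo["):
        return None
    pos = len("foo[")
    ids = []
    while True:
        # Proposed '~=>': enter the ID alternative only if the predicate 'foo' FAILS here.
        if text.startswith("foo", pos):
            break
        m = ID.match(text, pos)
        if m is None:
            break
        ids.append(m.group())
        pos = m.end()
        # Assumption: IDs are separated by spaces, which are skipped.
        while text.startswith(" ", pos):
            pos += 1
    # ( ... )+ needs at least one ID, and ']' must close the list.
    if ids and text[pos:] == "]":
        return ids
    return None
```

For instance, match_foolist("foo[bar baz]") returns ["bar", "baz"], while match_foolist("foo[bar foobar]") returns None: the loop stops in front of "foobar" and the closing ']' is not found there.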

Granted, this sort of requirement doesn't come up often, but when it does it'd be nice to have a tidier way of expressing it, and it'd be fairly simple to implement... :)