Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

Introduction 

...

and describes that symbol a on the left side can be replaced with the symbol b on the right side. If b is itself a rule, b is also replaced with its right side, and so on, until no symbol referencing a rule remains. The number and order of symbols, usually called tokens, on the right side is unlimited and arbitrary. It is possible to denote several alternatives of replacement tokens with the | symbol - the used alternative is chosen dependent on the input. To discriminate between tokens which occur only in rules and tokens which occur also in the input, a single quote is used as string delimiter (character escaping works like in Java). A basic example of this behaviour is exhibited by this grammar:

...

A popular mistake is to use left recursion for LL-Parsers like ANTLR (LR-Parsers like yacc can handle this kind of construct). Recursion in general means that an alternative of rule a references the rule a again and is allowed, as otherwise the possible input is finite. Left recursion occurs, when the rule a is in one of the alternatives referenced directly at the beginning like so:

...

The problem is that ANTLR enters an infinite loop as every invocation of rule a immediately invokes rule a again. (In reality, the parser generation aborts upon detecting a left-recursive rule, so nonetheless the grammar has to be changed.) The solution is a simple rewrite like:

...

Another point of interest is the order of the token declaration. The earlier a token is defined, the higher is the precedence if a certain input can be matched by two or more tokens. This means that using the tokens command to define keywords will match those keywords instead a more general ID rule. The following code snippet provides an example:

...

If you give an input containing only spaces then WS will be chosen. Should one change the rules order that FOO comes before WS then FOO will be chosen. Any input containing other characters than spaces will match FOO, even if two or more WS and FOO tokens could be produced. The lexer rules will match greedily the maximum of applicable characters.

...