Left-Recursion Removal


Writing grammars, one will encounter left-recursive rules. As ANTLR is a recursive-descent parser, it cannot cope with them and thus they have to be removed. But what is a left-recursive rule exactly? It is a recursive rule which calls itself on the left-edge of a global alternative. An example would be:

a   :    a b
    |    b

ANTLRworks can resolve this problem and turns rule a into

a   :    b+;

But ANTLRworks can do this automatically only for the above kind of left-recursive rules, which exhibit the left recursion within the implementation of the recursive rule itself. I would call these single left-recursive (SLR) rules. The other kind are mutually left-recursive (MLR) rules where the left-recursion occurs over several rule invocations. The simplest example would be the following:

a   :    b
    |    c

b   :    a
    |    c

Rule a references rule b, while rule b references rules a. A more complex example would show such a 're-reference' taking place after many subsequent rule invocations. As stated above, the resolvement has to be done manually here.

As the strategy to solve this problem isn't that obvious, I decided to write a tutorial about two real grammar cases I have encountered while working on a C# grammar. It should be noted that this tutorial describes what I wished to have happened, as I visited many more dead-ends than noted. Furthermore, the strategy to resolve MLR rules is based on the description by Gavin Lambert.