1. Example
Preparations
It is important to work with a copy of the rules, because changing the grammar into the normal form can become quite messy. Create a new grammar file and copy only the rules into the file which are reported in the error message. After that, create rules for missing rule references with ANTLRworks, so that the tool doesn't complain about them.
These rules may neither be empty nor be duplicates - I find the use of keywords here simple and efficient. It is advisable to move those rules to the end of the file so they don't get mixed up with the recursive ones. As every grammar with mutually left-recursive rules is kind of unique in its own sense, the following procedure is to be viewed as a heuristic instead of an algorithm.
After the execution of the first step I had the following C# grammar snippet:
grammar CSharpRecursion; type : value_type | reference_type | type_parameter | pointer_type ; value_type : struct_type | enum_type ; struct_type : type_name | simple_type | nullable_type ; nullable_type : non_nullable_value_type INTERR ; non_nullable_value_type : type ; reference_type : class_type | interface_type | array_type | delegate_type ; array_type : non_array_type rank_specifier+ ; non_array_type : type ; pointer_type : unmanaged_type STAR | VOID STAR ; unmanaged_type : type ; ///////////////////////// // $<OutOfSight type_parameter : AS ; delegate_type : BASE ; interface_type : BOOL ; class_type : BREAK ; simple_type : BYTE ; type_name : CASE ; enum_type : CATCH ; rank_specifier : CHAR ; ABSTRACT : 'abstract'; AS : 'as'; BASE : 'base'; BOOL : 'bool'; BREAK : 'break'; BYTE : 'byte'; CASE : 'case'; CATCH : 'catch'; CHAR : 'char'; CHECKED : 'checked'; CLASS : 'class'; CONST : 'const'; CONTINUE : 'continue'; DECIMAL : 'decimal'; DEFAULT : 'default'; DELEGATE : 'delegate'; DO : 'do'; DOUBLE : 'double'; ELSE : 'else'; ENUM : 'enum'; EVENT : 'event'; EXPLICIT : 'explicit'; EXTERN : 'extern'; FALSE : 'false'; FINALLY : 'finally'; FIXED : 'fixed'; FLOAT : 'float'; FOR : 'for'; FOREACH : 'foreach'; GOTO : 'goto'; IF : 'if'; IMPLICIT : 'implicit'; IN : 'in'; INT : 'int'; INTERFACE : 'interface'; INTERNAL : 'internal'; IS : 'is'; LOCK : 'lock'; LONG : 'long'; NAMESPACE : 'namespace'; NEW : 'new'; NULL : 'null'; OBJECT : 'object'; OPERATOR : 'operator'; OUT : 'out'; OVERRIDE : 'override'; PARAMS : 'params'; PRIVATE : 'private'; PROTECTED : 'protected'; PUBLIC : 'public'; READONLY : 'readonly'; REF : 'ref'; RETURN : 'return'; SBYTE : 'sbyte'; SEALED : 'sealed'; SHORT : 'short'; SIZEOF : 'sizeof'; STACKALLOC : 'stackalloc'; STATIC : 'static'; STRING : 'string'; STRUCT : 'struct'; SWITCH : 'switch'; THIS : 'this'; THROW : 'throw'; TRUE : 'true'; TRY : 'try'; TYPEOF : 'typeof'; UINT : 'uint'; ULONG : 'ulong'; UNCHECKED : 'unchecked'; UNSAFE : 'unsafe'; USHORT : 'ushort'; USING : 'using'; VIRTUAL : 'virtual'; VOID : 'void'; VOLATILE : 'volatile'; WHILE : 'while'; INTERR : '?'; STAR: '*'; // $>
After this all mutually left-recursive rules are checked to see if they are referenced in any other rules besides themselves in the original grammar. The five rules being referenced elsewhere are type, array_type, non_array_type, pointer_type and unmanaged_type. To simplify my future work I inline all unreferenced rules first and save the result:
type : type_name | simple_type | type INTERR | enum_type | class_type | interface_type | array_type | delegate_type | type_parameter | pointer_type ; array_type : non_array_type rank_specifier+ ; non_array_type : type ; pointer_type : unmanaged_type STAR | VOID STAR ; unmanaged_type : type ;
Now I don't have to repeat as many steps. This would be a good point to make a copy of this grammar file because when a mistake can't be easily undone one can start easily anew. Saving intermediate copies of the grammar once you complete each important step, would be a good practice, too. It is important to note that inlining leads always to only one single survivor so the process has to be repeated for every required rule.
It is also possible that rules become single left-recursive and have to be changed by ANTLRworks first (unless that rule is supposed to be the last rule - then postponing this to the last possible step is a good idea). Rules may become ambiguous, too, so doing grammar checks after single left-recursion removal (SLRR) is recommended. Analyze the cause and remove it before proceeding further.
When inlining is done, ANTLRworks automatically adds extra parentheses around the inlined statement. Those parentheses may be superfluous and should be removed as soon as possible, since SLRR may not be possible otherwise. A related problem is that after an SLRR that the trailing prevents the naive deletion of the parentheses around the main body.
This can be solved with "in-factoring": "(a|b) c" => "ac|bc". Further analysis of the syntax diagram can point out other simplifications like "(ac|bc) (c)*" => "(a|b) c+". Don't do them aggressively until only the last rule is left - further inlining may solve asymmetries where not all alternatives share the exact same parts.
Sections
- 1.1 Rule Inlining type
- 1.2 Rule Inlining array_type
- 1.3 Rule Inlining unmanaged_type
- 1.4 Rule Inlining non_array_type
- 1.5 Rule Inlining pointer_type
My siblings (including me):