
...

In summary, there is no ANTLR option that enables case insensitivity, as this is hard or impossible to do completely correctly, taking into account all possible internationalization issues. Therefore you need to implement a custom LA(int) to provide the case-insensitive behavior you want.

Handle case insensitivity directly in a grammar

Following the FAQ on abbreviated keywords, we can write a token to accept letters of either case:

Code Block
SELECT : ('S'|'s')('E'|'e')('L'|'l')('E'|'e')('C'|'c')('T'|'t') ;

The following awk script will generate the above:

Code Block


#!/usr/bin/awk -f
{
  printf("%s : ", toupper($0));
  for (i = 1; i <= length($0); i++) {
    c = substr($0, i, 1);
    if (toupper(c) != tolower(c))
      printf("('%s'|'%s')", toupper(c), tolower(c));
    else
      printf("('%s')", c);
  }
  print " ;";
}

You may find the following easier to use, read, and maintain:

Code Block
SELECT : S E L E C T ;

fragment C : 'c' | 'C' ;
fragment E : 'e' | 'E' ;
fragment L : 'l' | 'L' ;
fragment S : 's' | 'S' ;
fragment T : 't' | 'T' ;

Of course, you will need a fragment for each letter (potentially A through Z) that appears in any lexical rule for which you want case insensitivity.
Also note that calling a fragment rule for each character may affect performance; test with your typical input to see whether it helps or degrades performance.
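
Writing the fragments by hand gets tedious once there are many keywords. As a rough sketch, the fragment-style rules above could be generated by a small Java program. The class name FragmentRuleGenerator and its generate method are hypothetical helpers invented for this illustration and are not part of ANTLR. The sketch assumes the keywords contain only ASCII letters, digits, or underscores, and it emits a fragment only for letters that are actually used.

```java
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper (not part of ANTLR): builds fragment-based lexer rules
// for a list of keywords, plus one fragment per letter actually used.
// Assumes keywords contain only ASCII letters, digits, or underscores.
final class FragmentRuleGenerator {

    static String generate(List<String> keywords) {
        StringBuilder out = new StringBuilder();
        TreeSet<Character> letters = new TreeSet<>(); // sorted, no duplicates

        for (String keyword : keywords) {
            // Token rule name is the upper-cased keyword, e.g. SELECT.
            out.append(keyword.toUpperCase(Locale.ROOT)).append(" :");
            for (char c : keyword.toCharArray()) {
                boolean asciiLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
                if (asciiLetter) {
                    char upper = Character.toUpperCase(c);
                    letters.add(upper);
                    out.append(' ').append(upper);              // reference to fragment
                } else {
                    out.append(" '").append(c).append('\'');    // plain literal
                }
            }
            out.append(" ;\n");
        }

        // One fragment per distinct letter, matching either case.
        for (char upper : letters) {
            char lower = Character.toLowerCase(upper);
            out.append("fragment ").append(upper).append(" : '")
               .append(lower).append("' | '").append(upper).append("' ;\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate(List.of("select", "from")));
    }
}
```

Note that a one-letter keyword would produce a token rule with the same name as a fragment, so such keywords would need to be handled separately.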

Java - Implement a custom File or String Stream and Override LA

...

Code Block

using System;
using Antlr.Runtime;

/// <summary>
/// The lookahead used for tokenizing is all lowercase, whereas the original case of the input stream is preserved.
/// </summary>
public class CaseInsensitiveStringStream : ANTLRStringStream {
    public CaseInsensitiveStringStream(char[] data, int numberOfActualCharsInArray) : base(data, numberOfActualCharsInArray) {}

    public CaseInsensitiveStringStream() {}

    public CaseInsensitiveStringStream(string input) : base(input) {}

    // Only the lookahead is converted to lowercase. The original case is preserved in the stream.
    public override int LA(int i) {
        if (i == 0) {
            return 0; // LA(0) is undefined
        }

        if (i < 0) {
            i++; // e.g., translate LA(-1) to offset 0, i.e., data[p - 1]
            if (((p + i) - 1) < 0) {
                return (int) CharStreamConstants.EOF; // no character before the first one
            }
        }

        if (((p + i) - 1) >= n) {
            return (int) CharStreamConstants.EOF;
        }

        // This is how "case insensitive" is defined here; a specific culture could be used instead.
        return Char.ToLowerInvariant(data[(p + i) - 1]);
    }
}
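
To illustrate why lowercasing only the lookahead is enough, here is a minimal, self-contained Java sketch that does not use the ANTLR runtime. The class LowercaseLookaheadDemo, its EOF constant, and its match method are hypothetical stand-ins invented for this example. The LA method follows the same logic as the stream above, and match shows that a keyword written in lowercase matches input of any case while the returned text keeps its original case.

```java
// Hypothetical stand-in for a character stream; this is not an ANTLR class.
final class LowercaseLookaheadDemo {
    static final int EOF = -1;

    private final char[] data;
    private int p = 0; // index of the next character to consume

    LowercaseLookaheadDemo(String input) {
        this.data = input.toCharArray();
    }

    // Lookahead is lowercased; the underlying data is left untouched.
    int LA(int i) {
        if (i == 0) {
            return 0; // undefined
        }
        if (i < 0) {
            i++; // LA(-1) refers to the character just consumed
            if (p + i - 1 < 0) {
                return EOF; // nothing before the first character
            }
        }
        if (p + i - 1 >= data.length) {
            return EOF;
        }
        return Character.toLowerCase(data[p + i - 1]);
    }

    // Tries to match an all-lowercase keyword at the current position.
    // Returns the matched text in its ORIGINAL case, or null if no match.
    String match(String lowercaseKeyword) {
        int len = lowercaseKeyword.length();
        for (int k = 0; k < len; k++) {
            if (LA(k + 1) != lowercaseKeyword.charAt(k)) {
                return null;
            }
        }
        String text = new String(data, p, len);
        p += len; // consume the keyword
        return text;
    }

    public static void main(String[] args) {
        LowercaseLookaheadDemo stream = new LowercaseLookaheadDemo("SeLeCt x");
        System.out.println(stream.match("select"));
    }
}
```

Because the matching logic only ever sees lowercase characters, the grammar's keywords would be written in lowercase when such a stream is used.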

