Lexical filters

ANTLR has a lexical filter mode that lets you sift through an input file looking for certain grammatical structures. If an input construct matches more than one rule, the rules are prioritized in the order specified, with the first rule having the highest priority. The filter proceeds character by character, looking for a match among the rules. If no rule matches, it consumes that character and tries again. The following example prints "found var foo" for every field foo in the input:

lexer grammar FuzzyJava;
options {filter=true;}

FIELD
    :   TYPE WS name=ID '[]'? WS? (';'|'=')
        {System.out.println("found var "+$name.text);}
    ;

fragment
TYPE :   ID ('.' ID)*
        ;

fragment
ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
    ;

WS  :   (' '|'\t'|'\n')+
    ;

Don't forget that you must ignore text in comments; otherwise a declaration inside a comment would be reported as a field. Add another rule that matches a whole comment:

COMMENT
    :   '/*' (options {greedy=false;} : . )* '*/'
        {System.out.println("found comment "+getText());}
    ;
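To make the scanning behavior concrete, here is a minimal sketch of the filter loop in plain Java. It is not the code ANTLR generates: the rules are approximated with `java.util.regex` patterns, whose backtracking may differ from ANTLR's in corner cases. The class name `FilterSketch`, the method `scan`, and the rule order (FIELD, WS, COMMENT) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a filtering lexer; not ANTLR-generated code.
public class FilterSketch {
    // Regex approximations of the fragment rules ID and WS.
    private static final String ID = "[A-Za-z_][A-Za-z_0-9]*";
    private static final String WS = "[ \\t\\n]+";

    // FIELD : TYPE WS name=ID '[]'? WS? (';'|'=') ; group 1 captures the name.
    private static final Pattern FIELD = Pattern.compile(
        ID + "(?:\\." + ID + ")*" + WS + "(" + ID + ")(?:\\[\\])?(?:" + WS + ")?[;=]");
    private static final Pattern WHITESPACE = Pattern.compile(WS);
    // COMMENT : '/*' ... '*/' with a non-greedy body, as in the grammar.
    private static final Pattern COMMENT =
        Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    // Rules in priority order: the first rule that matches wins (assumed order).
    private static final Pattern[] RULES = {FIELD, WHITESPACE, COMMENT};

    public static List<String> scan(String input) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int next = -1;
            for (Pattern rule : RULES) {
                Matcher m = rule.matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the current position.
                if (m.lookingAt()) {
                    if (rule == FIELD) {
                        found.add("found var " + m.group(1));
                    } else if (rule == COMMENT) {
                        found.add("found comment " + m.group());
                    }
                    next = m.end();
                    break;
                }
            }
            // No rule matched here: consume one character and try again.
            pos = (next < 0) ? pos + 1 : next;
        }
        return found;
    }

    public static void main(String[] args) {
        for (String line : scan("/* int y; */ int a[] = null;")) {
            System.out.println(line);
        }
    }
}
```

Running `main` on this input reports the comment as a whole and then the field `a`; the declaration of `y` inside the comment is never reported, because the COMMENT rule consumes it before FIELD gets a chance to match there.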
