Lexical filters
ANTLR has a lexical filter mode, turned on with the filter=true option, that lets you sift through an input file looking for certain grammatical structures while ignoring everything else. If an input construct matches more than one rule, the rules are prioritized in the order they are specified, with the first rule having the highest priority. The filter proceeds character by character, looking for a match among the rules. If no rule matches, it consumes that character and tries again at the next one. The following example prints "found var foo" for every field foo in the input:
lexer grammar FuzzyJava;
options {filter=true;}

FIELD
    :   TYPE WS name=ID '[]'? WS? (';'|'=')
        {System.out.println("found var "+$name.text);}
    ;

fragment
TYPE:   ID ('.' ID)* ;

fragment
ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')* ;

WS  :   (' '|'\t'|'\n')+ ;
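The filter loop can be emulated in plain Java to see it at work. This sketch is not ANTLR's generated lexer: the class FilterSketch, its Rule type and the regular expressions standing in for FIELD, TYPE, ID and WS are hypothetical, and java.util.regex backtracking only approximates how ANTLR matches a rule. It keeps the essential behavior. At each position the rules are tried in grammar order, the first one that matches runs its action and the scan resumes after the matched text, and if none matches a single character is dropped. Messages are collected in a list instead of printed so they can be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex emulation of the FuzzyJava filter; not ANTLR-generated code.
public class FilterSketch {
    // Stand-in for one lexer rule: a pattern plus an action that returns
    // a message to report, or null for rules without an action (such as WS).
    static final class Rule {
        final Pattern pattern;
        final Function<Matcher, String> action;

        Rule(Pattern pattern, Function<Matcher, String> action) {
            this.pattern = pattern;
            this.action = action;
        }
    }

    static final String ID = "[a-zA-Z_][a-zA-Z_0-9]*";    // fragment ID
    static final String TYPE = ID + "(?:\\." + ID + ")*";  // fragment TYPE
    static final String WS = "[ \\t\\n]+";                 // WS

    // FIELD : TYPE WS name=ID '[]'? WS? (';'|'=') ; group 1 plays the role of $name
    static final Pattern FIELD = Pattern.compile(
            TYPE + WS + "(" + ID + ")(?:\\[\\])?(?:" + WS + ")?[;=]");

    // Rules in grammar order: earlier rules have higher priority.
    static List<Rule> rules() {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule(FIELD, m -> "found var " + m.group(1)));
        rules.add(new Rule(Pattern.compile(WS), m -> null));
        return rules;
    }

    static List<String> filter(String input, List<Rule> rules) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Rule r : rules) {
                // lookingAt() requires the match to start exactly at pos.
                Matcher m = r.pattern.matcher(input).region(pos, input.length());
                if (m.lookingAt()) {
                    String msg = r.action.apply(m);
                    if (msg != null) {
                        out.add(msg);
                    }
                    pos = m.end();   // none of these patterns can match the empty string
                    matched = true;
                    break;           // the first matching rule wins
                }
            }
            if (!matched) {
                pos++;               // no rule matched: consume one character and retry
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "class T {\n  int count;\n  String names[] = null;\n"
                + "  java.util.List items;\n  void run() { count = 0; }\n}\n";
        for (String line : filter(src, rules())) {
            System.out.println(line);
        }
    }
}
```

Running main prints found var count, found var names and found var items, one per line. The method body produces nothing, because neither run() nor count = 0 fits the FIELD pattern. Note that the rule only accepts brackets after the name, as in String names[], so a declaration written String[] names is skipped.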
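Don't forget that comments need special handling: because the filter tries to match at every character, it would otherwise report field declarations that have been commented out. Add a rule that consumes each comment as a whole so its contents are never examined. The greedy=false option makes the loop stop at the first */ it sees: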
COMMENT : '/*' (options {greedy=false;} : . )* '*/' {System.out.println("found comment "+getText());} ;
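A similar hypothetical emulation shows what the COMMENT rule changes. CommentFilterSketch is not ANTLR output: it tries an optional comment pattern first, then FIELD, then WS. The reluctant regex quantifier .*? stands in for the greedy=false loop, and Pattern.DOTALL lets . match newlines, as ANTLR's wildcard does in a lexer. A greedy regex is included only for comparison; it illustrates regex behavior, not what ANTLR would do with a greedy loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex emulation of FuzzyJava plus COMMENT; not ANTLR-generated code.
public class CommentFilterSketch {
    static final String ID = "[a-zA-Z_][a-zA-Z_0-9]*";
    static final String WS = "[ \\t\\n]+";
    static final Pattern FIELD = Pattern.compile(
            ID + "(?:\\." + ID + ")*" + WS + "(" + ID + ")(?:\\[\\])?(?:" + WS + ")?[;=]");
    static final Pattern WS_RULE = Pattern.compile(WS);

    // '.*?' stops at the first "*/", like (options {greedy=false;} : . )* '*/'.
    // DOTALL lets '.' match newlines.
    static final Pattern NONGREEDY_COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    // For comparison only: a greedy regex runs to the LAST "*/" in the input.
    static final Pattern GREEDY_COMMENT = Pattern.compile("/\\*.*\\*/", Pattern.DOTALL);

    // Returns a matcher if p matches starting exactly at pos, otherwise null.
    static Matcher at(Pattern p, String s, int pos) {
        Matcher m = p.matcher(s).region(pos, s.length());
        return m.lookingAt() ? m : null;
    }

    // commentRule may be null to leave the COMMENT rule out entirely.
    static List<String> filter(String input, Pattern commentRule) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Matcher m;
            if (commentRule != null && (m = at(commentRule, input, pos)) != null) {
                out.add("found comment " + m.group());
            } else if ((m = at(FIELD, input, pos)) != null) {
                out.add("found var " + m.group(1));
            } else if ((m = at(WS_RULE, input, pos)) == null) {
                pos++;          // no rule matched: consume one character and retry
                continue;
            }
            pos = m.end();      // resume after the text the winning rule consumed
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "/* int old; */\nint x;\n/* a */ int y; /* b */\n";
        System.out.println(filter(src, null));
        System.out.println(filter(src, NONGREEDY_COMMENT));
        System.out.println(filter(src, GREEDY_COMMENT));
    }
}
```

For the sample input, the run without a comment rule reports old, x and y, including the commented-out old. The nongreedy pattern reports the three comments separately along with x and y. The greedy regex treats everything from the first /* to the last */ as a single comment, so x and y are lost.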