Grammars
Grammar syntax
All grammars are of the form:
/** This is a grammar doc comment */
grammar-type grammar name;
options { name1 = value; name2 = value2; ... }
import delegateName1=grammar1, ..., delegateNameN=grammarN; // can omit delegateName
tokens { token-name1; token-name2 = value; ... }
scope global-scope-name-1 { «attribute-definitions» }
scope global-scope-name-2 { «attribute-definitions» }
...
@header { ... }
@lexer::header { ... }
@members { ... }
«rules»
The type of the grammar, specified via the grammar-type modifier above, can be one of: lexer, parser, tree, and combined (no modifier). To set the superclass of the generated parser class, use the superClass option. See Grammar options for a list of valid grammar options and their semantics.
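As a rough illustration of the template above, here is a minimal combined grammar sketch in ANTLR 3 syntax. The grammar name Expr, the package my.example, the field count, and all rule and token names are hypothetical and chosen only for this example.

```antlr
/** A tiny combined grammar (hypothetical example). */
grammar Expr;

options {
    language = Java;   // target language for the generated code
}

tokens {
    PLUS = '+';        // token bound to a literal
    NEG;               // token with no literal, e.g. for use as a tree node
}

@header        { package my.example; }   // goes into the generated parser
@lexer::header { package my.example; }   // goes into the generated lexer

@members {
    int count = 0;     // hypothetical field added to the parser class
}

// parser rule (lowercase first letter)
expr : INT (PLUS INT)* { count++; } ;

// lexer rules (uppercase first letter)
INT : '0'..'9'+ ;
WS  : (' '|'\t'|'\r'|'\n')+ {$channel=HIDDEN;} ;
```

Because no grammar-type modifier is given, this is a combined grammar, so ANTLR should generate both a parser and a lexer from the one file.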
Rule syntax
/** rule comment */
access-modifier rule-name[«arguments»] returns [«return-values»] throws name1, name2, ...
options {...}
scope {...}
scope global-scope-name, ..., global-scope-nameN;
@init {...}
@after {...}
: «alternative-1» -> «rewrite-rule-1»
| «alternative-2» -> «rewrite-rule-2»
...
| «alternative-n» -> «rewrite-rule-n»
;
catch [«exception-arg-1»] {...}
catch [«exception-arg-2»] {...}
finally {...}
See Rule and subrule options for a list of valid rule options and their semantics.
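The following sketch shows how several of these rule parts might fit together in one parser rule with Java actions. The rule names start and sum, the argument base, the return value value, and the INT token are assumptions made for illustration; reportError and recover are the error-handling methods inherited from the ANTLR 3 Java runtime's BaseRecognizer.

```antlr
start : sum[0] EOF ;   // pass an argument when invoking the rule

sum[int base] returns [int value]
@init  { $value = $base; }                       // runs before anything is matched
@after { System.out.println("sum=" + $value); }  // runs once the rule has matched
    : a=INT { $value += Integer.parseInt($a.text); }
      ( '+' b=INT { $value += Integer.parseInt($b.text); } )*
    ;
    catch [RecognitionException re] {
        reportError(re);       // report the error, then resynchronize
        recover(input, re);
    }
    finally { /* runs whether or not an exception was thrown */ }
```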
Lexer, Parser and Tree Parser rules
Rule names are a special case of identifiers. Lexer rule names must start with an uppercase letter; parser and tree parser rule names must start with a lowercase letter.
LexerRuleName : ('A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
ParserRuleName : ('a'..'z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
TreeParserRuleName : ('a'..'z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
Here are some common lexical rules for programming languages:
WS : (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;} ;
COMMENT : '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;} ;
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;} ;
The $channel=HIDDEN; action places those tokens on a hidden channel. They remain in the token stream, but the parser skips over them when matching. Actions, however, can ask for the hidden-channel tokens. To discard tokens entirely, use the action skip(); instead (see org.antlr.runtime.Lexer.skip()).
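To make the difference concrete, this sketch places comments on the hidden channel but throws whitespace away. The choice of which rule hides and which skips is only an assumption for illustration.

```antlr
// Whitespace is discarded: no token is created for it at all.
WS      : (' '|'\t'|'\r'|'\n')+ { skip(); } ;

// Comments stay in the token stream on the hidden channel,
// so an action can still retrieve them later if it needs to.
COMMENT : '/*' ( options {greedy=false;} : . )* '*/' { $channel=HIDDEN; } ;
```

Hiding is the usual choice when a later stage, such as a pretty-printer, may need the original text; skipping is simpler when the tokens are never needed again.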
Sometimes you will need helper rules to make your lexer grammar more readable. Use the fragment modifier in front of the rule:
HexLiteral : '0' ('x'|'X') HexDigit+ ;
fragment HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
In this case, HexDigit is not a token in its own right; it can only be called from other lexer rules, such as HexLiteral.
Warning: T__ is a reserved token name and token name prefix. Please do not use it as a rule name or at the start of one.
Tree grammar rules
Rules in tree grammars are identical to parser grammars except that they can specify a tree element to match. The syntax is ^( root child1 child2 ... childn ). For example:
decl : ^(DECL type declarator) {System.out.println($type.text+" "+$declarator.text);} ;
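A tree rule like this normally mirrors a rewrite rule in the parser grammar that built the tree. The pairing below is a sketch under stated assumptions: the parser grammar uses output=AST and declares DECL in its tokens section, and the tree grammar shares the parser's token vocabulary through the tokenVocab option. The trailing ';' in the parser rule is hypothetical.

```antlr
// Parser grammar side: build the tree ^(DECL type declarator)
// (assumes options { output=AST; } and tokens { DECL; })
decl : type declarator ';' -> ^(DECL type declarator) ;

// Tree grammar side: match the same tree shape
// (assumes the tokenVocab option names the parser grammar)
decl : ^(DECL type declarator) ;
```

Note that the ';' token is matched by the parser but left out of the rewrite, so the tree grammar never sees it.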
Attribute scope syntax
Attribute scopes are a set of attribute definitions of the form:
scope name {
    type1 attribute-name1;
    type2 attribute-name2;
}
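As a sketch of how a global scope might be declared and used, the fragment below tracks the names declared in a block. The scope name Symbols, the attribute names, and the rules block and decl are hypothetical; the $Symbols::names form is the ANTLR 3 notation for reaching an attribute of a global scope from an action.

```antlr
scope Symbols {
    java.util.Set<String> names;   // identifiers declared in the current block
}

block
scope Symbols;                     // each invocation of block pushes a fresh Symbols scope
@init { $Symbols::names = new java.util.HashSet<String>(); }
    : '{' decl* '}'
    ;

decl
    : 'int' ID ';' { $Symbols::names.add($ID.text); }   // visible to any rule invoked from block
    ;
```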
Grammar action syntax
Grammar actions are, in general, of the form:
@action-name { ... }
@scope-name::action-name { ... }
The default scope-name is parser. For instance, @header is the same as @parser::header. Valid scope-names differ depending on the target, but most targets should support parser and lexer. Two common action-names are header and members. The header action is placed at the top of the generated file, before the class definition, and the members action is inserted within the body of the generated class. For example, the following grammar actions ensure that the generated parser and lexer Java classes include a package declaration:
@parser::header { package my.example.parser; }
@lexer::header { package my.example.parser; }
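The members action can be sketched in the same way. This example assumes the Java target and collects error messages in a list instead of printing them; the field name errors is hypothetical, while emitErrorMessage(String) is the method the ANTLR 3 Java runtime's BaseRecognizer calls to output an error message.

```antlr
@parser::members {
    // Hypothetical field: collected error messages
    public java.util.List<String> errors = new java.util.ArrayList<String>();

    // Overrides BaseRecognizer.emitErrorMessage to store the message
    public void emitErrorMessage(String msg) {
        errors.add(msg);
    }
}
```

After parsing, calling code could inspect parser.errors to decide whether the input was accepted.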
Rule elements
Rules may reference:
Element | Description |
---|---|
T | Token reference. An uppercase identifier; lexer grammars may use optional arguments for fragment token rules. |
T<node=V> or T<V> | Token reference with the optional token option node, which names the tree node type V to create for that token. |
T[«args»] | Lexer rule (token rule) reference. Lexer grammars may use optional arguments for fragment token rules. |
r [«args»] | Rule reference. A lowercase identifier with optional arguments. |
'«one-or-more-char»' | String or char literal in single quotes. In a parser, a token reference; in a lexer, matches that string. |
{«action»} | An action written in the target language. Executed right after the previous element and right before the next element. |
{«action»}? | Semantic predicate. |
{«action»}?=> | Gated semantic predicate. |
(«subrule»)=> | Syntactic predicate. |
(«x»\|«y»\|«z») | Subrule. Like a call to a rule with no name. |
(«x»\|«y»\|«z»)? | Optional subrule. |
(«x»\|«y»\|«z»)* | Zero-or-more subrule. |
(«x»\|«y»\|«z»)+ | One-or-more subrule. |
«x»? | Optional element. |
«x»* | Zero-or-more element. |
«x»+ | One-or-more element. |
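A few of the elements in the table can be seen together in the sketch below. The rules stat, decl, expr and args and the boolean field allowPrint (which would be declared in a members action) are hypothetical names used only for illustration.

```antlr
stat
    : (decl)=> decl                    // syntactic predicate: take this alternative if decl matches ahead
    | {allowPrint}? 'print' expr ';'   // semantic predicate on a hypothetical boolean field
    | expr ';'                         // rule reference followed by a literal
    ;

args
    : '(' ( expr (',' expr)* )? ')'    // optional subrule wrapping a zero-or-more subrule
    ;
```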