Grammars

Grammar syntax

All grammars are of the form:

/** This is a grammar doc comment */
grammar-type grammar name;
options { name1 = value; name2 = value2; ... }
import delegateName1=grammar1, ..., delegateNameN=grammarN; // can omit delegateName
tokens { token-name1; token-name2 = value; ... }
scope global-scope-name-1 { «attribute-definitions» }
scope global-scope-name-2 { «attribute-definitions» }
...
@header { ... }
@lexer::header { ... }
@members { ... }

«rules»

The grammar type, given by the grammar-type modifier above, is one of lexer, parser, or tree; omitting the modifier declares a combined grammar (parser and lexer together). To set the superclass of the generated parser class, use the superClass option. See Grammar options for a list of valid grammar options and their semantics.
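As an illustration, the general form above might be filled in as follows. This is a minimal sketch of a combined grammar rather than a recommended layout: the grammar name Expr, the package my.example.expr, the exprCount field, and all rule and token names are hypothetical.

```antlr
/** A hypothetical combined grammar (no grammar-type modifier). */
grammar Expr;

options {
    output = AST;          // needed for the ^ tree operator used in expr
}

tokens {
    PLUS = '+';            // token bound to a literal
    NEGATE;                // imaginary token with no literal (unused here)
}

@header { package my.example.expr; }
@lexer::header { package my.example.expr; }

@members {
    // Hypothetical counter, visible to all parser rule actions.
    int exprCount = 0;
}

prog : (expr NEWLINE {exprCount++;})+ ;
expr : INT (PLUS^ INT)* ;

INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS      : (' '|'\t')+ {$channel=HIDDEN;} ;
```

Prefixing the first line with lexer, parser, or tree would turn this into one of the other grammar types; a lexer grammar would keep only the uppercase rules, and a parser grammar only the lowercase ones.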

Rule syntax

/** rule comment */
access-modifier rule-name[«arguments»] returns [«return-values»] throws name1, name2, ...
options {...}
scope {...}
scope global-scope-name1, ..., global-scope-nameN;
@init {...}
@after {...}
    : «alternative-1» -> «rewrite-rule-1»
    | «alternative-2» -> «rewrite-rule-2»
    ...
    | «alternative-n» -> «rewrite-rule-n»
    ;
    catch [«exception-arg-1»] {...}
    catch [«exception-arg-2»] {...}
    finally {...}

See Rule and subrule options for a list of valid rule options and their semantics.
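A concrete rule using several of these parts might look like the sketch below. The grammar name Assign, the rule assignment, its verbose argument, and its name return value are hypothetical; reportError and recover are methods the generated parser inherits from org.antlr.runtime.BaseRecognizer.

```antlr
grammar Assign;

/** Hypothetical rule with an argument, a return value and handlers. */
assignment[boolean verbose] returns [String name]
@init  { $name = null; }
@after { if ($verbose) System.out.println("assigned " + $name); }
    : ID '=' INT ';' { $name = $ID.text; }
    ;
    catch [RecognitionException re] {
        reportError(re);       // report the problem
        recover(input, re);    // then resynchronize the token stream
    }
    finally { /* runs whether or not the rule matched */ }

ID  : ('a'..'z'|'A'..'Z')+ ;
INT : '0'..'9'+ ;
WS  : (' '|'\t'|'\r'|'\n')+ {$channel=HIDDEN;} ;
```

The access modifier, throws clause, options, scopes and rewrite rules are omitted here; each of them is optional.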

Lexer, Parser and Tree Parser rules

Rule names are identifiers whose first letter determines the kind of rule. Lexer rule names must start with an uppercase letter; parser and tree parser rule names must start with a lowercase letter. The following patterns describe the valid names:

LexerRuleName : ('A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;

ParserRuleName : ('a'..'z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;

TreeParserRuleName : ('a'..'z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;

Here are some common lexical rules for programming languages:

WS  : (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;}
    ;
COMMENT
    : '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;
LINE_COMMENT
    : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    ;

The $channel=HIDDEN; action places those tokens on a hidden channel. The tokens remain in the token stream, but the parser ignores them because it only matches tokens on the default channel. Actions, however, can ask for the hidden-channel tokens. If you want to discard tokens entirely, use the action skip(); instead (see org.antlr.runtime.Lexer.skip()).
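For comparison, a whitespace rule that discards its tokens rather than hiding them could be sketched as follows. It is a variant of the WS rule above; the + that matches a whole run of whitespace per token is an addition for illustration.

```antlr
// Whitespace is thrown away: no token reaches the token stream,
// so later actions cannot recover it.
WS  : (' '|'\r'|'\t'|'\u000C'|'\n')+ {skip();}
    ;
```

Hiding is usually the safer choice when a later phase (for example, a pretty-printer or translator) may need the original whitespace or comments; skipping keeps the token stream smaller.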

Sometimes you will need helper rules to make your lexer grammar more readable. Mark such a rule with the fragment modifier:

HexLiteral : '0' ('x'|'X') HexDigit+ ;
fragment HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;

In this case, HexDigit is not a token in its own right; it can only be invoked from other lexer rules, such as HexLiteral.

Warning: T__ is a reserved token name and token name prefix. Do not use it as, or at the start of, any of your rule names.

Tree grammar rules

Rules in tree grammars are identical to rules in parser grammars, except that they can also specify a tree element to match. The syntax is ^( root child1 child2 ... childn ). For example:

decl : ^(DECL type declarator) {System.out.println($type.text+" "+$declarator.text);}
     ;
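To put the decl rule in context, a complete tree grammar might be sketched as below. The names DeclWalker and Decl, and the tokens INT_TYPE, FLOAT_TYPE and ID, are hypothetical; the sketch assumes a separate parser grammar Decl that defines those tokens and builds the tree with a rewrite rule such as the one shown in the comment.

```antlr
tree grammar DeclWalker;

options {
    tokenVocab = Decl;            // reuse token types from the (assumed) parser grammar
    ASTLabelType = CommonTree;    // node type of the trees being walked
}

// Assumed counterpart in the parser grammar Decl:
//     decl : type declarator ';' -> ^(DECL type declarator) ;

decl : ^(DECL type declarator)
       {System.out.println($type.text + " " + $declarator.text);}
     ;

type       : INT_TYPE | FLOAT_TYPE ;
declarator : ID ;
```

The tree rule mirrors the shape of the rewrite rule: DECL is the root node, and type and declarator match its two children in order.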

Attribute scope syntax

An attribute scope is a named set of attribute definitions of the form:

scope name {
    type1 attribute-name1;
    type2 attribute-name2;
}
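As a sketch of how such a scope might be used, the grammar below declares a global scope and lets one rule create it while another rule fills it. The names ScopeDemo, Symbols, names, block and decl are hypothetical; attributes are accessed with the $scope-name::attribute-name notation.

```antlr
grammar ScopeDemo;

// Global scope: shared by every rule that names it.
scope Symbols {
    Set names;
}

@header {
import java.util.Set;
import java.util.HashSet;
}

block
scope Symbols;                      // a fresh Symbols instance for each block
@init { $Symbols::names = new HashSet(); }
    : '{' decl* '}'
    ;

decl
    : 'int' ID ';' { $Symbols::names.add($ID.text); }   // sees the enclosing block's scope
    ;

ID : ('a'..'z')+ ;
WS : (' '|'\t'|'\r'|'\n')+ {$channel=HIDDEN;} ;
```

Because decl is only reached through block, its action always finds a Symbols instance on the scope stack.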

Grammar action syntax

Grammar actions are, in general, of the form:

@action-name { ... }
@scope-name::action-name { ... }

The default scope-name is parser. For instance, @header is the same as @parser::header. Valid scope names differ depending on the target, but most targets should support parser and lexer. Two common action names are header and members. The header action is placed at the top of the generated file, before the class definition, and the members action is inserted within the body of the generated class definition. For example, the following grammar actions would ensure that the generated parser and lexer Java classes include a package declaration:

@parser::header { package my.example.grammar; }
@lexer::header { package my.example.grammar; }
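The members action can be scoped in the same way. In the sketch below, the fields and the method are hypothetical; they simply show code that would end up inside the generated parser and lexer classes.

```antlr
@parser::members {
    // Hypothetical state and accessor, usable from any parser rule action.
    private int declCount = 0;
    public int getDeclCount() { return declCount; }
}

@lexer::members {
    // Hypothetical lexer-side state, usable from any lexer rule action.
    private int commentCount = 0;
}
```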

Rule elements

Rules may reference:

Element                 Description
T                       Token reference. An uppercase identifier; lexer grammars may use optional arguments for fragment token rules.
T<node=V> or T<V>       Token reference with the optional token option node to indicate the tree construction node type; can be followed by arguments on the right-hand side of a -> rewrite rule.
T[«args»]               Lexer rule (token rule) reference. Lexer grammars may use optional arguments for fragment token rules.
r [«args»]              Rule reference. A lowercase identifier with optional arguments.
'«one-or-more-char»'    String or char literal in single quotes. In a parser, a token reference; in a lexer, match that string.
{«action»}              An action written in the target language. Executed right after the previous element and right before the next element.
{«action»}?             Semantic predicate.
{«action»}?=>           Gated semantic predicate.
(«subrule»)=>           Syntactic predicate.
(«x»|«y»|«z»)           Subrule. Like a call to a rule with no name.
(«x»|«y»|«z»)?          Optional subrule.
(«x»|«y»|«z»)*          Zero-or-more subrule.
(«x»|«y»|«z»)+          One-or-more subrule.
«x»?                    Optional element.
«x»*                    Zero-or-more element.
«x»+                    One-or-more element.
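Several of these elements can be seen together in the small sketch below. The grammar name Elements and all rule names are hypothetical, and semantic predicates are left out to keep the example short. The syntactic predicate is not strictly needed for this grammar and is included only to show the notation.

```antlr
grammar Elements;

stat
    : (ID '=')=> ID '=' expr ';'      // syntactic predicate, then token and rule references
    | expr ';' {System.out.println("expression statement");}   // action after the last element
    ;

expr
    : term ( ('+'|'-') term )*        // zero-or-more subrule containing a plain subrule
    ;

term
    : '-'? INT                        // optional element
    | ID
    | '(' expr ')'                    // literals act as token references in a parser rule
    ;

ID  : ('a'..'z'|'A'..'Z'|'_')+ ;      // one-or-more subrule
INT : '0'..'9'+ ;                     // one-or-more element
WS  : (' '|'\t'|'\r'|'\n')+ {$channel=HIDDEN;} ;
```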