    add  : mult ('+'^ mult)* ;         // left association
    mult : pow ('*'^ pow)* ;           // left association
    pow  : atom ('^'^ pow)? ;          // right association
    atom : ID | INT | '('^ add ')'! ;  // recursion
It is often useful to create imaginary tokens for the branch nodes.
This requires a more involved syntax to achieve the same result.
    tokens { ADD; MULT; POW; ATOM; }

    add  : (mult -> mult) ('+' m=mult -> ^(ADD[$add] $add $m))* ;
    mult : (pow -> pow) ('*' p=pow -> ^(MULT[$mult] $mult $p))* ;
    pow  : (atom '^' p=pow -> ^(POW[$pow] atom $p))
         | (atom -> atom)
         ;
    atom : ID
         | INT
         | '(' add ')' -> ^(ATOM[$atom] add)
         ;
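The effect of the loop and recursion forms on tree shape can be checked outside ANTLR. The following is a minimal hand-written sketch, not generated parser code: the class name `ExprTree` and its single-character scanning are assumptions made for illustration, and trees are printed as LISP-style strings using the imaginary token names from the grammar above.

```java
final class ExprTree {
    private final String src;
    private int pos = 0;

    private ExprTree(String src) {
        this.src = src.replace(" ", ""); // the sketch simply drops blanks
    }

    /** Parses an expression and returns its tree as a LISP-style string. */
    static String parse(String src) {
        ExprTree p = new ExprTree(src);
        String tree = p.add();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return tree;
    }

    // add : mult ('+' mult)* ;  the loop folds operands to the left
    private String add() {
        String left = mult();
        while (peek() == '+') {
            pos++;
            left = "(ADD " + left + " " + mult() + ")";
        }
        return left;
    }

    // mult : pow ('*' pow)* ;  same left fold, one precedence level down
    private String mult() {
        String left = pow();
        while (peek() == '*') {
            pos++;
            left = "(MULT " + left + " " + pow() + ")";
        }
        return left;
    }

    // pow : atom ('^' pow)? ;  recursion on the right operand
    private String pow() {
        String base = atom();
        if (peek() == '^') {
            pos++;
            return "(POW " + base + " " + pow() + ")";
        }
        return base;
    }

    // atom : ID | INT | '(' add ')' ;  only the parenthesized form gets an ATOM node
    private String atom() {
        if (peek() == '(') {
            pos++;
            String inner = add();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ')' at " + pos);
            }
            pos++;
            return "(ATOM " + inner + ")";
        }
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected ID or INT at " + pos);
        }
        return src.substring(start, pos);
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(parse("1+2+3"));   // prints (ADD (ADD 1 2) 3)
        System.out.println(parse("2^3^4"));   // prints (POW 2 (POW 3 4))
        System.out.println(parse("(a+b)*c")); // prints (MULT (ATOM (ADD a b)) c)
    }
}
```

The `while` loops correspond to the `( ... )*` sub-rules and give left association; the self-call in `pow()` corresponds to the recursive rule reference and gives right association.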
It is common for statements to be optionally terminated by a new line or a semicolon (Python, Visual Basic, Bash, etc.).
This is a case of an 'island-grammar'.
At some point the content of the string will need to be transformed.
For example "\u55\u6e\u69\u63\u6f\u64\u65" would be transformed into "Unicode".
When should this transformation be performed?
It is generally a bad practice to modify the input stream.
    // Assumes ESC is matched as a token of its own, so $text begins with the backslash.
    ESC
      : '\\'
        ( 'n'  {this.setText("\n");}
        | 't'  {this.setText("\t");}
        | 'v'  {this.setText("\013");}
        | 'b'  {this.setText("\b");}
        | 'r'  {this.setText("\r");}
        | 'f'  {this.setText("\f");}
        | 'a'  {this.setText("\007");}
        | '\\' {this.setText("\\");}
        | '?'  {this.setText("?");}
        | '\'' {this.setText("'");}
        | '"'  {this.setText("\"");}
        // octal: skip the backslash before converting
        | OCTDIGIT (OCTDIGIT? OCTDIGIT)?
          {
            char[] realc = new char[1];
            realc[0] = (char) Integer.valueOf($text.substring(1), 8).intValue();
            this.setText(new String(realc));
          }
        // hexadecimal: skip the backslash and the 'x'
        | 'x' HEXDIGIT HEXDIGIT?
          {
            char[] realc = new char[1];
            realc[0] = (char) Integer.valueOf($text.substring(2), 16).intValue();
            this.setText(new String(realc));
          }
        // Unicode: skip the backslash and the 'u'
        | 'u' HEXDIGIT ((HEXDIGIT? HEXDIGIT)? HEXDIGIT)?
          {
            char[] realc = new char[1];
            realc[0] = (char) Integer.valueOf($text.substring(2), 16).intValue();
            this.setText(new String(realc));
          }
        )
      ;
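The same escape table can also be applied to a whole string after lexing, which leaves the input stream untouched. The following is a minimal sketch in plain Java under that assumption: the class `EscapeDecoder` and its methods are hypothetical names, not part of ANTLR, and the digit limits (up to 3 octal, 2 hex after `x`, 4 hex after `u`) mirror the ESC rule above.

```java
final class EscapeDecoder {
    /** Replaces backslash escapes in raw; any other escaped character is kept as itself. */
    static String decode(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i++);
            if (c != '\\' || i == raw.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char e = raw.charAt(i++);
            switch (e) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'v': out.append((char) 0x0B); break; // vertical tab
                case 'b': out.append('\b'); break;
                case 'r': out.append('\r'); break;
                case 'f': out.append('\f'); break;
                case 'a': out.append((char) 0x07); break; // bell
                case 'x': i = appendCode(raw, i, 2, 16, out); break;
                case 'u': i = appendCode(raw, i, 4, 16, out); break;
                default:
                    if (Character.digit(e, 8) >= 0) {
                        i = appendCode(raw, i - 1, 3, 8, out); // octal, re-read first digit
                    } else {
                        out.append(e); // backslash, question mark and quote characters
                    }
            }
        }
        return out.toString();
    }

    /** Reads up to maxDigits digits at pos, appends the character they encode, returns the next index. */
    private static int appendCode(String raw, int pos, int maxDigits, int radix, StringBuilder out) {
        int end = pos;
        while (end < raw.length() && end - pos < maxDigits
                && Character.digit(raw.charAt(end), radix) >= 0) {
            end++;
        }
        if (end == pos) {
            throw new IllegalArgumentException("escape without digits at index " + pos);
        }
        out.append((char) Integer.parseInt(raw.substring(pos, end), radix));
        return end;
    }

    public static void main(String[] args) {
        // Backslashes are doubled here so that the Java compiler leaves them in the string.
        System.out.println(decode("\\u55\\u6e\\u69\\u63\\u6f\\u64\\u65")); // prints Unicode
        System.out.println(decode("\\x41\\101"));                          // prints AA
    }
}
```

Because the decoder works on a copy of the token text, the original characters remain available for error messages and source positions.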
An alternative approach that may be useful is to leave the token text unchanged in the lexer and decode it later, in the parser.
    fragment MARKER : '"' ;
    ESCCHAR : '\\' ;
    LITERAL : MARKER (options {greedy=false;}: ESCCHAR . | .)* MARKER ;
    UNICODE_LITERAL : '\\u' HEXDIGIT ((HEXDIGIT? HEXDIGIT)? HEXDIGIT)? ;

    literal returns [char value]
      : UNICODE_LITERAL
        {
          // skip the leading backslash and 'u' before converting the hex digits
          $value = (char) Integer.valueOf($text.substring(2), 16).intValue();
        }
      ;
    contextBody(foo,bar) ::= <<
    <foo; format="toUpper">
    <bar; format="decode">
    >>
Where 'bar' is the escaped string.
In this case the renderer itself could be a lexer/parser for the regular language.
Many languages provide support for regular expressions.
This is a case of an 'island-grammar'.