Grammar Design Patterns

Implementing precedence rules

add  : mult ('+'^ mult)* ; // left association
mult : pow ('*'^ pow)* ; // left association
pow  : atom ('^'^ pow)? ; // right association
atom : ID | INT | '('^ add ')'! ; // recursion

It is often useful to create imaginary tokens for the branch nodes.
Building the equivalent trees with imaginary roots requires rewrite rules (->) instead of the ^ and ! operators.

tokens
{
  ADD;
  MULT;
  POW;
  ATOM;
}

add  : (mult -> mult) (op='+' m=mult -> ^(ADD[$op] $add $m))* ;
mult : (pow -> pow) (op='*' p=pow -> ^(MULT[$op] $mult $p))* ;
pow  : atom (op='^' p=pow -> ^(POW[$op] atom $p) | -> atom) ;
atom : ID | INT | lp='(' add ')' -> ^(ATOM[$lp] add) ;
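As a hedged illustration of why this rule layering yields the intended precedence and associativity, here is a hand-written recursive-descent sketch in Java, the language of the grammar's actions. Each method mirrors one rule: the loops in add and mult give left association, and the self-call in pow gives right association. The tiny tokenizer and the LISP-style tree strings (using the imaginary token names) are assumptions for illustration; this is not the code ANTLR generates.

```java
import java.util.ArrayList;
import java.util.List;

public class Precedence {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    public Precedence(String input) {
        // Minimal tokenizer (assumption): runs of letters/digits become
        // ID/INT tokens, any other non-space character is a one-char token.
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
            } else {
                i++;
            }
            tokens.add(input.substring(start, i));
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + " but found '" + peek() + "'");
        pos++;
    }

    // add : mult ('+' mult)* ;   the loop folds to the left
    String add() {
        String left = mult();
        while (peek().equals("+")) {
            pos++;
            left = "(ADD " + left + " " + mult() + ")";
        }
        return left;
    }

    // mult : pow ('*' pow)* ;   the loop folds to the left
    String mult() {
        String left = pow();
        while (peek().equals("*")) {
            pos++;
            left = "(MULT " + left + " " + pow() + ")";
        }
        return left;
    }

    // pow : atom ('^' pow)? ;   recursing on the right folds to the right
    String pow() {
        String base = atom();
        if (peek().equals("^")) {
            pos++;
            return "(POW " + base + " " + pow() + ")";
        }
        return base;
    }

    // atom : ID | INT | '(' add ')' ;   parentheses restart at the lowest level
    String atom() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            String inner = add();
            expect(")");
            return "(ATOM " + inner + ")";
        }
        if (!t.isEmpty() && Character.isLetterOrDigit(t.charAt(0))) {
            pos++;
            return t;
        }
        throw new IllegalStateException("unexpected token '" + t + "'");
    }

    public static String parse(String input) {
        Precedence p = new Precedence(input);
        String tree = p.add();
        if (p.pos != p.tokens.size()) throw new IllegalStateException("trailing input");
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("1+2+3"));   // (ADD (ADD 1 2) 3)
        System.out.println(parse("2^3^4"));   // (POW 2 (POW 3 4))
        System.out.println(parse("(a+b)*c")); // (MULT (ATOM (ADD a b)) c)
    }
}
```

Note how "1+2+3" nests to the left while "2^3^4" nests to the right, matching the comments on the first grammar.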

Detecting statement terminators

In many languages (Python, Visual Basic, Bash, etc.), a statement may be terminated either by a newline or by a semicolon.
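No grammar is given for this case. As a rough sketch of the idea only, the Java below splits source text into statements at either terminator. Ignoring newlines inside parentheses is an assumption borrowed from Python's implicit line joining; string literals and comments are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class Terminators {
    /**
     * Splits source text into statements. A statement ends at ';' or at a
     * newline, except that newlines inside parentheses are treated as
     * ordinary whitespace (assumption). Empty statements are dropped.
     */
    public static List<String> split(String src) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // current parenthesis nesting level
        for (char c : src.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')' && depth > 0) depth--;
            boolean terminator = c == ';' || (c == '\n' && depth == 0);
            if (terminator) {
                flush(current, statements);
            } else {
                // A newline inside parentheses continues the statement.
                current.append(c == '\n' ? ' ' : c);
            }
        }
        flush(current, statements); // the last statement may lack a terminator
        return statements;
    }

    private static void flush(StringBuilder current, List<String> out) {
        String s = current.toString().trim();
        if (!s.isEmpty()) out.add(s);
        current.setLength(0);
    }
}
```

For example, split("a = 1; b = 2\nc = 3") yields the three statements "a = 1", "b = 2" and "c = 3".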

String containing escaped characters

This is a case of an 'island-grammar'.
At some point the content of the string will need to be transformed.
For example, "\u55\u6e\u69\u63\u6f\u64\u65" would be transformed into "Unicode".
When should this transformation be performed?

By the lexer?

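It is generally bad practice for the lexer to alter the input text.
Nevertheless, the following rule does so, using setText to replace each escape sequence with the character it denotes.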

ESC
  : '\\'
     ( 'n'             {this.setText("\n");}
     | 't'             {this.setText("\t");}
     | 'v'             {this.setText("\013");}
     | 'b'             {this.setText("\b");}
     | 'r'             {this.setText("\r");}
     | 'f'             {this.setText("\f");}
     | 'a'             {this.setText("\007");}
     | '\\'            {this.setText("\\");}
     | '?'             {this.setText("?");}
     | '\''            {this.setText("'");}
     | '"'             {this.setText("\"");}
     | OCTDIGIT (OCTDIGIT OCTDIGIT?)?
       {
         // $text includes the leading backslash, so skip one character
         this.setText(String.valueOf((char) Integer.parseInt($text.substring(1), 8)));
       }
     | 'x' HEXDIGIT HEXDIGIT?
       {
         // skip the leading backslash and the 'x'
         this.setText(String.valueOf((char) Integer.parseInt($text.substring(2), 16)));
       }
     | 'u' HEXDIGIT (HEXDIGIT (HEXDIGIT HEXDIGIT?)?)?
       {
         // skip the leading backslash and the 'u'
         this.setText(String.valueOf((char) Integer.parseInt($text.substring(2), 16)));
       }
     )
     ;

fragment OCTDIGIT : '0'..'7' ;
fragment HEXDIGIT : '0'..'9' | 'a'..'f' | 'A'..'F' ;

An alternative approach is to keep the escape sequences in the token text and translate them at a later stage.

fragment MARKER  : '"' ;
fragment ESCCHAR : '\\' ;
LITERAL : MARKER ( ESCCHAR . | ~('\\' | '"') )* MARKER ;
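With this approach the LITERAL token still contains its escapes, so some later stage has to translate them. A minimal Java sketch of that stage follows. It assumes the token text still includes the surrounding quote markers and that only a handful of single-character escapes need translating; every other escaped character stands for itself, matching the "escape character followed by any character" form of the rule. The class and method names are hypothetical.

```java
public class LiteralDecoder {
    /**
     * Decodes the text of a LITERAL token, quotes included.
     * An escape is a backslash followed by exactly one character.
     */
    public static String decode(String tokenText) {
        // Strip the surrounding MARKER characters.
        String body = tokenText.substring(1, tokenText.length() - 1);
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            // Ordinary character, or a trailing backslash with nothing after it.
            if (c != '\\' || i + 1 == body.length()) {
                out.append(c);
                continue;
            }
            char next = body.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'f': out.append('\f'); break;
                case 'b': out.append('\b'); break;
                // backslash, quotes and anything else stand for themselves
                default:  out.append(next);
            }
        }
        return out.toString();
    }
}
```

Keeping the token text untouched means error messages and source positions still refer to exactly what the user wrote; only consumers that need the value pay for decoding.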

By the parser?

UNICODE_LITERAL : '\\u' HEXDIGIT (HEXDIGIT (HEXDIGIT HEXDIGIT?)?)? ;
// The token text starts with the backslash and the 'u', so skip two characters.
literal returns [char value] : UNICODE_LITERAL
  { $value = (char) Integer.parseInt($UNICODE_LITERAL.text.substring(2), 16); } ;

By the renderer?

contextBody(foo,bar) ::= <<
<foo; format="toUpper">
<bar; format="decode">
>>

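Here 'bar' holds the still-escaped string, and "decode" names a format that a custom attribute renderer would have to implement.
In this case the renderer itself could be a lexer/parser for the escape-sequence language, which is regular.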
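Because the escape-sequence language is regular, a single regular expression can serve as the renderer's lexer. The sketch below is hypothetical: render and its format names mimic the template above, but the class does not implement any StringTemplate interface. The accepted escapes (u with 1 to 4 hex digits, x with 1 or 2 hex digits, 1 to 3 octal digits, or any single character) are an assumption modelled on the ESC rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DecodeRenderer {
    // One match per escape sequence; the capture group that matched
    // tells us which kind of escape it was.
    private static final Pattern ESCAPE = Pattern.compile(
        "\\\\(?:u([0-9a-fA-F]{1,4})|x([0-9a-fA-F]{1,2})|([0-7]{1,3})|(.))");

    /** Applies the named format; unknown formats leave the value unchanged. */
    public static String render(String value, String formatName) {
        if ("toUpper".equals(formatName)) return value.toUpperCase();
        if (!"decode".equals(formatName)) return value;
        Matcher m = ESCAPE.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {          // u + hex digits
                replacement = String.valueOf((char) Integer.parseInt(m.group(1), 16));
            } else if (m.group(2) != null) {   // x + hex digits
                replacement = String.valueOf((char) Integer.parseInt(m.group(2), 16));
            } else if (m.group(3) != null) {   // octal digits
                replacement = String.valueOf((char) Integer.parseInt(m.group(3), 8));
            } else {                           // single-character escape
                replacement = simple(m.group(4).charAt(0));
            }
            // quoteReplacement keeps '$' and '\' in the result literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String simple(char c) {
        switch (c) {
            case 'n': return "\n";
            case 't': return "\t";
            case 'r': return "\r";
            case 'f': return "\f";
            default:  return String.valueOf(c);
        }
    }
}
```

Applied with the "decode" format, the escaped example from the start of this section renders as "Unicode".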

Processing regular expressions

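Many languages provide support for regular expressions, either as literals (Perl, JavaScript) or as strings passed to a library (Java).
The regular expression is a separate language embedded in the host language, so this is also a case of an 'island-grammar'.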
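As a sketch of the island hand-off, assume the host lexer has already captured a Perl/JavaScript-style literal such as /ab+c/i as a single token. Its body can then be passed to a separate regular-expression parser; here java.util.regex.Pattern plays that role. The literal syntax and the supported flags ('i' and 'm' only) are assumptions for illustration.

```java
import java.util.regex.Pattern;

public class RegexIsland {
    /**
     * Splits a literal of the form /body/flags and hands the body to
     * java.util.regex, which acts as the parser for the island language.
     */
    public static Pattern compile(String literal) {
        int close = literal.lastIndexOf('/');
        if (!literal.startsWith("/") || close <= 0) {
            throw new IllegalArgumentException("not a regex literal: " + literal);
        }
        String body = literal.substring(1, close);
        int flags = 0;
        for (char f : literal.substring(close + 1).toCharArray()) {
            switch (f) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                default: throw new IllegalArgumentException("unsupported flag: " + f);
            }
        }
        // Syntax errors inside the island surface as PatternSyntaxException.
        return Pattern.compile(body, flags);
    }
}
```

The host grammar only needs to know where the literal starts and ends; the structure inside it is left entirely to the regular-expression engine.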