Five minute introduction to ANTLR 3

What is ANTLR 3?

ANTLR (ANother Tool for Language Recognition) is a tool used to construct formal language software tools (or just language tools) such as translators, compilers, recognizers, and static/dynamic program analyzers. Developers use ANTLR to reduce the time and effort needed to build and maintain language processing tools. In common terminology, ANTLR is a compiler generator or compiler compiler (in the tradition of tools such as Lex/Flex and Yacc/Bison): it generates the source code for language recognizers, analyzers and translators from language specifications. ANTLR takes as its input a grammar (a precise description of a language, augmented with semantic actions) and generates source code files and other auxiliary files. The target language of the generated source code (e.g. Java, C/C++, C#, Python, Ruby) is specified in the grammar.

...

ANTLR 3 is the latest version of a language processing toolkit that was originally released as PCCTS in the mid-1990s. As was the case then, this release of the ANTLR toolkit advances the state of the art with its new LL(*) parsing engine. ANTLR provides a framework for generating recognizers, compilers, and translators from grammatical descriptions. ANTLR grammatical descriptions can optionally include action code written in what is termed the target language (i.e. the implementation language of the source code artifacts generated by ANTLR).

...

Because it can save you time and resources by automating significant portions of the effort involved in building language processing tools. Generative tools such as compiler compilers are well established as having a major, positive impact on developer productivity. In addition, many of ANTLR v3's new features, including an improved analysis engine, significantly enhanced parsing strength via LL(*) parsing with arbitrary lookahead, vastly improved tree construction rewrite rules, and the simply fantastic ANTLRWorks IDE, offer productivity benefits over other comparable generative language processing toolkits.

...

Download and install ANTLR 3 from the ANTLR 3 page of the ANTLR website.

2. Run ANTLR 3 on a simple grammar

...

Java

Code Block

grammar SimpleCalc;

tokens {
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}

@members {
    public static void main(String[] args) throws Exception {
        SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lex);

        SimpleCalcParser parser = new SimpleCalcParser(tokens);

        try {
            parser.expr();
        } catch (RecognitionException e)  {
            e.printStackTrace();
        }
    }
}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

expr	: term ( ( PLUS | MINUS )  term )* ;

term	: factor ( ( MULT | DIV ) factor )* ;

factor	: NUMBER ;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

NUMBER	: (DIGIT)+ ;

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ 	{ $channel = HIDDEN; } ;

fragment DIGIT	: '0'..'9' ;

C#

Note: use language=CSharp2 with ANTLR 3.1; ANTLR 3.0.1 uses the older CSharp target.

Code Block

grammar SimpleCalc;

options {
    language=CSharp2;
}

tokens {
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}

@members {
    public static void Main(string[] args) {
        SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lex);

        SimpleCalcParser parser = new SimpleCalcParser(tokens);

        try {
            parser.expr();
        } catch (RecognitionException e)  {
            Console.Error.WriteLine(e.StackTrace);
        }
    }
}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

expr	: term ( ( PLUS | MINUS )  term )* ;

term	: factor ( ( MULT | DIV ) factor )* ;

factor	: NUMBER ;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

NUMBER	: (DIGIT)+ ;

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ 	{ $channel = Hidden; } ;

fragment DIGIT	: '0'..'9' ;

Objective-C

To be written. Volunteers?

Code Block
grammar SimpleCalc;

options {
    language=ObjC;
}

OR : '||' ;

C

Code Block

grammar SimpleCalc;

options
{
    language=C;
}

tokens
{
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}

@members
{

 #include "SimpleCalcLexer.h"

 int main(int argc, char * argv[])
 {
    // Build the pipeline: file -> lexer -> token stream -> parser
    pANTLR3_INPUT_STREAM        input  = antlr3AsciiFileStreamNew((pANTLR3_UINT8)argv[1]);
    pSimpleCalcLexer            lex    = SimpleCalcLexerNew(input);
    pANTLR3_COMMON_TOKEN_STREAM tokens = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lex));
    pSimpleCalcParser           parser = SimpleCalcParserNew(tokens);

    parser->expr(parser);

    // Must manually clean up, in reverse order of creation
    //
    parser->free(parser);
    tokens->free(tokens);
    lex->free(lex);
    input->close(input);

    return 0;
 }
}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

expr	: term   ( ( PLUS | MINUS )  term   )*
        ;

term	: factor ( ( MULT | DIV   )  factor )*
        ;

factor	: NUMBER
        ;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

NUMBER	    : (DIGIT)+
            ;

WHITESPACE  : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+
              {
                 $channel = HIDDEN;
              }
            ;

fragment
DIGIT	    : '0'..'9'
            ;

Python

Code Block


grammar SimpleCalc;

options {
	language = Python;
}

tokens {
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}

@header {
import sys
import traceback

from SimpleCalcLexer import SimpleCalcLexer
}

@main {
def main(argv, otherArg=None):
    char_stream = ANTLRFileStream(argv[1])
    lexer = SimpleCalcLexer(char_stream)
    tokens = CommonTokenStream(lexer)
    parser = SimpleCalcParser(tokens)

    try:
        parser.expr()
    except RecognitionException:
        traceback.print_exc()
}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

expr	: term ( ( PLUS | MINUS )  term )* ;

term	: factor ( ( MULT | DIV ) factor )* ;

factor	: NUMBER ;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

NUMBER	: (DIGIT)+ ;

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ 	{ $channel = HIDDEN; } ;

fragment DIGIT	: '0'..'9' ;

...

Code Block
java org.antlr.Tool SimpleCalc.g

ANTLR will generate source files for the lexer and parser (e.g. SimpleCalcLexer.java and SimpleCalcParser.java). Copy these into the appropriate places for your development environment and compile them.
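To make the generated files less of a black box, the sketch below is a hand-written Python approximation of what a lexer and a recursive-descent parser for this grammar do. It is not ANTLR output and does not use the ANTLR runtime; `tokenize`, `Parser`, `RecognitionError` and `recognize` are hypothetical names chosen for illustration. As in the grammar, the parser has one method per rule, and the `while` loops play the role of the `( ... )*` subrules. One deliberate difference: the grammar's `expr` rule does not end with `EOF`, whereas `recognize` also insists that the whole input is consumed.

```python
# Hand-written sketch of a recognizer for the SimpleCalc grammar.
# Not generated by ANTLR; all names here are illustrative only.


class RecognitionError(Exception):
    """Raised when the input does not match the grammar (hypothetical name)."""


OPERATORS = {"+": "PLUS", "-": "MINUS", "*": "MULT", "/": "DIV"}
WHITESPACE_CHARS = "\t \r\n\u000C"


def tokenize(text):
    """Lexer: NUMBER : (DIGIT)+ ; whitespace is dropped; operators map to token types."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in WHITESPACE_CHARS:
            i += 1  # whitespace never reaches the parser
        elif "0" <= ch <= "9":
            start = i
            while i < len(text) and "0" <= text[i] <= "9":
                i += 1
            tokens.append(("NUMBER", text[start:i]))
        elif ch in OPERATORS:
            tokens.append((OPERATORS[ch], ch))
            i += 1
        else:
            raise RecognitionError(f"unexpected character {ch!r} at {i}")
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    """Parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def match(self, token_type):
        if self.peek() != token_type:
            raise RecognitionError(f"expected {token_type}, found {self.peek()}")
        self.pos += 1

    def expr(self):
        # expr : term ( ( PLUS | MINUS ) term )* ;
        self.term()
        while self.peek() in ("PLUS", "MINUS"):
            self.match(self.peek())
            self.term()

    def term(self):
        # term : factor ( ( MULT | DIV ) factor )* ;
        self.factor()
        while self.peek() in ("MULT", "DIV"):
            self.match(self.peek())
            self.factor()

    def factor(self):
        # factor : NUMBER ;
        self.match("NUMBER")


def recognize(text):
    """Return True if text is a complete SimpleCalc expression, else False."""
    try:
        parser = Parser(tokenize(text))
        parser.expr()
        parser.match("EOF")  # unlike the grammar, require all input to be consumed
        return True
    except RecognitionError:
        return False


print(recognize("3 + 4 * 2"))  # True
print(recognize("3 + * 2"))    # False
```

Because `term` is called from inside `expr`, `*` and `/` bind more tightly than `+` and `-`: in `3 + 4 * 2`, the `4 * 2` is consumed by a single call to `term`.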

2.3 Revisit the simple grammar and learn basic ANTLR 3 syntax

...

Note

Before you start

You will learn best by following along, experimenting, and looking at the generated source code. To do that, you'll need:

A simple text editor, and
An installed copy of ANTLR 3.0.1, or
An installed copy of ANTLRWorks (free, highly recommended, and bundled with its own copy of ANTLR)

...

First, we have to define white space:

A space is written ' '
A tab is written '\t'
A newline (line feed) is written '\n'
A carriage return is written '\r'
A form feed has a decimal value of 12 and a hexadecimal value of 0x0C. ANTLR uses Unicode, so we write it as a Unicode escape with four hex digits: '\u000C'
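If you want to double-check these values, the following Python sketch prints each whitespace character's code point in decimal and in the four-hex-digit `\uXXXX` form. The helper name `antlr_unicode_escape` is hypothetical; it simply formats `ord(ch)` as hexadecimal.

```python
# Whitespace characters matched by the WHITESPACE rule, keyed by name.
WHITESPACE = {
    "space": " ",
    "tab": "\t",
    "newline": "\n",
    "carriage return": "\r",
    "form feed": "\u000C",
}


def antlr_unicode_escape(ch):
    """Format a character as a quoted four-hex-digit Unicode escape, e.g. '\\u000C'."""
    return "'\\u%04X'" % ord(ch)


for name, ch in WHITESPACE.items():
    # Decimal code point, then the escaped form a grammar literal can use.
    print(f"{name:16} {ord(ch):3d}  {antlr_unicode_escape(ch)}")
```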

Put these together with an "or", allow one or more to occur together, and you have

...

You hide the token by setting its $channel attribute to the constant HIDDEN. This requires adding a little code (an action) to the lexer rule, which you do by enclosing the code in curly braces:

Code Block

Defining whitespace

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; };
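Setting `$channel` does not delete the token: the lexer still produces it, and the token stream simply skips it when the parser asks for the next token. The Python sketch below models that idea with a hypothetical `Token` class, a toy `lex` function and a `visible_tokens` filter; it is not the ANTLR runtime API. The channel numbers mirror ANTLR 3's `Token.DEFAULT_CHANNEL` (0) and `Token.HIDDEN_CHANNEL` (99), but nothing in the sketch depends on the exact values. To keep it short, every non-whitespace character becomes its own `CHAR` token.

```python
from dataclasses import dataclass

DEFAULT_CHANNEL = 0  # tokens the parser sees
HIDDEN = 99          # tokens the parser skips (values chosen to mirror ANTLR 3)


@dataclass
class Token:
    type: str
    text: str
    channel: int = DEFAULT_CHANNEL


def lex(text):
    """Toy lexer: whitespace runs go on the hidden channel; other chars are one token each."""
    ws = "\t \r\n\u000C"
    tokens = []
    i = 0
    while i < len(text):
        start = i
        if text[i] in ws:
            while i < len(text) and text[i] in ws:
                i += 1
            # Equivalent of the rule action { $channel = HIDDEN; }
            tokens.append(Token("WHITESPACE", text[start:i], HIDDEN))
        else:
            i += 1
            tokens.append(Token("CHAR", text[start:i]))
    return tokens


def visible_tokens(tokens):
    """What a parser reading only the default channel would see."""
    return [t for t in tokens if t.channel == DEFAULT_CHANNEL]


stream = lex("1 +\t2")
print([t.text for t in visible_tokens(stream)])  # ['1', '+', '2']
print("".join(t.text for t in stream))           # the original text, whitespace included
```

Because hidden tokens are kept, joining every token's text reproduces the input exactly, which is what makes hidden channels useful for tools that need to preserve whitespace or comments.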

...

Code Block

Main entry point for Java

@members {
    public static void main(String[] args) throws Exception {
        SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lex);

        SimpleCalcParser parser = new SimpleCalcParser(tokens);

        try {
            parser.expr();
        } catch (RecognitionException e)  {
            e.printStackTrace();
        }
    }
}

...

Code Block

Typical options block

grammar SimpleCalc;

options {
    language=CSharp2;
}

Your five minutes are up!

...

Construct	Description	Example
`(...)*`	Kleene closure: matches zero or more occurrences	`LETTER DIGIT*` matches a `LETTER` followed by zero or more occurrences of `DIGIT`
`(...)+`	Positive Kleene closure: matches one or more occurrences	`('0'..'9')+` matches one or more numerical digits; `LETTER (LETTER|DIGIT)+` matches a `LETTER` followed by one or more occurrences of either `LETTER` or `DIGIT`
`fragment`	`fragment` in front of a lexer rule tells ANTLR that the rule is only used as part of another lexer rule (i.e. it builds only a fragment of a recognized token, never a token on its own)	`fragment DIGIT : '0'..'9' ;` used by `NUMBER : (DIGIT)+ ('.' (DIGIT)+ )? ;`
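Because these lexer constructs behave like the familiar regular-expression operators, Python's `re` module is a convenient way to experiment with what each example in the table matches. This is only an analogy: ANTLR does not translate lexer rules into Python regular expressions. The definition of `LETTER` as `[A-Za-z]` is an assumption, since the table does not define it, and the `patterns` and `matches` names are hypothetical.

```python
import re

# Assumed helper definitions (the table does not define LETTER).
LETTER = "[A-Za-z]"
DIGIT = "[0-9]"  # fragment DIGIT : '0'..'9' ;

patterns = {
    # LETTER DIGIT* : a letter, then zero or more digits
    "LETTER DIGIT*": f"{LETTER}{DIGIT}*",
    # ('0'..'9')+ : one or more digits
    "('0'..'9')+": f"{DIGIT}+",
    # LETTER (LETTER|DIGIT)+ : a letter, then at least one letter or digit
    "LETTER (LETTER|DIGIT)+": f"{LETTER}(?:{LETTER}|{DIGIT})+",
    # NUMBER : (DIGIT)+ ('.' (DIGIT)+ )? ; DIGIT is inlined, like a fragment
    "NUMBER": rf"{DIGIT}+(?:\.{DIGIT}+)?",
}


def matches(rule, text):
    """True if the whole of text matches the named construct."""
    return re.fullmatch(patterns[rule], text) is not None


for rule, text in [("LETTER DIGIT*", "x"), ("LETTER (LETTER|DIGIT)+", "x"), ("NUMBER", "3.14")]:
    print(f"{rule!r:28} {text!r:8} {matches(rule, text)}")
```

The first and third lines print `True` and the second prints `False`: `x` satisfies `LETTER DIGIT*` because zero digits are allowed, but not `LETTER (LETTER|DIGIT)+`, which needs at least one more character.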
