
...

ANTLR - ANother Tool for Language Recognition - is a tool that is used in the construction of formal language software tools (or just language tools) such as translators, compilers, recognizers, and static/dynamic program analyzers. Developers use ANTLR to reduce the time and effort needed to build and maintain language processing tools. In common terminology, ANTLR is a compiler generator or compiler compiler (in the tradition of tools such as Lex/Flex and Yacc/Bison) and it is used to generate the source code for language recognizers, analyzers and translators from language specifications. ANTLR takes as its input a grammar - a precise description of a language augmented with semantic actions - and generates source code files and other auxiliary files. The target language of the generated source code (e.g. Java, C/C++, C#, Python, Ruby) is specified in the grammar.

...

ANTLR 3 is the latest version of a language processing toolkit that was originally released as PCCTS in the mid-1990s. As was the case then, this release of the ANTLR toolkit advances the state of the art with its new LL parsing engine. ANTLR provides a framework for the generation of recognizers, compilers, and translators from grammatical descriptions. ANTLR grammatical descriptions can optionally include action code written in what is termed the target language (i.e. the implementation language of the source code artifacts generated by ANTLR).

...

Because it can save you time and resources by automating significant portions of the effort involved in building language processing tools. It is well established that generative tools such as compiler compilers have a major, positive impact on developer productivity. In addition, many of ANTLR v3's new features, including an improved analysis engine, its significantly enhanced parsing strength via LL parsing with arbitrary lookahead, its vastly improved tree construction rewrite rules, and the availability of the simply fantastic AntlrWorks IDE, offer productivity benefits over other comparable generative language processing toolkits.

...

Download and install ANTLR 3 from the ANTLR 3 page of the ANTLR website.

2. Run ANTLR 3 on a simple grammar

...

Java

Code Block

grammar SimpleCalc;

tokens {
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}

@members {
    public static void main(String[] args) throws Exception {
        SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lex);

        SimpleCalcParser parser = new SimpleCalcParser(tokens);

        try {
            parser.expr();
        } catch (RecognitionException e)  {
            e.printStackTrace();
        }
    }
}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

expr	: term ( ( PLUS | MINUS )  term )* ;

term	: factor ( ( MULT | DIV ) factor )* ;

factor	: NUMBER ;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

NUMBER	: (DIGIT)+ ;

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ 	{ $channel = HIDDEN; } ;

fragment DIGIT	: '0'..'9' ;

C#

Note: language=CSharp2 with ANTLR 3.1; ANTLR 3.0.1 uses the older CSharp target

Code Block

grammar SimpleCalc;

options {
    language=CSharp2;
}

tokens {
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}

@members {
    public static void Main(string[] args) {
        SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lex);

        SimpleCalcParser parser = new SimpleCalcParser(tokens);

        try {
            parser.expr();
        } catch (RecognitionException e)  {
            Console.Error.WriteLine(e.StackTrace);
        }
    }
}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

expr	: term ( ( PLUS | MINUS )  term )* ;

term	: factor ( ( MULT | DIV ) factor )* ;

factor	: NUMBER ;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

NUMBER	: (DIGIT)+ ;

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ 	{ $channel = Hidden; } ;

fragment DIGIT	: '0'..'9' ;

Objective-C

To be written. Volunteers?

Code Block
grammar SimpleCalc;

options {
    language=ObjC;
}

OR : '||' ;

C

Code Block

grammar SimpleCalc;

options
{
    language=C;
}

tokens
{
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}

@members
{

 #include "SimpleCalcLexer.h"

 int main(int argc, char * argv[])
 {

    pANTLR3_INPUT_STREAM           input;
    pSimpleCalcLexer               lex;
    pANTLR3_COMMON_TOKEN_STREAM    tokens;
    pSimpleCalcParser              parser;

    input  = antlr3AsciiFileStreamNew          ((pANTLR3_UINT8)argv[1]);
    lex    = SimpleCalcLexerNew                (input);
    tokens = antlr3CommonTokenStreamSourceNew  (ANTLR3_SIZE_HINT, TOKENSOURCE(lex));
    parser = SimpleCalcParserNew               (tokens);

    parser  ->expr(parser);

    // Must manually clean up
    //
    parser ->free(parser);
    tokens ->free(tokens);
    lex    ->free(lex);
    input  ->close(input);

    return 0;
 }

}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

expr	: term   ( ( PLUS | MINUS )  term   )*
        ;

term	: factor ( ( MULT | DIV   )  factor )*
        ;

factor	: NUMBER
        ;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

NUMBER	    : (DIGIT)+
            ;

WHITESPACE  : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+
              {
                 $channel = HIDDEN;
              }
            ;

fragment
DIGIT	    : '0'..'9'
            ;

Python

Code Block

grammar SimpleCalc;

options {
	language = Python;
}

tokens {
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}

@header {
import sys
import traceback

from SimpleCalcLexer import SimpleCalcLexer
}

@main {
def main(argv, otherArg=None):
    char_stream = ANTLRFileStream(sys.argv[1])
    lexer = SimpleCalcLexer(char_stream)
    tokens = CommonTokenStream(lexer)
    parser = SimpleCalcParser(tokens)

    try:
        parser.expr()
    except RecognitionException:
        # print the traceback of the exception that was just caught
        traceback.print_exc()
}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

expr	: term ( ( PLUS | MINUS )  term )* ;

term	: factor ( ( MULT | DIV ) factor )* ;

factor	: NUMBER ;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

NUMBER	: (DIGIT)+ ;

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ 	{ $channel = HIDDEN; } ;

fragment DIGIT	: '0'..'9' ;

...

Code Block
java org.antlr.Tool SimpleCalc.g

ANTLR will generate source files for the lexer and parser (e.g. SimpleCalcLexer.java and SimpleCalcParser.java). Copy these into the appropriate places for your development environment and compile them.
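To get a feel for what the parser rules ask a recognizer to do, here is a small hand-written recursive-descent sketch of the same three rules in plain Java. It is only an illustration: the class name `SimpleCalcSketch` and its methods are made up for this example, it is not the code ANTLR generates, and unlike the generated parser it has no error recovery and insists that the whole input is consumed.

```java
// Hand-written illustration only; the files ANTLR generates look different.
final class SimpleCalcSketch {
    private final String src;
    private int pos = 0;

    private SimpleCalcSketch(String src) { this.src = src; }

    /** Returns true if the whole input matches the expr rule. */
    static boolean recognizes(String input) {
        SimpleCalcSketch p = new SimpleCalcSketch(input);
        try {
            p.expr();
            p.skipWhitespace();
            return p.pos == p.src.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // expr : term ( ( PLUS | MINUS ) term )* ;
    private void expr() {
        term();
        while (peekIs('+') || peekIs('-')) {
            pos++;
            term();
        }
    }

    // term : factor ( ( MULT | DIV ) factor )* ;
    private void term() {
        factor();
        while (peekIs('*') || peekIs('/')) {
            pos++;
            factor();
        }
    }

    // factor : NUMBER ;   with   NUMBER : (DIGIT)+ ;
    private void factor() {
        skipWhitespace();
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') {
            pos++;
        }
        if (pos == start) {
            throw new IllegalStateException("expected NUMBER at offset " + pos);
        }
    }

    private boolean peekIs(char c) {
        skipWhitespace();
        return pos < src.length() && src.charAt(pos) == c;
    }

    // Stands in for the WHITESPACE lexer rule, whose tokens the parser never sees.
    private void skipWhitespace() {
        while (pos < src.length() && " \t\r\n\f".indexOf(src.charAt(pos)) >= 0) {
            pos++;
        }
    }

    public static void main(String[] args) {
        System.out.println(recognizes("1 + 2 * 3"));
        System.out.println(recognizes("1 + * 3"));
    }
}
```

Each grammar rule becomes one method, and each `( ... )*` becomes a `while` loop. That correspondence is the main point of the sketch.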

2.3 Revisit the simple grammar and learn basic ANTLR 3 syntax

...

First, we have to define white space:

A space is ' '
A tab is written '\t'
A newline (line feed) is written '\n'
A carriage return is written '\r'
A form feed has a decimal value of 12 and a hexadecimal value of 0C. ANTLR uses Unicode, so we write it with four hex digits: '\u000C'

Put these together with an "or", allow one or more to occur together, and you have

...
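The idea of "any of these five characters, one or more times" can also be spelled out by hand. The following plain-Java sketch mimics that behaviour; the class and method names (`WhitespaceSketch`, `matchWhitespace`) are hypothetical and not part of ANTLR or its runtime.

```java
// Illustration only: a hand-written stand-in for "one or more whitespace characters".
final class WhitespaceSketch {

    /**
     * Returns the index just past the run of whitespace that begins at 'start',
     * or -1 if the character at 'start' is not whitespace.
     */
    static int matchWhitespace(String s, int start) {
        int i = start;
        while (i < s.length()) {
            char c = s.charAt(i);
            // The "or" of the five alternatives: tab, space, CR, LF, form feed.
            boolean isWs = c == '\t' || c == ' ' || c == '\r' || c == '\n' || c == '\u000C';
            if (!isWs) {
                break;
            }
            i++;
        }
        // "One or more": at least one character must have been consumed.
        return i > start ? i : -1;
    }

    public static void main(String[] args) {
        System.out.println(matchWhitespace(" \t\r\n\u000Cx", 0));
        System.out.println(matchWhitespace("x", 0));
    }
}
```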

Code Block

Main entry point for Java

@members {
    public static void main(String[] args) throws Exception {
        SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lex);

        SimpleCalcParser parser = new SimpleCalcParser(tokens);

        try {
            parser.expr();
        } catch (RecognitionException e)  {
            e.printStackTrace();
        }
    }
}

...

Construct	Description	Example
`(...)*`	Kleene closure - matches zero or more occurrences	`LETTER DIGIT*` - match a `LETTER` followed by zero or more occurrences of `DIGIT`
`(...)+`	Positive Kleene closure - matches one or more occurrences	`('0'..'9')+` - match one or more occurrences of a numerical digit; `LETTER (LETTER|DIGIT)+` - match a `LETTER` followed by one or more occurrences of either `LETTER` or `DIGIT`
`fragment`	`fragment` in front of a lexer rule instructs ANTLR that the rule is only used as part of another lexer rule (i.e. it only builds a fragment of a recognized token)	`fragment DIGIT : '0'..'9' ; NUMBER : (DIGIT)+ ('.' (DIGIT)+ )? ;`
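For readers more used to regular expressions, the examples in the table correspond roughly to the `java.util.regex` patterns below. This is a sketch under assumptions: the table never defines `LETTER`, so it is taken here to mean `A-Z` and `a-z`, and the class name `ConstructSketch` is made up for the example. ANTLR lexer rules are not regular expressions, so the comparison only holds for simple cases like these.

```java
import java.util.regex.Pattern;

// Illustration only: regex counterparts of the example rules in the table.
final class ConstructSketch {
    // LETTER DIGIT*  (assumes LETTER means A-Z or a-z)
    static final Pattern LETTER_DIGITS = Pattern.compile("[A-Za-z][0-9]*");

    // ('0'..'9')+
    static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // LETTER (LETTER|DIGIT)+
    static final Pattern LETTER_THEN_MORE = Pattern.compile("[A-Za-z][A-Za-z0-9]+");

    // NUMBER : (DIGIT)+ ('.' (DIGIT)+ )? ;  with DIGIT inlined, as a fragment rule would be
    static final Pattern NUMBER = Pattern.compile("[0-9]+(\\.[0-9]+)?");

    /** True if the whole string matches the pattern. */
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(LETTER_DIGITS, "a12"));
        System.out.println(matches(LETTER_THEN_MORE, "a"));
        System.out.println(matches(NUMBER, "3.14"));
    }
}
```

Note how `DIGIT` never appears as a pattern of its own: like a `fragment` rule, it only exists as a building block inside `NUMBER`.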
