...

ANTLR 3 is the latest version of a language processing toolkit that was originally released as PCCTS in the mid-1990s. As was the case then, this release of the ANTLR toolkit advances the state of the art with its new LL(*) (star) parsing engine. ANTLR (ANother Tool for Language Recognition) provides a framework for generating recognizers, compilers, and translators from grammatical descriptions. ANTLR grammatical descriptions can optionally include action code written in the target language (i.e. the implementation language of the source code artifacts generated by ANTLR).

When it was released, PCCTS supported C as its only target language, but as a result of consulting work with NeXT Computer, it gained C++ support after 1994. Its immediate successor, ANTLR 2, supported Java, C# and Python in addition to C++. Although it is still in beta, ANTLR 3 has already demonstrated support for Java, C#, Objective-C, C, C++ and Ruby as target languages. As of July 2006, the Java target is complete and the C#, Objective-C, Ruby and C targets are nearly complete. Support for additional target languages including C++, Perl6 and Oberon (yes, Oberon) is either expected or already in progress.

...

Because it can save you time and resources by automating significant portions of the effort involved in building language processing tools. It is well established that generative tools such as compiler compilers have a major, positive impact on developer productivity. In addition, ANTLR v3's improved analysis engine, its significantly enhanced parsing strength via LL(*) (star) parsing with arbitrary lookahead, its vastly improved tree construction rewrite rules and the availability of the simply fantastic AntlrWorks IDE offer productivity benefits over other comparable generative language processing toolkits.

...

Java

Code Block
grammar SimpleCalc;

tokens {
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}

@members {
    public static void main(String[] args) throws Exception {
        SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lex);

        SimpleCalc parser = new SimpleCalc(tokens);

        try {
            parser.expr();
        } catch (RecognitionException e)  {
            e.printStackTrace();
        }
    }
}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

expr	: term ( ( PLUS | MINUS )  term )* ;
	
term	: factor ( ( MULT | DIV ) factor )* ;
	
factor	: NUMBER ;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

NUMBER	: (DIGIT)+ ;
	
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ 	{ channel = 99; } ;
	
fragment DIGIT	: '0'..'9' ;
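
To make the parser rules above concrete, here is a minimal hand-written Java sketch of a recursive-descent recognizer that mirrors expr, term and factor. It is not the code ANTLR generates. The class name SimpleCalcSketch and the method recognizes are hypothetical names chosen for illustration. Unlike a bare call to parser.expr(), this sketch also insists that the entire input is consumed.

```java
// Hand-written illustration of the SimpleCalc rules; NOT ANTLR-generated code.
// SimpleCalcSketch and recognizes() are hypothetical names used only here.
final class SimpleCalcSketch {
    private final String input;
    private int pos = 0;

    private SimpleCalcSketch(String input) {
        this.input = input;
    }

    /** Returns true if the whole input matches the expr rule. */
    static boolean recognizes(String input) {
        SimpleCalcSketch p = new SimpleCalcSketch(input);
        try {
            p.expr();
            p.skipWhitespace();
            return p.pos == p.input.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // expr : term ( ( PLUS | MINUS ) term )* ;
    private void expr() {
        term();
        while (peek() == '+' || peek() == '-') {
            pos++;
            term();
        }
    }

    // term : factor ( ( MULT | DIV ) factor )* ;
    private void term() {
        factor();
        while (peek() == '*' || peek() == '/') {
            pos++;
            factor();
        }
    }

    // factor : NUMBER ;   with   NUMBER : (DIGIT)+ ;
    private void factor() {
        skipWhitespace();
        int start = pos;
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;
        }
        if (pos == start) {
            throw new IllegalStateException("expected NUMBER at offset " + start);
        }
    }

    // Looks at the next non-whitespace character without consuming it.
    private char peek() {
        skipWhitespace();
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // Plays the role of the WHITESPACE rule: these characters never reach the parser rules.
    private void skipWhitespace() {
        while (pos < input.length() && " \t\r\n\f".indexOf(input.charAt(pos)) >= 0) {
            pos++;
        }
    }

    public static void main(String[] args) {
        System.out.println(recognizes("1 + 2 * 3")); // true
        System.out.println(recognizes("1 +"));       // false
    }
}
```

Each grammar rule becomes one method, and each ( ... )* becomes a while loop. This is roughly the shape of recognizer that a generated LL parser follows, although the generated code also handles tokens, error recovery and lookahead decisions that the sketch leaves out.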

C#

Code Block
grammar SimpleCalc;

options {
    language=CSharp;
}

tokens {
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}

@members {
    public static void Main(string[] args) {
        SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lex);

        SimpleCalc parser = new SimpleCalc(tokens);

        try {
            parser.expr();
        } catch (RecognitionException e)  {
            Console.Error.WriteLine(e.StackTrace);
        }
    }
}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

expr	: term ( ( PLUS | MINUS )  term )* ;
	
term	: factor ( ( MULT | DIV ) factor )* ;
	
factor	: NUMBER ;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

NUMBER	: (DIGIT)+ ;
	
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ 	{ channel = 99; } ;
	
fragment DIGIT	: '0'..'9' ;

Objective-C

Code Block
grammar Simple;

options 
{
    language=ObjC;
}

OR : '||' ;

C

Code Block
grammar Simple;

options 
{
    language=C;
}

OR : '||' ;

...

Construct

Description

Example

(...)*

Kleene closure - matches zero or more occurrences

LETTER DIGIT* - match a LETTER followed by zero or more occurrences of DIGIT

(...)+

Positive Kleene closure - matches one or more occurrences

('0'..'9')+ - match one or more occurrences of a numerical digit
LETTER (LETTER|DIGIT)+ - match a LETTER followed by one or more occurrences of either LETTER or DIGIT

fragment

fragment in front of a lexer rule tells ANTLR that the rule does not produce a token on its own; it is a helper that can only be used as part of another lexer rule

Code Block

fragment DIGIT	: '0'..'9' ; 
NUMBER	: (DIGIT)+ ;
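
The closure operators in the table behave like their regular-expression counterparts, so their effect can be tried out with java.util.regex. The following is only a sketch: the class name ClosureSketch and its constants are hypothetical, and LETTER is assumed to mean an ASCII letter, since the examples above do not define it.

```java
import java.util.regex.Pattern;

// Illustration only: regular expressions standing in for the lexer constructs above.
// Assumption: LETTER is an ASCII letter and DIGIT is '0'..'9'.
final class ClosureSketch {
    // LETTER DIGIT*  : a LETTER followed by zero or more DIGITs
    static final Pattern LETTER_DIGIT_STAR = Pattern.compile("[A-Za-z][0-9]*");

    // ('0'..'9')+  : one or more digits
    static final Pattern DIGITS_PLUS = Pattern.compile("[0-9]+");

    // LETTER (LETTER|DIGIT)+  : a LETTER followed by one or more LETTERs or DIGITs
    static final Pattern LETTER_THEN_PLUS = Pattern.compile("[A-Za-z]([A-Za-z]|[0-9])+");

    /** Returns true if the entire string matches the pattern. */
    static boolean matches(Pattern pattern, String text) {
        return pattern.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(LETTER_DIGIT_STAR, "x"));  // true: zero digits is allowed
        System.out.println(matches(LETTER_THEN_PLUS, "x"));   // false: at least one more character is required
        System.out.println(matches(DIGITS_PLUS, "007"));      // true
    }
}
```

The difference between * and + shows up on the shortest inputs: a lone letter satisfies LETTER DIGIT* but not LETTER (LETTER|DIGIT)+.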

How about a more complex ANTLR 3 grammar?

...