Five minute introduction to ANTLR 3

What is ANTLR 3?

ANTLR (ANother Tool for Language Recognition) is a tool that helps you build and maintain language processing tools. It does this by generating part of the source code for these tools from language specifications that you supply. In common terminology, ANTLR is a compiler generator or compiler compiler, in the tradition of tools such as Lex/Flex and Yacc/Bison. ANTLR takes as its input a grammar, which is a precise description of a language augmented with semantic actions that say how to process the language's constructs. From it, ANTLR generates a number of files, at least one of which contains source code in the target language (e.g. Java, C/C++, C#, Python, Ruby) specified in the grammar.

Developers use ANTLR to implement Domain-Specific Languages, to write language compilers and translators, and even to parse complex XML.

...

ANTLR 3 is the latest version of a language processing toolkit that was originally released as PCCTS in the mid-1990s. As was the case then, this release of the ANTLR toolkit advances the state of the art, this time with its new LL(*) (read "LL star") parsing engine. ANTLR provides a framework for generating recognizers, compilers, and translators from grammatical descriptions. These descriptions can optionally include action code written in the target language (i.e. the implementation language of the source code that ANTLR generates).

...

  • NUMBER defines a token (named "NUMBER") that matches any character between 0 and 9, inclusive, repeated one or more times. The .. operator creates a character range, while the + suffix means "one or more times". (This suffix should look familiar if you know regular expressions.)
  • PLUS defines a token that matches a single character: +.
  • add defines a parser rule that says "expect a NUMBER token, a PLUS token, and a NUMBER token in that order." Any other tokens, or tokens in a different order, will trigger an error message.
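
To make the division of labour between the token rules and the parser rule concrete, here is a minimal hand-written sketch in Python of what the three rules describe. It is not the code ANTLR generates, and the names tokenize, parse_add and TOKEN_SPEC are hypothetical, chosen only for this illustration. It also assumes that a mismatch simply raises an exception, whereas an ANTLR-generated recognizer reports an error message.

```python
import re

# Hypothetical hand-written equivalent of the three rules described above.
# This is NOT ANTLR output; it only mirrors what the rules mean.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),  # any character from 0 to 9, one or more times
    ("PLUS", r"\+"),        # the single character +
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Split text into (token_name, lexeme) pairs; reject unknown characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_add(tokens):
    """Accept exactly NUMBER PLUS NUMBER, in that order."""
    expected = ["NUMBER", "PLUS", "NUMBER"]
    actual = [name for name, _ in tokens]
    if actual != expected:
        raise SyntaxError(f"expected {expected}, got {actual}")
    # Return the two operands so a caller could act on them.
    return int(tokens[0][1]), int(tokens[2][1])
```

With these definitions, parse_add(tokenize("12+34")) accepts the input and returns the two numbers, while inputs such as "12+" or "+12" are rejected because the tokens are missing or out of order. Note that the sketch, like the three rules as described, has no rule for whitespace, so "12 + 34" is rejected as well.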

...