...

ANTLR (ANother Tool for Language Recognition) is a tool that helps you build and maintain language processing software more easily. It does this by generating part of the source code for these systems from language specifications that you supply. In common terminology, ANTLR is a compiler generator, or compiler compiler, in the tradition of tools such as Lex/Flex and Yacc/Bison. ANTLR takes as its input a grammar, which is a precise description of a language augmented with semantic actions, and generates a number of files, at least one of which contains source code in the target language (e.g. Java, C/C++, C#, Python, Ruby) specified in the grammar.

Developers use ANTLR to implement Domain-Specific Languages, to write language compilers and translators, and even to parse complex XML.

As stated above, ANTLR 3 can generate the source code for various tools that can be used to analyze and transform input in the language defined by the input grammar. The basic types of language processing tools that ANTLR can generate are Lexers (a.k.a. scanners, tokenizers), Parsers, and Tree Parsers (a.k.a. tree walkers, cf. visitors).

What exactly does ANTLR 3 do?

ANTLR reads a language description file called a grammar and generates a number of files for you. Most uses of ANTLR generate at least one (and quite often both) of these tools:

  • A Lexer: This reads an input character or byte stream (i.e. characters, binary data, etc.), divides it into tokens using patterns you specify, and generates a token stream as output. It can also flag some tokens, such as whitespace and comments, as hidden, using a protocol that ANTLR parsers automatically understand and respect.
  • A Parser: This reads a token stream (normally generated by a lexer), matches phrases in your language via the rules (patterns) you specify, and typically performs some semantic action for each phrase (or sub-phrase) matched. Each match could invoke a custom action, write some text via StringTemplate, or generate an Abstract Syntax Tree for additional processing.
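To make the division of labour between the two tools concrete, here is a minimal hand-written sketch in Python. It is not code generated by ANTLR and does not use ANTLR's runtime API; the names `Token`, `lex`, `Parser`, `evaluate`, and the channel constants `DEFAULT` and `HIDDEN` are all assumptions made for illustration. The lexer tags whitespace as hidden instead of discarding it, and the parser looks only at tokens on the default channel, performing an action (here, arithmetic) as each phrase is matched.

```python
import re
from dataclasses import dataclass

# Illustrative channel numbers; these are assumptions, not ANTLR constants.
DEFAULT, HIDDEN = 0, 1


@dataclass
class Token:
    type: str
    text: str
    channel: int = DEFAULT


# Token patterns, tried in order (hypothetical toy language).
TOKEN_SPEC = [
    ("INT", r"\d+"),
    ("PLUS", r"\+"),
    ("STAR", r"\*"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(text):
    """Divide a character stream into tokens; whitespace is flagged as hidden."""
    pos, tokens = 0, []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        channel = HIDDEN if kind == "WS" else DEFAULT
        tokens.append(Token(kind, m.group(), channel))
        pos = m.end()
    return tokens


class Parser:
    """Recursive-descent parser that evaluates while it matches rules."""

    def __init__(self, tokens):
        # The parser respects the hidden channel by ignoring those tokens.
        self.tokens = [t for t in tokens if t.channel == DEFAULT]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos].type if self.pos < len(self.tokens) else "EOF"

    def match(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    # expr : term (PLUS term)* ;
    def expr(self):
        value = self.term()
        while self.peek() == "PLUS":
            self.match("PLUS")
            value += self.term()  # semantic action for the matched phrase
        return value

    # term : INT (STAR INT)* ;
    def term(self):
        value = int(self.match("INT").text)
        while self.peek() == "STAR":
            self.match("STAR")
            value *= int(self.match("INT").text)
        return value


def evaluate(text):
    parser = Parser(lex(text))
    result = parser.expr()
    if parser.peek() != "EOF":
        raise SyntaxError(f"unexpected token {parser.peek()}")
    return result
```

Calling `evaluate("2 + 3 * 4")` runs the lexer and parser in series. The whitespace tokens remain in the lexer's output, so a later stage could still recover them, but the parser never sees them.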

ANTLR's Abstract Syntax Tree (AST) processing is especially powerful. If you also specify a tree grammar, ANTLR will generate a Tree Parser for you that can contain custom actions or StringTemplate output statements. The next version of ANTLR (3.1) will support rewrite rules that can be used to express tree transformations.

Most language tools will:

  1. Use a Lexer and Parser in series to check the word-level and phrase-level structure of the input and, if no errors are encountered, create an intermediate tree representation such as an Abstract Syntax Tree (AST),
  2. Optionally modify (i.e. transform or rewrite) the tree (e.g. to perform optimizations) using one or more Tree Parsers, and
  3. Produce the final output using a Tree Parser to process the final tree structure. This might involve generating source code or another textual representation from the tree (perhaps using StringTemplate), or performing some other custom actions driven by the tree structure.
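The three stages above can be sketched in a few lines of Python. This is a hand-written illustration under stated assumptions, not ANTLR output: the function names `tokenize`, `parse`, `fold`, `emit`, and `translate` are hypothetical, the AST is modelled as nested tuples, and the "optimization" is simple constant folding. Plain string formatting stands in for a template engine such as StringTemplate.

```python
import re


def tokenize(text):
    """Stage 1a: word-level structure (whitespace is skipped)."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[+*()]", text)
    # Reject input containing characters that no pattern matched.
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError("unrecognized character in input")
    return tokens


def parse(tokens):
    """Stage 1b: phrase-level structure; builds an AST of nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():  # expr : term ('+' term)* ;
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():  # term : atom ('*' atom)* ;
        node = atom()
        while peek() == "*":
            advance()
            node = ("*", node, atom())
        return node

    def atom():  # atom : INT | ID | '(' expr ')' ;
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok.isdigit():
            return ("INT", int(tok))
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok[0].isalpha() or tok[0] == "_":
            return ("ID", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def fold(node):
    """Stage 2: tree rewrite that folds constant sub-expressions."""
    if node[0] in ("INT", "ID"):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "INT" and right[0] == "INT":
        value = left[1] + right[1] if op == "+" else left[1] * right[1]
        return ("INT", value)
    return (op, left, right)


def emit(node):
    """Stage 3: walk the final tree and produce text output."""
    if node[0] in ("INT", "ID"):
        return str(node[1])
    return f"({emit(node[1])} {node[0]} {emit(node[2])})"


def translate(text):
    return emit(fold(parse(tokenize(text))))
```

For example, `translate("x + 2 * 3")` builds a tree for the whole expression, folds `2 * 3` into a single constant node, and then emits `(x + 6)`. Keeping each stage as a separate pass over the tree is what lets the optional middle stage be dropped or repeated without touching the parser or the output code.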

Simpler language tools may omit the intermediate tree and build the actions or output stage directly into the parser. The calculator shown below uses only a Lexer and a Parser.

...

ANTLR 3 is the latest version of a language processing toolkit that was originally released as PCCTS in the mid-1990s. As was the case then, this release of the ANTLR toolkit advances the state of the art with its new LL(*) parsing engine. ANTLR (ANother Tool for Language Recognition) provides a framework for the generation of recognizers, compilers, and translators from grammatical descriptions. ANTLR grammatical descriptions can optionally include action code written in what is termed the target language (i.e. the implementation language of the source code artifacts generated by ANTLR).

...