The Python code generation target

Please note that the Python target is rather young compared to most other targets. I would consider it to be in beta state. This means that most parts are working (the big exception is template output), but bugs and problems are to be expected and documentation is pretty poor. It still has to prove itself in a real-world application (which is currently being done).

Both the runtime module and the code generation templates should now be feature complete and in sync with the Java target, except for the features listed below. But large parts of the runtime are still untested.

See this example for a working tree-walking grammar.

Please send bug reports, feedback and patches to me or to the antlr-interest mailing list.

Credits go to Clinton Roy for the code to support output=AST.

Requirements

The following Python versions are supported: 2.4 and 2.5.
(The runtime package and the generated code are probably compatible with Python 2.3, but I started to use decorators in the test suite and a recent apt-get dist-upgrade purged python2.3 from my system, so I cannot test this at the moment.)

To use generated code, you'll need the Python runtime package antlr3 in your import path. There are no other dependencies beyond the Python standard library.

Usage

Selecting Python output

Just add language=Python; to the options section of your grammar:

grammar T;
options {
    language=Python;
    [other options]
}

...

For a grammar T.g ANTLR3 will then create the files TLexer.py and TParser.py which contain the classes TLexer and TParser (or just one of those, if you have a pure lexer/parser). For tree parsers, ANTLR3 creates T.py containing the class T.

Using the generated classes

To use a grammar T.g:

import antlr3
from TLexer import TLexer
from TParser import TParser

input_text = '...what you want to feed into the parser...'
char_stream = antlr3.ANTLRStringStream(input_text)
# or to parse a file:
# char_stream = antlr3.ANTLRFileStream(path_to_input)
# or to parse an opened file or any other file-like object:
# char_stream = antlr3.ANTLRInputStream(file)

lexer = TLexer(char_stream)
tokens = antlr3.CommonTokenStream(lexer)
parser = TParser(tokens)
parser.entry_rule()

If you want to access the token types in your code, you'll have to import them from the lexer or parser module (in Java these are members of the lexer/parser classes; in Python they are defined at module level):

from TLexer import EOF, INTEGER, FLOAT, IDENTIFIER
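Because the token types are plain module-level integers, they can be used directly, for example as keys of a dispatch table that turns token text into Python values. This is only an illustrative sketch: the numeric values are made up so that it runs on its own, and `convert_token` is a hypothetical helper. In real code the constants come from the import above and their values are assigned by ANTLR.

```python
# Made-up stand-ins for the generated constants; in real code use
#   from TLexer import INTEGER, FLOAT, IDENTIFIER
INTEGER, FLOAT, IDENTIFIER = 4, 5, 6

# Map each token type to a function that converts the token text.
CONVERTERS = {
    INTEGER: int,
    FLOAT: float,
    IDENTIFIER: str,
}


def convert_token(token_type, text):
    """Hypothetical helper: turn token text into a Python value by token type."""
    if token_type not in CONVERTERS:
        raise ValueError("unexpected token type " + str(token_type))
    return CONVERTERS[token_type](text)
```

For a token object you would pass its type and text (e.g. `token.type` and `token.text`).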

Using tree parsers

For grammars T.g (parser and lexer) and TWalker.g (the tree parser):

import antlr3
import antlr3.tree
from TLexer import TLexer
from TParser import TParser
from TWalker import TWalker

char_stream = antlr3.ANTLRStringStream(...)
lexer = TLexer(char_stream)
tokens = antlr3.CommonTokenStream(lexer)
parser = TParser(tokens)
r = parser.entry_rule()

# this is the root of the AST
root = r.tree

nodes = antlr3.tree.CommonTreeNodeStream(root)
nodes.setTokenStream(tokens)
walker = TWalker(nodes)
walker.entry_rule()

API documentation

Reference documentation for the runtime package can be found at http://www.antlr.org/api/Python/.

Actions

This target currently supports the action scopes @lexer, @parser and @treeparser for global actions. The following action names are known:

  • header - Will be inserted right after ANTLR's own imports at the top of the generated file. Use it for import statements or any other functions/classes which you need in the module scope.
  • init - Will be inserted at the end of the __init__ method of the lexer/parser. Here you can set up your own instance attributes.
  • members - Will be inserted in the class body of the lexer/parser right after __init__. This is the right place for custom methods and class attributes.
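To make the placement concrete, here is a hand-written sketch of roughly where each action ends up in a generated parser module. It is not actual ANTLR output: `StandInParser` is a hypothetical replacement for the runtime's base class so that the sketch runs without antlr3, and `identifiers`, `IDENT_RE` and `remember` are made-up examples of what you might put into the actions.

```python
# (ANTLR's own imports would come first)
# --- contents of @parser::header { ... } go here ---
import re


class StandInParser(object):
    """Hypothetical stand-in for the runtime's parser base class."""

    def __init__(self, stream):
        self.stream = stream


class TParser(StandInParser):
    def __init__(self, stream):
        StandInParser.__init__(self, stream)
        # --- contents of @parser::init { ... } are appended here ---
        self.identifiers = []

    # --- contents of @parser::members { ... } go here ---
    IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*$")

    def remember(self, name):
        """Record name if it looks like an identifier."""
        if self.IDENT_RE.match(name):
            self.identifiers.append(name)
        return self.identifiers
```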

For rules the additional action @decorate is recognized. The contents are placed right before the rule method and can be used for decorators.

r
@decorate { @logThis }
: s ;

will create something like

@logThis
def r(self):
    ...
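The decorator itself has to be defined somewhere the generated module can see it, for example in the header action. The following is a minimal sketch of such a decorator; `logThis` is just the placeholder name from the example above, `calls` is a hypothetical log, and `TParser` is a stand-in for the generated parser class.

```python
import functools

calls = []  # hypothetical log of rule invocations


def logThis(method):
    """Record the name of each decorated rule method when it is called."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        calls.append(method.__name__)
        return method(self, *args, **kwargs)
    return wrapper


class TParser(object):
    """Stand-in for a generated parser class."""

    @logThis
    def r(self):
        # The generated rule body would be here.
        return "result of r"
```

functools.wraps (available from Python 2.5 on) keeps the rule method's __name__ and docstring on the wrapper.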

Caveats

Don't use TABs

Make sure that your editor uses spaces for indentation when you edit your grammar files. The generated code uses spaces, and when your actions are copied into the output, TABs will only cause confusion. ANTLR should generate a warning when it stumbles upon TABs.
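If you are not sure whether a grammar file contains TABs, a few lines of Python can find and expand them. This is a minimal sketch; the helper names and the default tab width of 4 are assumptions, so use the width your editor was set to.

```python
def lines_with_tabs(text):
    """Return the 1-based numbers of all lines that contain a TAB."""
    return [number + 1
            for number, line in enumerate(text.splitlines())
            if "\t" in line]


def untabify(text, tabsize=4):
    """Replace TABs with spaces; str.expandtabs restarts the column at each newline."""
    return text.expandtabs(tabsize)
```

For example, `lines_with_tabs(open("T.g").read())` lists the offending lines of T.g.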

% in actions

This is not a Python-specific issue, but the % operator is probably used more often in Python than in other languages. See Special symbols in actions for the usage of %, but note that support for these is not yet implemented for the Python target.

In ANTLR3 grammars, % is a special character for StringTemplate and must be escaped in order to pass a plain % into the generated code. So you'll have to write something like

...
{ print "hello \%s" \% $t.text }
...
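After StringTemplate has removed the backslashes, the action above should reach the generated code as an ordinary %-formatting expression. The sketch below mimics that with a hypothetical variable `t_text` standing in for whatever $t.text expands to. If you would rather avoid the escaping altogether, string concatenation does the same job without any % sign.

```python
t_text = "world"  # hypothetical stand-in for $t.text

# What the escaped action should turn into: plain % formatting.
formatted = "hello %s" % t_text

# The same result without any % characters in the action,
# so nothing needs to be escaped in the grammar.
concatenated = "hello " + t_text
```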

Semicolons after property assignments

If you are assigning a value to a property in an action, it may be required to add a semicolon after the statement.

...
{ $text = "Hello world!"; }  // set text for rule
...
{ $someScope::someMember = value; } // set a scope member
...

ANTLR currently scans the code for a semicolon to detect property assignments. This semicolon is omitted in the generated code. This may lead to some strange code corruption if ANTLR finds a semicolon in an unexpected place, but I don't know whether this is more than a theoretical problem; so far I have not run into any issues.

For the curious...

Technically, ANTLR does not need to treat assignments differently from expressions, because in Python something like $text always translates to some_internal_name.text, whereas Java needs setText(...) on the LHS and getText() on the RHS.
So this issue may well be fixed in a later version of ANTLR. In that case, the semicolons that you may now be using will make it into the generated code, which fortunately is not a syntax error.
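You can convince yourself that a stray trailing semicolon is harmless with the built-in compile(). `Holder` is a hypothetical stand-in for whatever object $text refers to, and the statement is a made-up example of an assignment with its semicolon kept.

```python
class Holder(object):
    """Hypothetical stand-in for the object that $text refers to."""
    text = None


holder = Holder()

# A made-up assignment with the trailing semicolon kept.
source = 'holder.text = "Hello world!";\n'

# Compiles and runs without a SyntaxError.
exec(compile(source, "<action>", "exec"), {"holder": holder})
```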

Empty alternatives

In rules with empty alternatives, ANTLR may generate invalid Python code:

r: ( s | t | ) u;

This will result in an else: without an indented block. Until this issue is resolved, just stick a no-op action into the empty alternative:

r: ( s | t | {pass} ) u;
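The underlying reason is that Python does not allow an empty block. The sketch below compiles a simplified, hand-written approximation of the problematic shape (not actual ANTLR output) with and without a `pass` statement; `alt`, `s`, `t` and `u` are placeholder names.

```python
# Roughly the shape produced for ( s | t | ) u without the workaround.
broken = (
    "if alt == 1:\n"
    "    s()\n"
    "elif alt == 2:\n"
    "    t()\n"
    "else:\n"
    "u()\n"
)

# The same shape with {pass} in the empty alternative.
fixed = (
    "if alt == 1:\n"
    "    s()\n"
    "elif alt == 2:\n"
    "    t()\n"
    "else:\n"
    "    pass\n"
    "u()\n"
)


def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True
```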

Comments (for ANTLR2 users)

The Python target for ANTLR2 forced you to use C++ style comments inside of action blocks in your grammar. This is no longer true; use plain Python comments.

Unsupported features

  • output=template: The code generation template looks pretty simple, but I don't know whether this makes sense at all as long as there is no StringTemplate V3 for Python (that is, whether it would work with the current Python ST2).
  • -debug option: mostly useful for integration into ANTLRWorks. I will have to check whether this is feasible at all.
  • ... (I still have to work my way through The Book - perhaps I'll stumble upon more stuff that I have not yet considered)