Antlr3PerlTarget

Perl ANTLR v3 Target

Author

Ron Blaschke, ron at rblasch.org

Status

Early prototyping phase.  Simple lexer and parser are working.

Progress

Here's a simple example.  Note that everything is still subject to change.

$ cat T.g
lexer grammar T;
options { language = Perl5; }
ZERO: '0';
ONE: '1';
$ cat T.tokens
Tokens=6
ZERO=4
ONE=5
$ cat t.pl
#!/usr/bin/perl

use ANTLR::Runtime::ANTLRStringStream;
use TLexer;

use strict;
use warnings;

my $input = ANTLR::Runtime::ANTLRStringStream->new('010');
my $lexer = TLexer->new($input);

while (1) {
    my $token = $lexer->next_token();
    last if $token->get_type() == $TLexer::EOF;

    print "type: ", $token->get_type(), "\n";
    print "text: ", $token->get_text(), "\n";
    print "\n";
}
$ perl t.pl
type: 4
text: 0

type: 5
text: 1

type: 4
text: 0
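
The numeric types printed above correspond to the entries in T.tokens.  As a minimal sketch, assuming the file holds one NAME=number pair per line as shown, the mapping can be inverted so that token types are reported by name.  The helper load_token_names is hypothetical and not part of the ANTLR runtime.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the ANTLR runtime): parse the text of a
# .tokens file, assumed to hold one NAME=number pair per line, and return
# a hash reference mapping token type numbers to token names.
# Lines that do not look like NAME=number are ignored.
sub load_token_names {
    my ($text) = @_;
    my %name_of;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^(\w+)=(\d+)$/;
        $name_of{$2} = $1;
    }
    return \%name_of;
}

# Contents of T.tokens as shown above.
my $tokens_text = "Tokens=6\nZERO=4\nONE=5\n";
my $name_of = load_token_names($tokens_text);

# Token types printed by t.pl for the input '010'.
my @names = map { $name_of->{$_} } (4, 5, 4);
print join(' ', @names), "\n";
```

For the three types printed above this reports ZERO ONE ZERO.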

2007-06-13

+ Escaped characters, like '\n', are now handled properly.

+ Added error handling.

$ cat T2.g
lexer grammar T2;
options { language = Perl5; }

ID  :   ('a'..'z'|'A'..'Z')+ ;
INT :   '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS  :   (' '|'\t')+ ;
$ cat T2.tokens
INT=5
WS=7
Tokens=8
ID=4
NEWLINE=6
$ cat t.pl
#!/usr/bin/perl

use ANTLR::Runtime::ANTLRStringStream;
use T2Lexer;

use strict;
use warnings;

my $input = ANTLR::Runtime::ANTLRStringStream->new("Hello World!\n42\n");
my $lexer = T2Lexer->new($input);

while (1) {
    my $token = $lexer->next_token();
    last if $token->get_type() == $T2Lexer::EOF;

    print "type: ", $token->get_type(), "\n";
    print "text: ", $token->get_text(), "\n";
    print "\n";
}
$ perl t.pl
type: 4
text: Hello

type: 7
text:

type: 4
text: World

line 1:12 no viable alternative at character '!'
type: 6
text:


type: 5
text: 42

type: 6
text:

Note the "no viable alternative" error message for the unrecognized '!'.

2007-06-15

+ Handle lexer actions

Here's another short example, similar to the one above.  Note how whitespace is put on the hidden channel (99) and newlines are skipped.

lexer grammar T2;
options { language = Perl5; }

ID  :   ('a'..'z'|'A'..'Z')+ ;
INT :   '0'..'9'+ ;
NEWLINE:'\r'? '\n' { $self->skip(); } ;
WS  :   (' '|'\t')+ { $channel = HIDDEN; } ;
$ perl t.pl
text: Hello
type: 4
pos: 1:0
channel: 0
token index: -1

text:
type: 7
pos: 1:5
channel: 99
token index: -1

text: World
type: 4
pos: 1:6
channel: 0
token index: -1

line 1:11 no viable alternative at character '!'
text: 42
type: 5
pos: 2:0
channel: 0
token index: -1
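
The difference between skipping a token and putting it on the hidden channel can be sketched without the ANTLR runtime.  The following is a hand-written stand-in for the T2 lexer, not generated code: the constant names, the tokenize function, and the token hash layout are all assumptions made for this illustration.  Skipped newlines never become tokens, whitespace tokens are emitted but tagged with channel 99, and an unmatched character is reported and stepped over.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Channel numbers as seen in the output above; the constant names are
# chosen for this sketch only.
use constant { DEFAULT_CHANNEL => 0, HIDDEN_CHANNEL => 99 };

# Hand-written stand-in for the T2 lexer (not ANTLR-generated code).
# Returns two array references: the emitted tokens and the error messages.
sub tokenize {
    my ($text) = @_;
    my (@tokens, @errors);
    my ($line, $col) = (1, 0);

    pos($text) = 0;
    while (pos($text) < length($text)) {
        my $start = pos($text);
        my ($type, $channel);

        if    ($text =~ /\G[a-zA-Z]+/gc) { ($type, $channel) = ('ID',  DEFAULT_CHANNEL) }
        elsif ($text =~ /\G[0-9]+/gc)    { ($type, $channel) = ('INT', DEFAULT_CHANNEL) }
        elsif ($text =~ /\G[ \t]+/gc)    { ($type, $channel) = ('WS',  HIDDEN_CHANNEL) }
        elsif ($text =~ /\G\r?\n/gc) {
            # Skipped: the newline is consumed but no token is emitted.
            $line++;
            $col = 0;
            next;
        }
        else {
            # No rule matches here: report the character and step over it.
            my $char = substr($text, $start, 1);
            push @errors, "line $line:$col no viable alternative at character '$char'";
            pos($text) = $start + 1;
            $col++;
            next;
        }

        my $matched = substr($text, $start, pos($text) - $start);
        push @tokens, {
            type => $type, text => $matched, channel => $channel,
            line => $line, col => $col,
        };
        $col += length $matched;
    }
    return (\@tokens, \@errors);
}

my ($tokens, $errors) = tokenize("Hello World!\n42\n");

print "$_\n" for @$errors;
for my $t (@$tokens) {
    printf "%-3s %-7s pos %d:%d channel %d\n",
        $t->{type}, "'$t->{text}'", $t->{line}, $t->{col}, $t->{channel};
}

# A consumer that only reads the default channel never sees WS.
my @visible = grep { $_->{channel} == DEFAULT_CHANNEL } @$tokens;
print scalar(@visible), " of ", scalar(@$tokens), " tokens on the default channel\n";
```

The hidden whitespace token is still present in the token list with its position intact, which is what distinguishes it from the skipped newlines.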

2007-06-26

+ Simple parser is working

Quick, what is 2 + 2?  If you can't remember, here's an easy way to find out.  First we need a grammar.

grammar MExpr;

options {
  language = Perl5;
}

prog:   stat+ ;

stat:   expr NEWLINE { print "$expr.value\n"; }
    |   NEWLINE
    ;

expr returns [value]
    :   e=atom { $value = $e.value; }
        (   '+' e=atom { $value += $e.value; }
        |   '-' e=atom { $value -= $e.value; }
        )*
    ;

atom returns [value]
    :   INT { $value = $INT.text; }
    |   '(' expr ')' { $value = $expr.value; }
    ;

ID  :   ('a'..'z'|'A'..'Z')+ ;
INT :   '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS  :   (' '|'\t')+ { $self->skip(); } ;

And here's the test program.

#!/usr/bin/perl

use strict;
use warnings;

use ANTLR::Runtime::ANTLRStringStream;
use ANTLR::Runtime::CommonTokenStream;
use MExprLexer;
use MExprParser;

while (<>) {
    my $input = ANTLR::Runtime::ANTLRStringStream->new($_);
    my $lexer = MExprLexer->new($input);

    my $tokens = ANTLR::Runtime::CommonTokenStream->new({ token_source => $lexer });
    my $parser = MExprParser->new($tokens);
    $parser->prog();
}

Finally, we get to the answer.

$ perl t.pl
2 + 2
4

2007-08-08

+ Simple expression grammar

The grammar

grammar Expr;

options {
    language = Perl5;
}

@header {
}

@members {
    my %memory;
}

prog:   stat+ ;

stat:   expr NEWLINE { print "$expr.value\n"; }
    |   ID '=' expr NEWLINE
        { $memory{$ID.text} = $expr.value; }
    |   NEWLINE
    ;

expr returns [value]
    :   e=multExpr { $value = $e.value; }
        (   '+' e=multExpr { $value += $e.value; }
        |   '-' e=multExpr { $value -= $e.value; }
        )*
    ;

multExpr returns [value]
    :   e=atom { $value = $e.value; } ('*' e=atom { $value *= $e.value; })*
    ;

atom returns [value]
    :   INT { $value = $INT.text; }
    |   ID
        {
            my $v = $memory{$ID.text};
            if (defined $v) {
                $value = $v;
            } else {
                print STDERR "undefined variable $ID.text\n";
            }
        }
    |   '(' expr ')' { $value = $expr.value; }
    ;

ID  :   ('a'..'z'|'A'..'Z')+ ;
INT :   '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS  :   (' '|'\t')+ { $self->skip(); } ;

Test program

#!/usr/bin/perl

use strict;
use warnings;

use blib '../..';

use ANTLR::Runtime::ANTLRStringStream;
use ANTLR::Runtime::CommonTokenStream;
use ExprLexer;
use ExprParser;

# Read all of the input at once.
my $in;
{
    local $/;    # slurp mode; $/ is restored when the block exits
    $in = <>;
}

my $input = ANTLR::Runtime::ANTLRStringStream->new($in);
my $lexer = ExprLexer->new($input);

my $tokens = ANTLR::Runtime::CommonTokenStream->new({ token_source => $lexer });
my $parser = ExprParser->new($tokens);
$parser->prog();

Test run

$ perl t.pl
x=1
y=2
3*(x+y)
^Z
9
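
To make the rule structure easier to follow, here is a hand-written recursive-descent evaluator with one function per grammar rule.  It is only a sketch of the same idea and is unrelated to the code ANTLR generates: the function names and the regex tokenizer are assumptions of this example, unrecognized characters are silently ignored, and errors abort with die instead of being reported and recovered from.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %memory;    # variable name => value, like %memory in @members
my @tokens;    # remaining tokens of the current line

# Split one line into INT, ID and single-character operator tokens.
# Spaces and tabs are dropped, as the WS rule does.
sub tokenize {
    my ($line) = @_;
    return $line =~ /[0-9]+|[a-zA-Z]+|[-+*()=]/g;
}

# atom: INT | ID | '(' expr ')'
sub atom {
    my $tok = shift @tokens;
    die "unexpected end of input\n" unless defined $tok;
    return $tok if $tok =~ /^[0-9]+$/;
    if ($tok =~ /^[a-zA-Z]+$/) {
        return $memory{$tok} if defined $memory{$tok};
        die "undefined variable $tok\n";
    }
    if ($tok eq '(') {
        my $value = expr();
        my $close = shift @tokens;
        die "expected ')'\n" unless defined $close && $close eq ')';
        return $value;
    }
    die "unexpected token '$tok'\n";
}

# multExpr: atom ('*' atom)*
sub mult_expr {
    my $value = atom();
    while (@tokens && $tokens[0] eq '*') {
        shift @tokens;
        $value *= atom();
    }
    return $value;
}

# expr: multExpr (('+' | '-') multExpr)*
sub expr {
    my $value = mult_expr();
    while (@tokens && $tokens[0] =~ /^[-+]$/) {
        my $op  = shift @tokens;
        my $rhs = mult_expr();
        $value = $op eq '+' ? $value + $rhs : $value - $rhs;
    }
    return $value;
}

# stat: ID '=' expr (store, returns undef) | expr (returns the value).
# Named statement() because stat is a Perl builtin.
sub statement {
    my ($line) = @_;
    @tokens = tokenize($line);
    return unless @tokens;    # blank line
    if (@tokens >= 2 && $tokens[0] =~ /^[a-zA-Z]+$/ && $tokens[1] eq '=') {
        my $name = shift @tokens;
        shift @tokens;        # consume '='
        $memory{$name} = expr();
        return;
    }
    return expr();
}

for my $line ("x=1", "y=2", "3*(x+y)") {
    my $result = statement($line);
    print "$result\n" if defined $result;
}
```

Because mult_expr is called from expr, multiplication binds tighter than addition and subtraction, which mirrors how the grammar nests multExpr inside expr.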

2008-02-23

Started the real porting effort.  The goal is to port one ANTLR runtime class at a time from Java to Perl, including full API coverage and documentation.  First stop of the porting train: ANTLR::Runtime::BitSet.

2008-11-18

Got the first parser working: SimpleCalc, taken from the Five minute introduction to ANTLR 3.

Author

Ronald Blaschke (ron at rblasch org)