Perl ANTLR v3 Target

Status

Early prototyping phase. A simple lexer and a simple parser are working.

Progress

Here's a simple example. Note that everything is still subject to change.

$ cat T.g
lexer grammar T;
options { language = Perl5; }
ZERO: '0';
ONE: '1';

$ cat T.tokens
Tokens=6
ZERO=4
ONE=5

$ cat t.pl
#!/usr/bin/perl

use ANTLR::Runtime::ANTLRStringStream;
use TLexer;

use strict;
use warnings;

my $input = ANTLR::Runtime::ANTLRStringStream->new('010');
my $lexer = TLexer->new($input);

while (1) {
    my $token = $lexer->next_token();
    last if $token->get_type() == $TLexer::EOF;

    print "type: ", $token->get_type(), "\n";
    print "text: ", $token->get_text(), "\n";
    print "\n";
}

$ perl t.pl
type: 4
text: 0

type: 5
text: 1

type: 4
text: 0

2007-06-13

+ Escaped characters, like '\n', are now handled properly.

+ Added error handling.

lexer grammar T2;
options { language = Perl5; }

ID  :   ('a'..'z'|'A'..'Z')+ ;
INT :   '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS  :   (' '|'\t')+ ;

INT=5
WS=7
Tokens=8
ID=4
NEWLINE=6

#!/usr/bin/perl

use ANTLR::Runtime::ANTLRStringStream;
use T2Lexer;

use strict;
use warnings;

my $input = ANTLR::Runtime::ANTLRStringStream->new("Hello World!\n42\n");
my $lexer = T2Lexer->new($input);

while (1) {
    my $token = $lexer->next_token();
    last if $token->get_type() == $T2Lexer::EOF;

    print "type: ", $token->get_type(), "\n";
    print "text: ", $token->get_text(), "\n";
    print "\n";
}

type: 4
text: Hello

type: 7
text:

type: 4
text: World

line 1:12 no viable alternative at character '!'
type: 6
text:


type: 5
text: 42

type: 6
text:

Note the "no viable alternative" error message for the unrecognized '!'.
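
The recovery behaviour shown above (report the offending character, skip it, carry on with the next token) can be sketched in plain Perl without the runtime. This is only an illustration, not the generated T2Lexer: the tokenize helper, the rule table and the column bookkeeping are assumptions made for this sketch. Only the token type numbers are taken from the tokens file above, and columns are counted here as a zero-based offset into the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Token types as listed in the T2 tokens file; the regexes restate the rules.
my @RULES = (
    [ 4, qr/[a-zA-Z]+/ ],   # ID
    [ 5, qr/[0-9]+/    ],   # INT
    [ 6, qr/\r?\n/     ],   # NEWLINE
    [ 7, qr/[ \t]+/    ],   # WS
);

# Hypothetical helper: returns (\@tokens, \@errors) for an input string.
sub tokenize {
    my ($text) = @_;
    my (@tokens, @errors);
    my ($line, $col) = (1, 0);
    pos($text) = 0;

    CHAR: while (pos($text) < length($text)) {
        for my $rule (@RULES) {
            my ($type, $re) = @$rule;
            # \G anchors at the current position; /c keeps pos() on failure.
            if ($text =~ /\G($re)/gc) {
                push @tokens, { type => $type, text => $1 };
                if ($type == 6) { $line++; $col = 0 }
                else            { $col += length $1 }
                next CHAR;
            }
        }
        # No rule matched: report the character, skip it, and keep lexing.
        my $ch = substr($text, pos($text), 1);
        push @errors, "line $line:$col no viable alternative at character '$ch'";
        pos($text) = pos($text) + 1;
        $col++;
    }
    return (\@tokens, \@errors);
}

my ($toks, $errs) = tokenize("Hello World!\n42\n");
print "$_\n" for @$errs;
print "type: $_->{type}\n" for @$toks;
```

The point of the sketch is that an unrecognized character costs one error message and one skipped character, so the tokens after it are still produced.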

2007-06-15

+ Handle lexer actions

Here's another short example, similar to the one above. Note how whitespace is put on the hidden channel (99) and newlines are skipped, so no NEWLINE tokens appear in the output.

lexer grammar T2;
options { language = Perl5; }

ID  :   ('a'..'z'|'A'..'Z')+ ;
INT :   '0'..'9'+ ;
NEWLINE:'\r'? '\n' { $self->skip(); } ;
WS  :   (' '|'\t')+ { $channel = HIDDEN; } ;

$ perl t.pl
text: Hello
type: 4
pos: 1:0
channel: 0
token index: -1

text:
type: 7
pos: 1:5
channel: 99
token index: -1

text: World
type: 4
pos: 1:6
channel: 0
token index: -1

line 1:11 no viable alternative at character '!'
text: 42
type: 5
pos: 2:0
channel: 0
token index: -1
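
The difference between the two actions is that a skipped token never reaches the consumer, while a hidden token is still emitted but tagged with channel 99. A consumer that only cares about the default channel could filter along these lines. This is a minimal sketch using plain hashes in place of runtime token objects; the constant names and the on_channel helper are assumptions for illustration, and only the channel numbers 0 and 99 come from the output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Channel numbers as they appear in the output above; the names are assumed.
use constant {
    DEFAULT_CHANNEL => 0,
    HIDDEN_CHANNEL  => 99,
};

# Hypothetical token records mirroring the fields printed by the example.
my @tokens = (
    { text => 'Hello', type => 4, channel => DEFAULT_CHANNEL },
    { text => ' ',     type => 7, channel => HIDDEN_CHANNEL  },
    { text => 'World', type => 4, channel => DEFAULT_CHANNEL },
    { text => '42',    type => 5, channel => DEFAULT_CHANNEL },
);

# Return only the tokens that sit on the requested channel.
sub on_channel {
    my ($channel, @toks) = @_;
    return grep { $_->{channel} == $channel } @toks;
}

print join(' ', map { $_->{text} } on_channel(DEFAULT_CHANNEL, @tokens)), "\n";
```

Keeping whitespace on a separate channel instead of skipping it means the original text can still be reconstructed from the full token list if needed.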

2007-06-26

+ Simple parser is working

Quick, what is 2 + 2? If you can't remember, here's an easy way to find out. First, we need a grammar.

grammar MExpr;

options {
  language = Perl5;
}

prog:   stat+ ;

stat:   expr NEWLINE { print "$expr.value\n"; }
    |   NEWLINE
    ;

expr returns [value]
    :   e=atom { $value = $e.value; }
        (   '+' e=atom { $value += $e.value; }
        |   '-' e=atom { $value -= $e.value; }
        )*
    ;

atom returns [value]
    :   INT { $value = $INT.text; }
    |   '(' expr ')' { $value = $expr.value; }
    ;

ID  :   ('a'..'z'|'A'..'Z')+ ;
INT :   '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS  :   (' '|'\t')+ { $self->skip(); } ;

And here's the test program.

#!/usr/bin/perl

use strict;
use warnings;

use ANTLR::Runtime::ANTLRStringStream;
use ANTLR::Runtime::CommonTokenStream;
use MExprLexer;
use MExprParser;

while (<>) {
    my $input = ANTLR::Runtime::ANTLRStringStream->new($_);
    my $lexer = MExprLexer->new($input);

    my $tokens = ANTLR::Runtime::CommonTokenStream->new({ token_source => $lexer });
    my $parser = MExprParser->new($tokens);
    $parser->prog();
}

Finally, we get the answer.

$ perl t.pl
2 + 2
4
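
For readers who want to see what the expr and atom rules compute without generating a parser, here is a hand-written recursive-descent sketch of the same two rules. It is not the code ANTLR generates; the function names and the regex-based tokenizer are assumptions for illustration, and unlike the real lexer this tokenizer silently ignores characters it does not know.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the input into INT, '+', '-', '(' and ')' tokens.
# Whitespace is dropped, much like the WS rule's skip() action.
sub tokenize_expr {
    my ($text) = @_;
    return [ $text =~ /(\d+|[-+()])/g ];
}

# atom: INT | '(' expr ')'
sub atom {
    my ($toks) = @_;
    my $tok = shift @$toks;
    die "unexpected end of input\n" unless defined $tok;
    return $tok if $tok =~ /^\d+$/;
    if ($tok eq '(') {
        my $value = expr($toks);
        my $close = shift @$toks;
        die "expected ')'\n" unless defined $close && $close eq ')';
        return $value;
    }
    die "unexpected token '$tok'\n";
}

# expr: atom ( ('+' | '-') atom )*
sub expr {
    my ($toks) = @_;
    my $value = atom($toks);
    while (@$toks && $toks->[0] =~ /^[-+]$/) {
        my $op  = shift @$toks;
        my $rhs = atom($toks);
        $value = $op eq '+' ? $value + $rhs : $value - $rhs;
    }
    return $value;
}

print expr(tokenize_expr('2 + 2')), "\n";
```

Each grammar rule maps to one function, and the loop in expr plays the role of the ( ... )* subrule, updating the running value just as the grammar actions update $value.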

Author

Ronald Blaschke (ron at rblasch org)
