Antlr3PerlTarget

Perl ANTLR v3 Target

Author

Ron Blaschke, ron at rblasch.org

Status

Early prototyping phase.  Simple lexer and parser are working.

Progress

Here's a simple example.  Note that everything is still subject to change.

    $ cat T.g
    lexer grammar T;
    options { language = Perl5; }

    ZERO: '0';
    ONE: '1';

    $ cat T.tokens
    Tokens=6
    ZERO=4
    ONE=5

    $ cat t.pl
    #!/usr/bin/perl

    use ANTLR::Runtime::ANTLRStringStream;
    use TLexer;

    use strict;
    use warnings;

    my $input = ANTLR::Runtime::ANTLRStringStream->new('010');
    my $lexer = TLexer->new($input);

    while (1) {
        my $token = $lexer->next_token();
        last if $token->get_type() == $TLexer::EOF;

        print "type: ", $token->get_type(), "\n";
        print "text: ", $token->get_text(), "\n";
        print "\n";
    }

    $ perl t.pl
    type: 4
    text: 0

    type: 5
    text: 1

    type: 4
    text: 0
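The numeric types printed above can be mapped back to token names using the .tokens file. The following is a minimal sketch, not part of the Perl target: `parse_tokens` is a hypothetical helper, and the file contents are inlined from the listing above so the script runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Contents of T.tokens as shown above, inlined to keep the sketch self-contained.
my $tokens_file = <<'END';
Tokens=6
ZERO=4
ONE=5
END

# parse_tokens is a hypothetical helper: it turns NAME=NUMBER lines
# into a lookup table from type number to token name.
sub parse_tokens {
    my ($text) = @_;
    my %name_for;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^(\w+)=(\d+)$/;
        $name_for{$2} = $1;
    }
    return %name_for;
}

my %name_for = parse_tokens($tokens_file);

# The three token types reported by t.pl for the input '010'.
print "$_: $name_for{$_}\n" for (4, 5, 4);
```

With the listing above, this should print ZERO, ONE, ZERO next to the types 4, 5, 4.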

2007-06-13

+ Escaped characters, like '\n', are now handled properly.

+ Added error handling.

    lexer grammar T2;
    options { language = Perl5; }

    ID  :   ('a'..'z'|'A'..'Z')+ ;
    INT :   '0'..'9'+ ;
    NEWLINE:'\r'? '\n' ;
    WS  :   (' '|'\t')+ ;

    INT=5
    WS=7
    Tokens=8
    ID=4
    NEWLINE=6

    #!/usr/bin/perl

    use ANTLR::Runtime::ANTLRStringStream;
    use T2Lexer;

    use strict;
    use warnings;

    my $input = ANTLR::Runtime::ANTLRStringStream->new("Hello World!\n42\n");
    my $lexer = T2Lexer->new($input);

    while (1) {
        my $token = $lexer->next_token();
        last if $token->get_type() == $T2Lexer::EOF;

        print "type: ", $token->get_type(), "\n";
        print "text: ", $token->get_text(), "\n";
        print "\n";
    }

    type: 4
    text: Hello

    type: 7
    text: 

    type: 4
    text: World

    line 1:12 no viable alternative at character '!'
    type: 6
    text: 

    type: 5
    text: 42

    type: 6
    text: 

Note the "no viable alternative" error message for the unrecognized '!'.

2007-06-15

+ Handle lexer actions

Here's another short example, similar to the one above.  Note how whitespace is put on the hidden channel (99) and newlines are skipped.

    lexer grammar T2;
    options { language = Perl5; }

    ID  :   ('a'..'z'|'A'..'Z')+ ;
    INT :   '0'..'9'+ ;
    NEWLINE:'\r'? '\n' { $self->skip(); } ;
    WS  :   (' '|'\t')+ { $channel = HIDDEN; } ;

    $ perl t.pl
    text: Hello
    type: 4
    pos: 1:0
    channel: 0
    token index: -1

    text: 
    type: 7
    pos: 1:5
    channel: 99
    token index: -1

    text: World
    type: 4
    pos: 1:6
    channel: 0
    token index: -1

    line 1:11 no viable alternative at character '!'
    text: 42
    type: 5
    pos: 2:0
    channel: 0
    token index: -1
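To make the effect of the two actions concrete, here is a hand-written sketch that mimics the T2 grammar with plain Perl regexes. It is not what ANTLR generates and it does not use the runtime: `tokenize` and the hash-ref tokens are hypothetical stand-ins. Newlines produce no token at all (as with `skip()`), while whitespace still produces a token but on channel 99, so a consumer can choose to ignore it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Token types as in the T2.tokens listing; channel 99 is the hidden channel.
use constant { ID => 4, INT => 5, NEWLINE => 6, WS => 7 };
use constant { DEFAULT_CHANNEL => 0, HIDDEN_CHANNEL => 99 };

# tokenize() is a hypothetical hand-written stand-in for T2Lexer.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    my ($line, $col) = (1, 0);

    # Record a token at the current position and advance the column.
    my $emit = sub {
        my ($type, $text, $channel) = @_;
        push @tokens, {
            type => $type, text => $text, channel => $channel,
            line => $line, col  => $col,
        };
        $col += length $text;
    };

    pos($input) = 0;
    while (pos($input) < length $input) {
        if    ($input =~ /\G([A-Za-z]+)/gc) { $emit->(ID,  $1, DEFAULT_CHANNEL) }
        elsif ($input =~ /\G([0-9]+)/gc)    { $emit->(INT, $1, DEFAULT_CHANNEL) }
        elsif ($input =~ /\G([ \t]+)/gc)    { $emit->(WS,  $1, HIDDEN_CHANNEL) }
        elsif ($input =~ /\G\r?\n/gc) {
            # Like skip(): no token is produced, only the position is updated.
            $line++;
            $col = 0;
        }
        else {
            # No rule matches: report the character and move past it.
            $input =~ /\G(.)/gcs;
            warn "line $line:$col no viable alternative at character '$1'\n";
            $col++;
        }
    }
    return @tokens;
}

for my $t (tokenize("Hello World!\n42\n")) {
    next if $t->{channel} == HIDDEN_CHANNEL;    # what a parser would see
    print "$t->{text} (type $t->{type}) at $t->{line}:$t->{col}\n";
}
```

For the same input as above, this sketch reports Hello at 1:0, World at 1:6 and 42 at 2:0 on standard output, and the "no viable alternative" message for '!' on standard error.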

2007-06-26

+ Simple Parser is working

Quick, what is 2 + 2?  If you can't remember, here's an easy way to find out.  First we need a grammar.

    grammar MExpr;

    options { language = Perl5; }

    prog:   stat+ ;

    stat:   expr NEWLINE { print "$expr.value\n"; }
        |   NEWLINE
        ;

    expr returns [value]
        :   e=atom { $value = $e.value; }
            (   '+' e=atom { $value += $e.value; }
            |   '-' e=atom { $value -= $e.value; }
            )*
        ;

    atom returns [value]
        :   INT { $value = $INT.text; }
        |   '(' expr ')' { $value = $expr.value; }
        ;

    ID  :   ('a'..'z'|'A'..'Z')+ ;
    INT :   '0'..'9'+ ;
    NEWLINE:'\r'? '\n' ;
    WS  :   (' '|'\t')+ { $self->skip(); } ;

And here's the test program.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use ANTLR::Runtime::ANTLRStringStream;
    use ANTLR::Runtime::CommonTokenStream;
    use MExprLexer;
    use MExprParser;

    while (<>) {
        my $input = ANTLR::Runtime::ANTLRStringStream->new($_);
        my $lexer = MExprLexer->new($input);

        my $tokens = ANTLR::Runtime::CommonTokenStream->new({ token_source => $lexer });
        my $parser = MExprParser->new($tokens);
        $parser->prog();
    }

Finally we're getting to the answer.

    $ perl t.pl
    2 + 2
    4

2007-08-08

+ Simple expression grammar

The grammar

    grammar Expr;

    options { language = Perl5; }

    @header {
    }

    @members {
        my %memory;
    }

    prog:   stat+ ;

    stat:   expr NEWLINE { print "$expr.value\n"; }
        |   ID '=' expr NEWLINE
            { $memory{$ID.text} = $expr.value; }
        |   NEWLINE
        ;

    expr returns [value]
        :   e=multExpr { $value = $e.value; }
            (   '+' e=multExpr { $value += $e.value; }
            |   '-' e=multExpr { $value -= $e.value; }
            )*
        ;

    multExpr returns [value]
        :   e=atom { $value = $e.value; } ('*' e=atom { $value *= $e.value; })*
        ;

    atom returns [value]
        :   INT { $value = $INT.text; }
        |   ID
            {
                my $v = $memory{$ID.text};
                if (defined $v) {
                    $value = $v;
                } else {
                    print STDERR "undefined variable $ID.text\n";
                }
            }
        |   '(' expr ')' { $value = $expr.value; }
        ;

    ID  :   ('a'..'z'|'A'..'Z')+ ;
    INT :   '0'..'9'+ ;
    NEWLINE:'\r'? '\n' ;
    WS  :   (' '|'\t')+ { $self->skip(); } ;

Test program

    #!/usr/bin/perl

    use strict;
    use warnings;

    use blib '../..';

    use ANTLR::Runtime::ANTLRStringStream;
    use ANTLR::Runtime::CommonTokenStream;
    use ExprLexer;
    use ExprParser;

    my $in;
    {
        undef $/;
        $in = <>;
    }

    my $input = ANTLR::Runtime::ANTLRStringStream->new($in);
    my $lexer = ExprLexer->new($input);

    my $tokens = ANTLR::Runtime::CommonTokenStream->new({ token_source => $lexer });
    my $parser = ExprParser->new($tokens);
    $parser->prog();

Test run

    $ perl t.pl
    x=1
    y=2
    3*(x+y)
    ^Z
    9
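To show what the rule actions in the grammar compute, here is a hand-written recursive-descent sketch of the same rules in plain Perl. It is only an illustration under stated assumptions: it does not use the generated ExprLexer or ExprParser, all subroutine names are hypothetical, it works on one line at a time, and it dies on an undefined variable where the grammar's action prints a message to STDERR.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written stand-in for the generated lexer and parser.
# Every name below is hypothetical; none of it is ANTLR runtime API.

my %memory;    # variable store, like %memory in the grammar's @members
my @tokens;    # remaining tokens of the current line

# Split one line into INT, ID and single-character tokens, dropping whitespace.
sub tokenize {
    my ($line) = @_;
    return $line =~ /\d+|[A-Za-z]+|[^\sA-Za-z\d]/g;
}

# Consume and return the next token.
sub next_token { return shift @tokens }

# Look at the next token without consuming it; empty string at end of line.
sub peek { return @tokens ? $tokens[0] : '' }

# atom: INT | ID | '(' expr ')'
sub atom {
    my $t = next_token();
    die "unexpected end of input\n" unless defined $t;
    return $t if $t =~ /^\d+$/;
    if ($t =~ /^[A-Za-z]+$/) {
        die "undefined variable $t\n" unless defined $memory{$t};
        return $memory{$t};
    }
    if ($t eq '(') {
        my $value = expr();
        my $close = next_token();
        die "expected ')'\n" unless defined $close && $close eq ')';
        return $value;
    }
    die "unexpected token '$t'\n";
}

# multExpr: atom ('*' atom)*
sub mult_expr {
    my $value = atom();
    while (peek() eq '*') {
        next_token();
        $value *= atom();
    }
    return $value;
}

# expr: multExpr (('+' | '-') multExpr)*
sub expr {
    my $value = mult_expr();
    while (peek() eq '+' || peek() eq '-') {
        my $op  = next_token();
        my $rhs = mult_expr();
        $value = $op eq '+' ? $value + $rhs : $value - $rhs;
    }
    return $value;
}

# stat: expr | ID '=' expr | empty line
# Returns the value to print, or nothing for assignments and empty lines.
sub stat_line {
    my ($line) = @_;
    @tokens = tokenize($line);
    return unless @tokens;
    if (@tokens >= 2 && $tokens[0] =~ /^[A-Za-z]+$/ && $tokens[1] eq '=') {
        my $name = next_token();
        next_token();    # consume '='
        $memory{$name} = expr();
        return;
    }
    return expr();
}

for my $line ('x=1', 'y=2', '3*(x+y)') {
    my $result = stat_line($line);
    print "$result\n" if defined $result;
}
```

Fed the three lines from the test run, the sketch prints 9. Because `mult_expr` is called from `expr`, multiplication binds tighter than addition, just as the multExpr rule sits below expr in the grammar.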

2008-02-23

Started the real porting effort.  The goal is to port one ANTLR runtime class at a time from Java to Perl, including full API coverage and documentation.  First stop of the porting train: ANTLR::Runtime::BitSet.

2008-11-18

Got the first parser working: SimpleCalc, taken from the Five minute introduction to ANTLR 3.

Author

Ronald Blaschke (ron at rblasch org)