gUnit - Grammar Unit Testing

Introduction

gUnit is a unit testing framework for ANTLR grammars. It provides a simple way to write and run automated tests for ANTLR grammars, much as jUnit does for Java code. The basic idea is to write input/output pairs for the rules in a grammar and let gUnit verify that each rule produces the expected result. The input can be a single-line string, a multi-line string, or an external file. The expected output can be simple success or failure, an abstract syntax tree (AST), a rule return value, or text output, which may be a rule's template return value. The current version of gUnit has two main functions: an interpreter and a jUnit generator. The interpreter reads your gUnit script and runs the tests directly, using Java reflection to invoke methods on your parser objects. The generator instead translates your gUnit script into jUnit Java code that you can compile and run yourself.
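To make the reflection idea concrete, here is a minimal sketch of how a rule known only by its name can be invoked on a parser object. It is not gUnit's actual code: DemoParser, ReflectiveRuleRunner and runRule are hypothetical names, and the toy variable rule only checks for a trailing ';' instead of running a real ANTLR-generated parser.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for an ANTLR-generated parser: each grammar rule
// is exposed as a public method named after the rule.
class DemoParser {
    private final String input;

    DemoParser(String input) {
        this.input = input;
    }

    // Toy "variable" rule (assumption): accepts only input ending with ';'.
    public boolean variable() {
        return input.trim().endsWith(";");
    }
}

class ReflectiveRuleRunner {
    // Looks up the rule method by name and invokes it, the way an
    // interpreter can call a rule it only knows as a string.
    static boolean runRule(Object parser, String ruleName) {
        try {
            Method rule = parser.getClass().getMethod(ruleName);
            return (Boolean) rule.invoke(parser);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke rule " + ruleName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runRule(new DemoParser("int x;"), "variable")); // prints true
        System.out.println(runRule(new DemoParser("int x"), "variable"));  // prints false
    }
}
```

The point of the sketch is only that the rule name arrives as a string from the test script, so no test code has to be compiled per rule.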

Getting gUnit

As of ANTLR v3.1, gUnit is included in the main ANTLR Tool jar, but the very latest snapshot of gUnit can always be found on the ANTLR Hudson Continuous Build System. There you will find source and binary jars in case you need to override the version that ships with ANTLR 3.1 and later releases.

Note: If you are using ANTLR 3.0.1 or 3.0, select the "attachments" tab above and download gunit-1.0.1.jar.

Example unit tests

Let's try grammar unit testing on some simple but useful examples. The first example tests the LL-star grammar from the examples tarball, which you can get at http://www.antlr.org/download/examples-v3.tar.gz. Before running the grammar unit tests, don't forget to run ANTLR on the grammar and compile the generated code. The following commands assume that the ANTLR 3 jar is on your classpath.

$ cd examples/java/LL-star
$ java org.antlr.Tool SimpleC.g
$ javac *.java

Now we'll write our first gUnit script. You may give your gUnit script any filename, but by convention we use the extension .gunit here. Each gUnit testsuite tests only one grammar, but it can test any rule in that grammar with multiple unit tests, just as you test your Java functions with jUnit. In a testsuite, each rule may contain multiple tests, and each test is a pair of input and output (the expected result). The basic syntax of a gUnit test is input -> output; note that this arrow is unrelated to the ANTLR rewrite rule operator. It simply means that feeding the input to an ANTLR rule should produce the expected result. To test only whether a rule accepts an input, add OK or FAIL after the input. If you would like to learn more about the gUnit syntax first, jump ahead to the gUnit grammar section.
Here is the script SimpleC.gunit, which tests the grammar SimpleC.g in LL-star:

gunit SimpleC;

//test rule:variable with 2 tests
variable:
"int x" FAIL     //expect failure, because of missing ';' in the input string

"int x;" -> OK

//test rule:functionHeader with 1 test
functionHeader:
"void bar(int x)" returns ["int"]  //expect a return string "int" from rule

//test rule:program with 3 tests; input starts immediately after the initial <<, so the first line of this input is blank
program:
<<
char c;
int x;
>> OK        //expect success (no error messages from ANTLR)

input OK     //expect success

input -> ""  //expect standard output "" from rule


// test lexical rules
ID:
"abc123" OK    //expect success
"XYZ@999" OK   //expect success
"123abc" FAIL  //expect failure

INT:
"00000" OK
"123456789" OK

Please note that the bare word input in the above script refers to an external file named input, because it is not surrounded by quotes or double angle brackets. Here is the content of the file input:

char c;
int x;
void bar(int x);
int foo(int y, char d) {
  int i;
  for (i=0; i<3; i=i+1) {
    x=3;
    y=5;
  }
}

Run the interpreter and you will see the unit test results as follows:

$ java org.antlr.gunit.Interp SimpleC.gunit
--------------------------------------------------------------------------------
executing testsuite for grammar:SimpleC with 11 tests
--------------------------------------------------------------------------------
3 failures found:
test3 (functionHeader, line11) - 
expected: int
actual: bar

test6 (program, line22) - 
expected: 
actual: bar is a declaration\nfoo is a definition\n

test8 (ID, line28) - 
expected: OK
actual: extra text found, '@999'

Tests run: 11, Failures: 3

In test1, the input is missing the character ';', so the rule fails; because the test expects FAIL, it passes and is not reported. In test3, we expect the return value "int" but actually get "bar". In test6, we get some output text instead of the expected empty string. In test8, the input string contains the undefined symbol '@', so the test does not pass.
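The failure in test6 comes from comparing the text a rule prints with the expected string. A minimal sketch of that kind of check, assuming the output is captured by temporarily redirecting System.out (how gUnit captures output internally is not shown in this document; StdoutCapture, capture and outputMatches are hypothetical names):

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class StdoutCapture {
    // Runs the action while System.out is redirected and returns what it printed.
    static String capture(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream replacement = new PrintStream(buffer, true)) {
            System.setOut(replacement);
            action.run();
        } finally {
            // Always restore the real standard output.
            System.setOut(original);
        }
        return buffer.toString();
    }

    // An output test passes only when the captured text equals the expectation.
    static boolean outputMatches(String expected, Runnable action) {
        return expected.equals(capture(action));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a rule whose actions print messages.
        Runnable noisyRule = () -> System.out.print("bar is a declaration\n");
        System.out.println(outputMatches("", noisyRule));                       // prints false
        System.out.println(outputMatches("bar is a declaration\n", noisyRule)); // prints true
    }
}
```

With an exact comparison like this, an expected empty string only matches a rule that prints nothing at all, which is why test6 reports a failure.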

What if you want to build unit tests for a tree grammar instead? Let's test the grammars in simplecTreeParser from examples-v3.tar.gz. First, we write a gUnit script to test the AST construction rules in SimpleC.g in simplecTreeParser. Don't forget to re-run org.antlr.Tool on this SimpleC.g, which is different from the one in the previous example.

gunit SimpleC;

/** only test AST output in this testsuite */
variable:
"int x;" -> (VAR_DEF int x)       //test1


declaration:
"void bar(int);" -> ()            //test2

"int foo(int y, char d) {}" -> () //test3


program:
input -> ()                       //test4

Be sure that both parser and lexer have been properly generated by ANTLR and compiled before running the tests.

$ cd examples/java/simplecTreeParser
$ java org.antlr.Tool SimpleC.g SimpleCWalker.g
$ javac *.java
$ java org.antlr.gunit.Interp SimpleC.gunit
--------------------------------------------------------------------------------
executing testsuite for grammar:SimpleC with 4 tests
--------------------------------------------------------------------------------
3 failures found:
test2 (declaration, line9) -
expected: ()
actual: line 1:12 no viable alternative at input ')'

test3 (declaration, line11) -
expected: ()
actual: (FUNC_DEF (FUNC_HDR int foo (ARG_DEF int y) (ARG_DEF char d)) BLOCK)

test4 (program, line15) -
expected: ()
actual: (VAR_DEF char c) (FUNC_DECL (FUNC_HDR void bar (ARG_DEF int x)))
 (FUNC_DEF (FUNC_HDR int foo (ARG_DEF int y) (ARG_DEF char d))
(BLOCK (VAR_DEF int i) (for (= i 0) (< i 3) (= i (+ i 1)) (BLOCK (= y 5)))))

Tests run: 4, Failures: 3

There are 3 failures because the actual abstract syntax trees created by the rules differ from the expected tree structures. In test2, we get an error message because a declarator is missing after the type int.
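The expected trees above are written in the parenthesized form (root child child ...). A minimal sketch of how such a string can be produced from a tree and compared with an expectation follows. DemoNode and AstExpectation are hypothetical stand-ins, not ANTLR's tree classes, and plain string equality is an assumption about how the comparison is made.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for an AST node (hypothetical, for illustration only).
class DemoNode {
    final String text;
    final List<DemoNode> children = new ArrayList<>();

    DemoNode(String text, DemoNode... kids) {
        this.text = text;
        for (DemoNode kid : kids) {
            children.add(kid);
        }
    }

    // A leaf renders as its text, an inner node as "(root child1 child2 ...)".
    String toStringTree() {
        if (children.isEmpty()) {
            return text;
        }
        StringBuilder sb = new StringBuilder("(").append(text);
        for (DemoNode child : children) {
            sb.append(' ').append(child.toStringTree());
        }
        return sb.append(')').toString();
    }
}

class AstExpectation {
    // Assumption: the test passes when the rendered tree equals the expected text.
    static boolean matches(String expected, DemoNode actual) {
        return expected.equals(actual.toStringTree());
    }

    public static void main(String[] args) {
        DemoNode varDef = new DemoNode("VAR_DEF", new DemoNode("int"), new DemoNode("x"));
        System.out.println(varDef.toStringTree());               // prints (VAR_DEF int x)
        System.out.println(matches("(VAR_DEF int x)", varDef));  // prints true
        System.out.println(matches("()", varDef));               // prints false
    }
}
```

Under this assumption, a placeholder expectation such as () can never match a non-empty tree, which is consistent with test3 and test4 failing above.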

After testing the parser grammar, let's test the tree grammar. The gUnit notation for testing a tree grammar is identical to the notation for testing a regular grammar, except for the keyword 'walks'. The keyword 'walks' tells gUnit which tree grammar walks which parser grammar, and which tree grammar rule walks which parser grammar rule (tree rules walk trees built by parser rules).
Here is SimpleCWalker.gunit, which tests the tree grammar SimpleCWalker.g in simplecTreeParser:

gunit SimpleCWalker walks SimpleC;

program walks program:
input OK        //test1
"int x" -> ""   //test2


variable walks program:
"char c" -> ""  //test3


declaration walks program:
"void bar(int x)" FAIL  //test4

Run the unit tests and check the results from gUnit as follows:

$ java org.antlr.gunit.Interp SimpleCWalker.gunit
-----------------------------------------------------------------------------------
executing testsuite for tree grammar:SimpleCWalker walks SimpleCParser with 4 tests
-----------------------------------------------------------------------------------
2 failures found:
test2 (program walks program, line5) -
expected:
actual: line 0:-1 no viable alternative at input '<EOF>'
SimpleCWalker.g: node from line 0:0 required (...)+ loop
did not match anything at input 'int'

test3 (variable walks program, line9) -
expected:
actual: line 0:-1 no viable alternative at input '<EOF>'
SimpleCWalker.g: node from line 0:0 mismatched tree node:
 <unexpected: [@-1,0:0='<no text>',<-1>,0:-1], resync=char> expecting VAR_DEF

Tests run: 4, Failures: 2

There are 2 failures, test2 and test3, because the character ";" is missing from both input strings. We therefore receive error messages from ANTLR instead of the expected empty string.

Test StringTemplate output

How do you test a rule's template return value in a grammar that sets output=template? Simply enclose the expected template return value in double quotes, just as when testing an expected standard output string as described above.

Test lexical rules

gUnit allows you to test lexical rules as well as parser rules. The order of the tests is not restricted: you can define tests for lexical rules before tests for parser rules, or even mix them. Please note that, as the above example illustrates, a lexical rule test can currently only check whether an input string is OK or FAIL.

Using package

Need to test a grammar that uses a package definition? No problem: just add @header{package ...;} to your gUnit script. Assuming that the grammar SimpleC.g from the above example has the package definition examples.java, its gUnit testsuite should look like this:

gunit SimpleC;
@header{package examples.java;}
// Note that only package definition allowed in gUnit header action currently
rule testsuites

Using gunit with separate parser and lexer files

If you keep your parser and lexer in separate files, such as FooParser.g and FooLexer.g, name the script Foo.gunit and use this header inside it:

gunit Foo;
...

Testing with custom trees

gUnit also allows you to use your custom tree adaptor with the setting options{TreeAdaptor=...;} in the gUnit script. The options setting should be placed before the @header setting.

options {
TreeAdaptor = YourTreeAdaptorName;    // you can also use package.YourTreeAdaptorName if necessary
}

Restriction of the TreeAdaptor option:
Currently gUnit cannot handle arbitrary tree adaptors. It only accepts a tree adaptor that has a default constructor taking no parameters.
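The restriction is easier to see with a sketch of what instantiating a class by name usually looks like. This is an illustration under assumptions, not gUnit's implementation: PlainAdaptor, ConfiguredAdaptor and AdaptorLoader are hypothetical names, and the sketch assumes the adaptor is created through its no-argument constructor via reflection.

```java
import java.lang.reflect.Constructor;

// Hypothetical adaptor with a no-argument constructor.
class PlainAdaptor {
    public PlainAdaptor() { }
}

// Hypothetical adaptor that can only be built with a parameter.
class ConfiguredAdaptor {
    public ConfiguredAdaptor(String setting) { }
}

class AdaptorLoader {
    // Returns an instance created through the no-argument constructor,
    // or null when the class cannot be created that way.
    static Object instantiate(String className) {
        try {
            Constructor<?> ctor = Class.forName(className).getDeclaredConstructor();
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiate(PlainAdaptor.class.getName()) != null);      // prints true
        System.out.println(instantiate(ConfiguredAdaptor.class.getName()) != null); // prints false
    }
}
```

A script that names only the class gives the loader nothing to pass to a constructor, so an adaptor like ConfiguredAdaptor cannot be created this way.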

Translate to jUnit

What if you want to test a rule with multiple return values? We need to translate our gUnit script to jUnit code. Assuming that we have a rule as follows:

t returns [int x, String y, boolean z]:...;

To retrieve and test multiple return values, we need the predefined variable retval, which is only valid within {boolean statements} in a gUnit script. It is currently supported only in jUnit generator mode. Here is a simple test for the rule t:

...
t:
input-1 -> {retval.x==0}
input-2 -> {retval.y.equals("just a test")}
input-3 -> {retval.z!=true}
...

Run gUnit in generator mode, and the script above will be translated to:

$ java org.antlr.gunit.Interp -o grammar-name.gunit
// For better readability, the jUnit code is simplified here
...
public void testT1() throws Exception {
   grammar-name.t_return retval = (grammar-name.t_return)execParser("t", input-1);
   assertTrue(retval.x==0);
}
public void testT2() throws Exception {
   grammar-name.t_return retval = (grammar-name.t_return)execParser("t", input-2);
   assertTrue(retval.y.equals("just a test"));
}
public void testT3() throws Exception {
   grammar-name.t_return retval = (grammar-name.t_return)execParser("t", input-3);
   assertTrue(retval.z!=true);
}
...

JDK compatibility

It is important to keep in mind that gUnit generates Java 5 compatible code. This creates problems if you intend to use the generated unit tests with a Java 1.4 compatible application. The problems that appear are:

  • Enhanced for loops;
  • Variable argument lists.

Enhanced for loops workaround

Replace

for (Method method : methods) {
...
}

with

Method method = null;
for(int methodIndex=0; methodIndex < methods.length; methodIndex++) {
    method = methods[methodIndex];
    ...
}

Variable argument lists workaround

The problem occurs with the reflection methods getMethod and invoke. In Java 1.4 these methods take an array of parameters as their last argument; in Java 5 that array became a variable argument list. The gUnit generator takes advantage of this and simply omits the array. As a consequence, the generated calls have one argument instead of the two required in Java 1.4.

So, to correct your class, find every call to those methods and add a final parameter set to null.

Example: replace

Method returnName = _return.getMethod("getTree");

with

Method returnName = _return.getMethod("getTree", null);
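As a sanity check, the two call forms can be compared on a current JDK. The sketch below is an illustration with hypothetical names (VarargsReflectionDemo, DemoReturn); it uses the cast (Class<?>[]) null, which needs Java 5 or later, whereas plain Java 1.4 code would pass null directly as shown above.

```java
import java.lang.reflect.Method;

class VarargsReflectionDemo {
    // Hypothetical stand-in for a rule return object that exposes getTree().
    public static class DemoReturn {
        public Object getTree() {
            return "(VAR_DEF int x)";
        }
    }

    // Java 5 style, as in the generated code: the trailing array is omitted.
    static Method lookupJava5Style() {
        try {
            return DemoReturn.class.getMethod("getTree");
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Explicit-array style: a null parameter-type array means "no parameters".
    // The cast avoids the varargs ambiguity warning on Java 5+ compilers.
    static Method lookupWithExplicitArray() {
        try {
            return DemoReturn.class.getMethod("getTree", (Class<?>[]) null);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // invoke() likewise accepts a null argument array for a no-argument method.
    static Object callGetTree(Method method) {
        try {
            return method.invoke(new DemoReturn(), (Object[]) null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupJava5Style().equals(lookupWithExplicitArray())); // prints true
        System.out.println(callGetTree(lookupWithExplicitArray()));               // prints (VAR_DEF int x)
    }
}
```

Both lookups resolve to the same method, so adding the explicit null argument does not change which method the generated test calls.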

Command line options

To use gUnit, you need to add antlr-3.0.jar, stringtemplate-3.0.jar and gunit.jar to your CLASSPATH.
Usage:

java org.antlr.gunit.Interp [args] file.gunit

Option            Description
-o                generate jUnit file
-p package-name   package name for jUnit file; ANTLR v3.3 and greater

gUnit grammar

The form of gUnit grammar:

/** gUnit doc comment */
gunit grammar-name;
@header{package ...;}
/** rule testsuite comment */
rule-name:
«input-1» OK
«input-2» FAIL
«input-3» returns [...]
«input-4» -> «output-4»
...
«input-n» -> «output-n»

/** lexical rule testsuite comment */
lexical-rule-name:
«input-1» OK
«input-2» FAIL
...
«input-n» OK //or FAIL

«rule testsuites»

The syntax for testing tree grammars is identical to that for testing parser grammars, except for the keyword 'walks'. The syntax is:

gunit grammar-name-1 walks grammar-name-2;

rule-name-1 walks rule-name-2:
«testsuites»

Rule testsuite elements:

Element                 Description
options {...}           optional, only accepts the TreeAdaptor setting currently
@header{package ...;}   optional, only accepts a package definition currently
"«input»"               input string
<<«input»>>             multiple-line input string
file                    input from external file (input string with no quotes or double angle brackets)
"«output»"              expected output string, which includes a rule's template return value
<<«output»>>            expected multiple-line output string, which includes a rule's template return value
(«ast»)                 expected abstract syntax tree structure
{«condition»}           condition must be true (for jUnit generation mode only)
[«return»]              expected return value from rule, requires the keyword 'returns' (for single return value only)
OK                      success (no errors emitted from rule)
FAIL                    failure (errors emitted from rule)

Maven Integration

See the separate page "gUnit - Maven Integration".

Troubleshooting

EOF and gunit

If you get an error that looks like this:

junit.framework.AssertionFailedError: testing rule element expected:<(a 3)> but was:
<Stopped parsing at token index 2: line 122:4 no viable alternative at input '<EOF>'>

It's likely that you're testing a rule in the grammar that can't see an EOF following the rule.

Let's say you wanted to test a rule called element.

element : ID ('[' INT ']')? ;

Your test rule might look like:

element:
    "a" -> OK

The problem is the optional element at the end of the rule. ANTLR generates a test that says: if you see a [, then enter the subrule. If it doesn't see that, it must see what can follow a reference to rule element. Generally this is not going to be EOF, because element is referenced in the middle of the grammar and is not the top-level goal rule. You can fix this by adding a rule that looks like this:

elementEntry : element EOF ; // allow gunit to call element and see EOF afterwards

This forces EOF to be in the follow set for rule element. Sometimes you get lucky, but if you have an optional element or loop at the end of a rule and you want to test it with gunit, you will probably need a rule like this. It took me a while to figure this out...so remember this one.