Terence Notes

New blog entries

For all the entries, see Terence's blog RSS

Matching parse tree patterns, paths

[See tree-patterns branch in parrt/antlr4] ANTLR 4 introduced a visitor and listener mechanism that lets you implement DOM visiting or SAX-analogous event processing of tree nodes. This works great. For example, if all you care about is looking at Java method declarations, grab the Java.g4 file and then override methodDeclaration in JavaBaseListener. From there, a ParseTreeWalker can trigger calls to your overridden method for you as it walks the tree. Easy things are easy.…
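To make the listener idea concrete without pulling in the ANTLR runtime, here is a minimal sketch of the same shape: a walker that fires enter/exit events and a listener that overrides only the event it cares about. All class names here (ToyNode, ToyListener, ToyWalker, MethodCollector) are hypothetical stand-ins for illustration, not ANTLR's generated or runtime classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse tree node; NOT an ANTLR class.
class ToyNode {
    final String rule;   // rule name, e.g. "methodDeclaration"
    final String text;   // illustrative payload, e.g. the declared name
    final List<ToyNode> children = new ArrayList<>();
    ToyNode(String rule, String text) { this.rule = rule; this.text = text; }
    ToyNode add(ToyNode child) { children.add(child); return this; }
}

// Listener with do-nothing defaults, so subclasses override only what they need.
interface ToyListener {
    default void enter(ToyNode n) {}
    default void exit(ToyNode n) {}
}

// Depth-first walk that triggers the listener events (assumed analogue of a tree walker).
class ToyWalker {
    static void walk(ToyListener listener, ToyNode n) {
        listener.enter(n);
        for (ToyNode c : n.children) walk(listener, c);
        listener.exit(n);
    }
}

// Reacts only to method declarations and ignores every other node.
class MethodCollector implements ToyListener {
    final List<String> names = new ArrayList<>();
    @Override public void enter(ToyNode n) {
        if (n.rule.equals("methodDeclaration")) names.add(n.text);
    }
}

class ListenerSketch {
    static List<String> demo() {
        ToyNode tree = new ToyNode("compilationUnit", "")
            .add(new ToyNode("classDeclaration", "A")
                .add(new ToyNode("methodDeclaration", "f"))
                .add(new ToyNode("fieldDeclaration", "x"))
                .add(new ToyNode("methodDeclaration", "g")));
        MethodCollector collector = new MethodCollector();
        ToyWalker.walk(collector, tree);
        return collector.names;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the design is that the traversal lives in one place (the walker) and the application logic lives in another (the listener), which is why the easy case stays easy.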

The real story on null vs empty

In "null vs missing vs empty vs nonexistent in ST v4" a few years ago, I tried to resolve in my head the difference between a missing attribute, a null value, an array with no elements, and a string with no characters. I don't think I got it completely thought through and ST v4 might have some weird inconsistencies. This page is an attempt to finally write down all the cases and resolve exactly how things should work.…
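As a rough illustration of why these cases are distinct, the sketch below tells them apart for attributes stored in a plain Java map. It only shows that the four situations are distinguishable; it makes no claim about how ST v4 actually treats each one, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

class EmptinessSketch {
    // Classify one attribute lookup; the labels are illustrative, not ST terminology.
    static String classify(Map<String, Object> attrs, String name) {
        if (!attrs.containsKey(name)) return "missing";   // never defined at all
        Object v = attrs.get(name);
        if (v == null) return "null";                     // defined, but with a null value
        if (v instanceof Collection && ((Collection<?>) v).isEmpty()) return "empty list";
        if (v instanceof Object[] && ((Object[]) v).length == 0) return "empty array";
        if (v instanceof String && ((String) v).isEmpty()) return "empty string";
        return "has value";
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new HashMap<>();      // HashMap permits null values
        attrs.put("n", null);
        attrs.put("list", new ArrayList<String>());
        attrs.put("arr", new String[0]);
        attrs.put("s", "");
        attrs.put("name", "Ter");
        for (String key : new String[] {"absent", "n", "list", "arr", "s", "name"}) {
            System.out.println(key + " -> " + classify(attrs, key));
        }
    }
}
```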

Tree rewriting in ANTLR v4

Because most ANTLR users don't build compilers, I decided to focus on the other applications for ANTLR v4: parsing and extracting information and then translations. For compilers, we need to convert everything into operations and operands--that means ASTs are easier. For example, 3+4 should become a tree with + at the root and 3 and 4 as operands/children: (+ 3 4). The parse tree in contrast is probably (expr 3 + 4) where rule reference expr is the root node and the other elements are children.…
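The two shapes for 3+4 can be built and printed side by side. This is a small sketch with a hypothetical TreeNode class (not an ANTLR type) whose toStringTree method prints the LISP-style notation used above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal tree node for illustration; not part of ANTLR.
class TreeNode {
    final String label;
    final List<TreeNode> children;

    TreeNode(String label, TreeNode... kids) {
        this.label = label;
        this.children = Arrays.asList(kids);
    }

    // Leaves print as their label; interior nodes print as (label child1 child2 ...).
    String toStringTree() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (TreeNode c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

class TreeShapes {
    // Parse tree: the rule name is the root and every matched token is a child.
    static TreeNode parseTree() {
        return new TreeNode("expr", new TreeNode("3"), new TreeNode("+"), new TreeNode("4"));
    }

    // AST: the operator is the root and the operands are its children.
    static TreeNode ast() {
        return new TreeNode("+", new TreeNode("3"), new TreeNode("4"));
    }

    public static void main(String[] args) {
        System.out.println(parseTree().toStringTree()); // (expr 3 + 4)
        System.out.println(ast().toStringTree());       // (+ 3 4)
    }
}
```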

ANTLR and Schroedinger's Tokens

Had a nice lunch with Mihai Surdeanu today at Stanford. Mihai does natural language processing research and has used ANTLR in the past to process English text. He asked for 2 things: tokens that can be in more than one token class (token type) at the same time and the ability to get all interpretations of ambiguous input. Sam Harwell is also interested in getting all interpretations.…

Hello github

The ANTLR project is moving to GitHub within a few days. Thanks to user anatol https://github.com/anatol for setting up the ANTLR organization https://github.com/organizations/antlr and pulling in the Perforce (p4) repositories that we've been using. Everything is now set up for us to seamlessly start using git/GitHub. The purpose of this blog post is to announce this move and to outline how I think workflow should go. Repositories antlr-repos.…

Flaw in ANTLR v3 LL(*) analysis algorithm

Update Oct 2012: resolved with correct definition of when to terminate prediction lookahead. Turns out we don't need to push this extra stack element. Over the Christmas holidays, I've been busy building example grammars for ANTLR v4. The thing I noticed immediately is that grammars just work. There are no error messages from ANTLR when generating code and all we can get are true ambiguity errors at runtime, e.g., when T(i) can be recognized as both a function call and a constructor call.…

ANTLRWorks2 design thoughts

Ramblings about design as stream of consciousness. I built a quick mockup with NetBeans just so that I have all of the windows in front of me with a couple of faked images. The basic design is easy because NetBeans allows us to move windows around as we want. I happen to lay the windows out like this: AW2-mockup.png (Sam already has the navigator and the editor windows filled in, but I was too lazy to incorporate them.) Every window has content, publishes events, and listens for events.…

Report of GUI's death greatly exaggerated

I just finished attending a three-day workshop on developing standalone GUI applications with this awesome Java applications framework that you've never heard of. Actually, that's not true. You've heard of it but thought it was an IDE--NetBeans. Unfortunately, the amazing applications framework has been hitched to the NetBeans IDE wagon which, for better or worse, has much less market share than Eclipse. (I know how NetBeans users feel because I use IntelliJ,…

Sample v4 generated visitor

I have a prototype working for the automatic parse tree construction and automatic visitor generation. Imagine we have the following simple grammar:

    grammar T;
    s : i=ifstat ;
    ifstat : 'if' '(' INT ')' ID '=' ID ';' ;

The usual startup code looks like:

    TLexer t = new TLexer(new ANTLRFileStream(args[0]));
    CommonTokenStream tokens = new CommonTokenStream(t);
    TParser p = new TParser(tokens);
    p.s(); // invoke the start rule, s

To make it create a parse tree,…

Auto tree construction and visitors

Ok, been doing some thinking and playing around and also talking to Sam Harwell / Oliver Zeigermann. The first modification I've made is to turn parse tree construction on or off with a simple Boolean, rather than having to regenerate the parser with -debug. Also, the parsers fire methods enterMethod/exitMethod with the rule index all the time now since it is so convenient to have these. No more needing -trace and regenerating to get debug output.…

ANTLRWorks 2 planning, features

Summarizing discussion from people on the interest list. Editor likes: the editor works pretty well to help with auto-indenting etc. to make things look pretty and provide easy-to-read formatting. Editor dislikes: the editor is quirky; forward and backward arrows don't always work; undo is character by character; a number of people pointed out the inefficient and sluggish error checking and syntax highlighting. There is little user benefit to keystroke-by-keystroke checking while the user is typing,…

Squirrel away the trees, call on the visitors

After a few weeks away from ANTLR v4 coding, I'm back to thinking about tree grammars and the automated generation of tree visitors. I recently replaced a number of tree grammars in ANTLR v4 itself with much simpler visitor implementations. This approach doesn't require a separate specification and is much easier to debug. I made an ubervisitor that actually matches patterns in the tree rather than nodes (using a single prototype tree grammar) and then calls listener functions.…

Error productions proposal for ANTLR v4

Introduction: I'm abandoning this post mid-stream... It seems that regular alternatives can match erroneous input just as easily as so-called error alternatives. Because of adaptive LL(*), it shouldn't affect production speed at all once it gets warmed up. ANTLR has a built-in mechanism to detect, report, and recover from syntax errors. It seems to do a pretty good job. Certainly it's better than PEG, which can't detect errors until EOF.…

These are some of my favorite things--PDA, RTN, ATN, DFA, NFA

At long last, I'm back on the ANTLR v4 rebuild after a 9-month hiatus to write an academic LL(*) paper with Kathleen Fisher http://www.antlr.org/papers/LL-star-PLDI11.pdf and release StringTemplate v4 http://www.StringTemplate.org. Woot! Ok, so what does all that title nonsense have to do with ANTLR v4? Well, v4 will use all those things at some point, either in analysis or in the generated code. I'm proposing something a little different for v4: Along with a recursive-descent parser,…

Updated Sample Context-sensitive Lexer in ANTLR v3

After reading more about whitespace handling in scannerless parser generators (e.g., GLR, PEG), it looks like you have to manually insert references to whitespace rules after every "token rule" and one at the beginning of the parse. So apparently, ANTLR is a scannerless parser generator if you simply use characters as tokens. This page shows not only how to build a real scannerless parser in ANTLR but also how to build abstract syntax trees (i.e., not parse trees)!…

Links to Terence Notes on ANTLR v3.

These are the old non-wiki-based entries.

*lookahead, analysis
*lexers, parser integration
*tree grammars, parsing
*code generation
*semantic predicate hoisting
*error reporting, recovery
*ASTs, parse trees, transformation
*Aspects, Actions, Rewriting, Attributes