Home

Matching parse tree patterns, paths

[See tree-patterns branch in parrt/antlr4] ANTLR 4 introduced a visitor and listener mechanism that lets you implement DOM visiting or SAX-analogous event processing of tree nodes. This works great. For example, if all you care about is looking at Java method declarations, grab the Java.g4 file and then override methodDeclaration in JavaBaseListener. From there, a ParseTreeWalker can trigger calls to your overridden method for you as it walks the tree. Easy things are easy.…

The real story on null vs empty

In "null vs missing vs empty vs nonexistent in ST v4" a few years ago, I tried to resolve in my head the difference between a missing attribute, a null value, an array with no elements, and a string with no characters. I don't think I got it completely thought through, and ST v4 might have some weird inconsistencies. This page is an attempt to finally write down all the cases and resolve exactly how things should work.…

Tree rewriting in ANTLR v4

Because most ANTLR users don't build compilers, I decided to focus on the other applications for ANTLR v4: parsing and extracting information and then translations. For compilers, we need to convert everything into operations and operands--that means ASTs are easier. For example, 3+4 should become a tree with + at the root and 3 and 4 as operands/children: (+ 3 4). The parse tree in contrast is probably (expr 3 + 4) where rule reference expr is the root node and the other elements are children.…
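To make the difference between the two tree shapes concrete, here is a minimal sketch of turning the parse tree (expr 3 + 4) into the operator-rooted tree (+ 3 4). The `Node` class and the `toAst` method are hypothetical names invented for this illustration; they are not part of ANTLR, and the sketch assumes every `expr` node has exactly three children (left operand, operator, right operand).

```java
import java.util.List;

public class ParseTreeToAst {
    // Hypothetical minimal tree node (not an ANTLR class): a label plus ordered children.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, List<Node> children) {
            this.label = label;
            this.children = children;
        }

        static Node leaf(String label) {
            return new Node(label, List.of());
        }

        // LISP-style rendering: root label first, then the children.
        String toStringTree() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    // Assumption: an "expr" node with three children is (left, operator, right).
    // The operator becomes the root and the operands become its children.
    static Node toAst(Node t) {
        if (t.label.equals("expr") && t.children.size() == 3) {
            Node left = toAst(t.children.get(0));
            Node op = t.children.get(1);
            Node right = toAst(t.children.get(2));
            return new Node(op.label, List.of(left, right));
        }
        return t; // leaves and other shapes are returned unchanged
    }

    public static void main(String[] args) {
        Node parseTree = new Node("expr",
                List.of(Node.leaf("3"), Node.leaf("+"), Node.leaf("4")));
        System.out.println(parseTree.toStringTree());        // (expr 3 + 4)
        System.out.println(toAst(parseTree).toStringTree()); // (+ 3 4)
    }
}
```

The rule node disappears in the second tree, which is why operator-rooted trees are convenient when everything has to become operations and operands.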

ANTLR and Schrödinger's Tokens

Had a nice lunch with Mihai Surdeanu today at Stanford. Mihai does natural language processing research and has used ANTLR in the past to process English text. He asked for 2 things: tokens that can be in more than one token class (token type) at the same time and the ability to get all interpretations of ambiguous input. Sam Harwell is also interested in getting all interpretations.…

Hello github

The ANTLR project is moving to github within a few days. Thanks to user anatol https://github.com/anatol for setting up the ANTLR organization https://github.com/organizations/antlr and pulling in the perforce (p4) repositories that we've been using. Everything is now set up for us to seamlessly start using git/github. The purpose of this blog post is to announce this move and to outline how I think workflow should go. Repositories antlr-repos.…

Flaw in ANTLR v3 LL(*) analysis algorithm

Update Oct 2012: resolved with correct definition of when to terminate prediction lookahead. Turns out we don't need to push this extra stack element. Over the Christmas holidays, I've been busy building example grammars for ANTLR v4. The thing I noticed immediately is that grammars just work. There are no error messages from ANTLR when generating code and all we can get are true ambiguity errors at runtime. E.g., if you can recognize T(i) as both a function call and a constructor call.…

ANTLRWorks2 design thoughts

Ramblings about design as stream of consciousness. I built a quick mockup with NetBeans just so that I have all of the windows in front of me with a couple of faked images. The basic design is easy because NetBeans allows us to move windows around as we want. I happen to lay the windows out like this: AW2-mockup.png. (Sam already has the navigator and the editor windows filled in, but I was too lazy to incorporate them.) Every window has content, publishes events, and listens for events.…

Report of GUI's death greatly exaggerated

I just finished attending a three-day workshop on developing standalone GUI applications with this awesome Java applications framework that you've never heard of. Actually, that's not true. You've heard of it but thought it was an IDE: NetBeans. Unfortunately, the amazing applications framework has been hitched to the NetBeans IDE wagon which, for better or worse, has much less market share than Eclipse. (I know how NetBeans users feel because I use IntelliJ,…

Sample v4 generated visitor

I have a prototype working for the automatic parse tree construction and automatic visitor generation. Imagine we have the following simple grammar:

    grammar T;
    s : i=ifstat ;
    ifstat : 'if' '(' INT ')' ID '=' ID ';' ;

The usual startup code looks like:

    TLexer t = new TLexer(new ANTLRFileStream(args[0]));
    CommonTokenStream tokens = new CommonTokenStream(t);
    TParser p = new TParser(tokens);
    p.s(); // invoke the start rule, s

To make it create a parse tree,…

Auto tree construction and visitors

Ok, been doing some thinking and playing around and also talking to Sam Harwell / Oliver Zeigermann. The first modification I've made is to turn parse tree construction on or off with a simple Boolean, rather than having to regenerate the parser with -debug. Also, the parsers fire methods enterMethod/exitMethod with the rule index all the time now since it is so convenient to have these. No more needing -trace and regenerating to get debug output.…

ANTLRWorks 2 planning, features

Summarizing discussion from people on the interest list. Editor. Likes: the editor works pretty well to help with auto indenting etc. to make things look pretty and provide easy-to-read formatting. Dislikes: the editor is quirky; forward and backward arrows don't always work; undo is character by character; a number of people pointed out the inefficient and sluggish error checking and syntax highlighting; there are few user benefits to keystroke-by-keystroke checking while the user is typing,…

Squirrel away the trees, call on the visitors

After a few weeks away from ANTLR v4 coding, I'm back to thinking about tree grammars and the automated generation of tree visitors. I recently replaced a number of tree grammars in ANTLR v4 itself with much simpler visitor implementations. Doesn't require a separate specification and is much easier to debug. I made an ubervisitor that actually matches patterns in the tree rather than nodes (using a single prototype tree grammar) and then calls listener functions.…

Error productions proposal for ANTLR v4

Introduction I'm abandoning this post mid-stream...seems that regular alternatives can match erroneous input just as easily as so-called error alternatives. Because of adaptive LL(*), it shouldn't affect production speed at all once it gets warmed up. ANTLR has a built-in mechanism to detect, report, and recover from syntax errors. It seems to do a pretty good job. Certainly it's better than PEG, which can't detect errors until EOF.…

These are some of my favorite things--PDA, RTN, ATN, DFA, NFA

At long last, I'm back on the ANTLR v4 rebuild after a 9-month hiatus to write an academic LL(*) paper with Kathleen Fisher http://www.antlr.org/papers/LL-star-PLDI11.pdf and release StringTemplate v4 http://www.StringTemplate.org. Woot! Ok, so what does all that title nonsense have to do with ANTLR v4? Well, v4 will use all those things at some point, either in analysis or in the generated code. I'm proposing something a little different for v4: Along with a recursive-descent parser,…

Updated Sample Context-sensitive Lexer in ANTLR v3

After reading more about whitespace handling in scannerless parser generators (e.g., GLR, PEG), it looks like you have to manually insert references to whitespace rules after every "token rule" and one at the beginning of the parse. So apparently, ANTLR is a scannerless parser generator if you simply use characters as tokens. This page shows not only how to build a real scannerless parser in ANTLR but also how to build abstract syntax trees (i.e., not parse trees)!…

Sample Context-sensitive Lexer in ANTLR v3

Scannerless parser generators have an advantage over separate lexers and parsers: it's much easier to create island grammars, combine components of grammars, and deal with context-sensitive lexical constructs. I still think I prefer tokenizing the input, but thought I would run an experiment to see what a scannerless ANTLR grammar would look like. I started out with a grammar that contained an LL(*) but non-LL(k) rule (stat). Because we're looking at characters as tokens,…

ST v4 speed test with ANTLR

I just tested the new version of ANTLR that uses ST v4 not v3. In terms of code generation, it's just about twice as fast given plenty of memory (750M). To process Jim Idle's 15296 line TSQL grammar, it takes 1760ms instead of 3624ms, though that doesn't alter the overall wall clock performance much. It still takes about 12 seconds to process the grammar. It generates a whopping 168,677 lines of Java code (not including the lexer). That gives us about 95,…

Thinking about programming language productivity

Raw notes for thinking about programming language productivity...will update as I get more thoughts and time to flesh out. interoperability via REST / sockets; lambdas to reuse strategies over structures; streams from data structures; pure funcs not methods needed; packages to hide name space; stack variables shared by multiple funcs like antlr's scopes?; register comparators and such for diff types so sort, find, etc... work on new data structures?; built in templates, sets, arrays, lists, trees,…

ANTLR v4 lexers

I've moved this content to the v4 ANTLR pages.

On second thought, null and missing are same thing

I've done a lot of thinking about this and I believe I made a mistake in the final ST 3.2.1 version before my current rebuild (v4). It's too confusing, and makes the code too complex, to distinguish between missing and present but null. There is also a long history with ST suggesting that it works okay to treat a missing attribute and a null attribute as the same thing (i.e., not there). We have the null option that lets us say what to replace null with.
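As a rough illustration of treating the two cases identically, here is a minimal sketch that uses a plain Java map as the attribute table. The `lookup` method and its `nullReplacement` parameter are hypothetical names for this example only; they are not StringTemplate's API, and the replacement string merely stands in for what the null option would supply.

```java
import java.util.HashMap;
import java.util.Map;

public class NullVsMissing {
    // Hypothetical lookup, not StringTemplate's API: a missing key and a key
    // mapped to null both yield the caller-supplied replacement text.
    static String lookup(Map<String, Object> attributes, String name, String nullReplacement) {
        Object value = attributes.get(name); // null for both "missing" and "present but null"
        return value == null ? nullReplacement : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new HashMap<>();
        attributes.put("title", "ST v4");
        attributes.put("author", null); // present but null
        // "year" is never added, so it is missing

        System.out.println(lookup(attributes, "title", "n/a"));  // ST v4
        System.out.println(lookup(attributes, "author", "n/a")); // n/a
        System.out.println(lookup(attributes, "year", "n/a"));   // n/a
    }
}
```

Because `Map.get` already returns null in both situations, the rendering code needs only one check, which is the simplification the post argues for.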