Blog from June, 2008

Spent hours and hours on airplanes recently bouncing around Europe giving talks. had some time to think about increasing the recognition strength of LL by increasing the prediction mechanism from a DFA to a pushed down machine, either LL or LR. I've scanned my notes and diagrams; see the attachments on this news item. I tried all sorts of things trying to approximate non-regular lookahead languages but didn't really come up with anything great. One thing of note, if you treat rule references like tokens, then creating a DFA from the NFA automatically left factors rule. Might be a nice feature to add to ANTLR.

As Java gets more and more complicated with generics, closures, etc... I keep looking for simplicity. I have the Mantra prototype, but even that subset is kind of complicated. I decided to look at Smalltalk again. Coincidentally, Nik Boyd, who built Bistro years ago contacted me; he's moved to SF Bay area, which means we can chat in person. He had lots of similar ideas to what I'm thinking about: a slightly improved Smalltalk.

Couple of issues with Smalltalk:

  • no flat file format really for humans
  • dev happens within Smalltalk environment, which is nice, but harder to integrate with other systems
  • no access modifiers; hard to say what public interface is
  • no direct field access; can't say "o x" to get o's x field
  • no a[i] array access operator; have to say "a elementAt: i" or whatever
  • no types on variables, methods
  • metaclasses confuse everybody
  • commercial versions added namespaces (packages)
  • I wonder if we need arrays vs lists? Is #(a b c) the right list syntax?
  • Is $a decent char literal syntax?
  • should have trees as default data structure
  • var scoping rules seemed odd to me; will have to look at that again

The Java VM is still the right target but might be fun to start out having an interpreter written in Java.

Hmm...what would it look like?

Animal subclass: Dog
	const int w = 3.
	String name = 'Rex'.
	float weight = 0.0.

	speak [ sys println: 'woof' ]
	feed: int food [ weight += food * w ]
	walk: Point to from: Point here [
		...
	]
	float howMuchFood [ ^weight / w ]
	boolean equals: other [ ^self name == other name ]

'.' is separator not terminator

'=' not ':='
hmm...maybe it's ok to have keywords for var, method, types. Nah. You know it's better to use java-style syntax except maybe methods. I like having smalltalk style methods so youcan essentially add keywords at will. The problem is that people will expect Java semantics then. Hm...

The Default error handling mechanisms that does single token insertion and deletion works really well in many cases. The one area where I think ANTLR need some work is during no viable alternative exceptions and loops around rule references. For example, what happens if you have a fragment like:

d : decl+ ;
decl: 'foo'
    | 'bar'
    ;

If you have input ")) foo bar foo", which is valid except for the crazy "))" at the front, rule d never enters the decl+ because that input is not consistent with the start of a decl. You will get an early exit exception from the loop, which forces d to track the exception and consume until what follows d. In other words, it consumes the entire input. Not very interesting recovery, eh? The problem here is that we cannot force d to enter the loop starting with bad input. It's almost as if we need error recovery at the start of a (...)+ loop (but not(...)*). That would essentially scant characters until it found the start of a decl. Of course, that could go too far as well. We would have to allow people to specify how to consume.

What about when the bad input is after one decl: "foo )) bar foo"? In this case, d enters decl+ loop and then immediately enters decl. It matches, returns, and then d's loop decides whether to continue. Again, we see the crazy "))" input. We cannot continue the loop and we get no exception (because we've matched at least one element for the + loop). d finishes without giving an error (unless we had EOF on the end). Still bad. Maybe what we need is an error recovery mechanism at the end of the loop as well as the beginning. We need something that will allow us to stay in the loop. We also need to allow the user to specify how to recover. The current exception mechanism is not valid on some rules, only on rules themselves after the ';'.

Sometimes you can do a good job with context-sensitive error handling if you can use the rule level granularity:

d : decl (';' decl)* ;
decl: 'foo'
    | 'bar'
    ;
    catch[NoViableAltException nvae] {
      reportError(nvae);
      // instead of usual: recover(input,re); do our own context-
      // sensitive search for re-synchronization
      BitSet startOfDecl = new BitSet(new ArrayList(){{
          add(FOO); add(BAR);
      }});
      consumeUntil(input, startOfDecl);        
    }