Thinking about programming language productivity

Raw notes for thinking about programming language productivity...will update as I get more thoughts and time to flesh them out.

  • interoperability via REST / sockets
  • lambdas to reuse strategies over structures
  • streams from data structures
  • pure funcs not methods needed
  • packages to hide name space
  • stack variables shared by multiple funcs like antlr's scopes?
  • register comparators and such for diff types so sort, find, etc... work on new data structures?
  • built in templates, sets, arrays, lists, trees, maps
  • force manual types only when we can't determine?
  • force manual types only on return values / parameters not locals?
  • why do we use locals? isolation of temp data not useful to others; we use fields a LOT for that. can we distinguish between object properties and temp data?
  • what about permissions on objects instead of blunt immutable?
  • interesting, smalltalk's code blocks seem equiv to higher order functions. can send and receive blocks from methods.
  • func args. with lots of args, names are useful ala smalltalk:
    list map: [...] from: here to: there.
    
    But, with 1 or 2 args, no names better: f(x) or setSalary(name, value)
  • inheritance a measure of object similarity. this effectively limits the kinds of objects a function works on; e.g., f(Vehicle v) works on any "kind of" vehicle whereas f(Truck t) works only on trucks, not any kind of vehicle or on users or floats. ML has "let-polymorphism" where you can specify the kinds of types that a function applies to.
    let val Id = fn x => x in (Id 1, Id true) end
    
    Strongtalk has T1|T2 specs
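The inheritance-as-constraint idea above maps onto Java's bounded/parametric generics; a minimal sketch (the Vehicle/Truck names are hypothetical, just mirroring the f(Vehicle v) example):

```java
public class Bounds {
    static class Vehicle { int wheels() { return 4; } }
    static class Truck extends Vehicle { int wheels() { return 6; } }

    // works on any "kind of" Vehicle, like f(Vehicle v) in the note above
    static int countWheels(Vehicle v) { return v.wheels(); }

    // let-polymorphism analog: one definition, instantiated at many types,
    // like (Id 1, Id true) in the ML example
    static <T> T id(T x) { return x; }

    public static void main(String[] args) {
        System.out.println(countWheels(new Truck())); // 6
        System.out.println(id(1));                    // 1
        System.out.println(id(true));                 // true
    }
}
```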

narrowing it down

  • mixins == good; how do they relate to interfaces? maybe an abstract mixin is an interface?
  • static but not always explicit types; no casts; 
    unsafe: List<Object> a = new ArrayList<String>()? compile error. Is List<Number> castable to List<Integer>? covariance. watch out for ((Object[])(new String[1]))[0] = 3, which compiles (arrays are covariant) but throws ArrayStoreException at runtime.
    I like idea of covariant method args for overriding. http://www.artima.com/scalazine/articles/scalas_type_system.html talks about Cow overriding animal eat(Food) but wanting to restrict to eat(Grass); can't in java. might need polymorphic arg dispatch. abstract types seem useful. I think Java allows us to override the return type now.
    class T {
            T foo() { return null;}
    }
    class U extends T {
            U foo() { return null;}
    }
    
  • if generics, "reify" them (keep types around at runtime) unlike java.
  • allow implicit downcasting but warn about all other issues; check downcast at runtime. It would leave a hole in the type system but Ruby/Python/Smalltalk programmers get along well completely without types so perhaps there's a middle ground here.
  • pattern matching as lib?
  • keep smalltalk like (typed) syntax?
    at: int i put: Object value => void { ... }
    size => int { ... }
    
    I use => return type so type names make sense like ML and descendants: (int, Object => void)
    Neil Gafter's lambda notation is similar for java:
    ()->int[] arrayLambda = ()->new int[]{1, 2, 3};
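Both covariance points above can be made concrete in Java, reusing the T/U classes from the note (the try/catch around the array store is just to demonstrate the runtime failure):

```java
public class Covariance {
    static class T { T foo() { return this; } }
    // covariant return type: legal in Java since Java 5
    static class U extends T { @Override U foo() { return this; } }

    public static void main(String[] args) {
        // Arrays are covariant, so this assignment compiles...
        Object[] objs = new String[1];
        try {
            objs[0] = 3; // ...but storing an Integer fails at runtime
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException");
        }
        // Generic types are invariant, so the analogous
        // List<Object> a = new ArrayList<String>() is a compile error instead.
        U u = new U().foo(); // no cast needed thanks to the covariant return
        System.out.println(u.getClass().getSimpleName()); // U
    }
}
```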

Boy, all this sure sounds like Strongtalk. Almost.

Schrödinger types

Let's say you're building an interpreter and have an Instr superclass and Jump, Add, etc... subclasses.  A List<Instr> lets you make a list of any kind of instruction.  For efficiency you might want a switch on opcode:

void exec(List<Instr> code) {
    Instr I = code.get(ip++);
    switch ( I.opcode ) {
      case Jump :
          ip = ((Jump)I).targetAddr; // need cast
          break;
      case Add :
          ...
    }
}

Of course I could alter each Instr class with an exec() method but let's say I didn't want to alter source to define exec().  When I'm at a particular case, I know for sure as the programmer what the type is.  To notify the compiler's type system I have to have a cast.  How can we get rid of this?  We can't allow arbitrary downcasting but perhaps we could define a type that was the set of valid instructions:

type Instrs = Add | Jump | ... ;

Then we could have List<Instrs> and then any element such as code[i] could be treated like any of the instructions.  It's somewhere between strict static typing and no static typing (dynamic typing).  Not as flexible as a pure subclassing relationship because we have to update type Instrs when we add new instruction subclasses.
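For reference, here is the cast-based dispatch from above as a runnable sketch. Java can't switch on class names, so the integer opcode constants are hypothetical stand-ins for the Jump/Add case labels:

```java
import java.util.List;

public class Interp {
    static final int JUMP = 0, ADD = 1; // hypothetical opcode constants

    static class Instr { int opcode; Instr(int op) { opcode = op; } }
    static class Jump extends Instr { int targetAddr; Jump(int t) { super(JUMP); targetAddr = t; } }
    static class Add  extends Instr { Add() { super(ADD); } }

    static int exec(List<Instr> code) {
        int ip = 0;
        Instr I = code.get(ip++);
        switch ( I.opcode ) {
            case JUMP:
                // cast is safe by construction, but the type system can't see that
                ip = ((Jump)I).targetAddr;
                break;
            case ADD:
                break;
        }
        return ip;
    }

    public static void main(String[] args) {
        System.out.println(exec(List.of(new Jump(10)))); // 10
    }
}
```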

Hmm...useful in other ways but perhaps here we should simply allow downcasting. Java inserts a checkcast instruction anyway when you say:

public boolean equals(Object o) {
	T t = (T)o;
...
}

You get

public boolean equals(java.lang.Object);
  Code:
   0:	aload_1
   1:	checkcast	#2; //class T
   4:	astore_2
...

That might make refactoring less useful however.  We couldn't guarantee that we'd replaced all method names, for example.  As we refactored, the code could become more and more improperly typed, leading to lots of runtime errors.

Scala has compound types, which are like intersection types. So, you can say an arg to a function is both Cloneable and Serializable.

So combining AND and OR, I could define types like

type T = (ST | DebugST) & Cloneable;
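Java's generic bounds already express the AND half of this; the OR half has no direct Java analog. A small sketch of the "&" part (the ST/DebugST types from above are omitted; Serializable and Comparable are used for illustration):

```java
import java.io.Serializable;

public class Compound {
    // T must be both Serializable and Comparable: an intersection bound,
    // roughly the "& Cloneable" part of the type definition above
    static <T extends Serializable & Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // String happens to implement both interfaces
        System.out.println(max("abc", "xyz")); // xyz
    }
}
```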

What's the hook?

Why would anyone use it?

State and side effects

In order to reuse some functionality, we have to be able to call the function or functions without side effects; well, at least without those functions altering something we don't want them to. The more side effects, the less likely you will be able to reuse the functionality. I've been thinking about this as I build a compiler for StringTemplate v4. What I really want is to say something like this:

Compiler c = new Compiler(...);
String template = ...;
CompiledST code = c.compile(template);

And, I don't want it to be messing with a global symbol table or anything like that. I just want it to come back with the compiled template, with all the bytecodes and the necessary information to execute. One of the reasons for this is that there are nested templates and I want to launch a subcompilation on a subtemplate. I can't call c.compile(subtemplate) because all of the temporary data necessary to compile the template sits in the Compiler object. So, do I create a new compiler for each template? Seems weird. I think I should have one compiler, but with the ability to recursively compile things.

Ok, no problem: separate out the temporary data into some state:

state Compilation {
    byte[] bytecodes;
    StringTable strings;
    int ip;

    CompiledST compile(String source) {
        ...updates bytecodes, strings, ...
    }
    void emit(int opcode, ...) {...}
    void writeShort(int addr, short value) {
        bytecodes[addr] = ...;
    }
}

But now, this makes no sense from an object-oriented point of view, even if we are using objects to express it. Do I really want to ask some temporary data to do some compilation? That does not seem like very good abstraction. We are co-opting objects to represent temporary state.

Another problem is that emit, writeShort, and so on are really just functions. They might be scoped within the Compiler object, but they are not modifying the state of the compiler object because we've moved the temporary data outside of the Compiler.

What we really need to do is pass the state object around to the functions:

Compiler c = new Compiler(...);
String template = ...;
CompilationState state = new CompilationState();
CompiledST code = Compiler.compile(state, template);

where compile() is now a static method of Compiler:

    static CompiledST compile(CompilationState state, String source) {
    static void emit(CompilationState state, int opcode, ...) {...}
    static void writeShort(CompilationState state, int addr, short value) {
        state.bytecodes[addr] = ...;
    }

Of course this is pretty ugly and inconvenient; it's like having to pass around the self or this pointer to simulate methods in a non-object-oriented language.
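A minimal runnable sketch of that explicit state-passing style, following the fragments above (all names hypothetical; the bytecode layout is just illustrative):

```java
public class StatePassing {
    static class CompilationState {
        byte[] bytecodes = new byte[100];
        int ip = 0; // next write address
    }

    // "methods" are really just functions over the explicit state argument
    static void emit(CompilationState state, int opcode) {
        state.bytecodes[state.ip++] = (byte)opcode;
    }

    static void writeShort(CompilationState state, int addr, short value) {
        state.bytecodes[addr]     = (byte)((value >> 8) & 0xFF);
        state.bytecodes[addr + 1] = (byte)(value & 0xFF);
    }

    public static void main(String[] args) {
        CompilationState state = new CompilationState();
        emit(state, 7);
        writeShort(state, 1, (short)300);
        System.out.println(state.ip);           // 1
        System.out.println(state.bytecodes[0]); // 7
    }
}
```

One upside: recursive compilation of a nested template just means creating a fresh CompilationState for the subcompilation, so the "stack of states" falls out for free.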

We could create a Compilation object and then call comp.emit(opcode) but then we're asking a temp data object to answer messages. What we want is emit(opcode) where the comp is implied like a this. Maybe an include that works on an instance not a class:

use c = new Compilation(...); // imports funcs with implied Compilation arg not "this"
emit(opcode);

'course we also need c to be a stack for recursive compilation of nested funcs or whatever. I'm simulating now with

public void emit1(CommonTree opAST, short opcode, int arg) {
	$template::state.emit1(opAST, opcode, arg);
}

So, instead of $template::state.emit1(...), I call just emit1(...). Extra work on my part to set up, but use is less offensive.

Scala

Scala is ML+Java, sort of, integrated very nicely. What I like:

  • Since operators are method calls and there are implicit type conversions, we can create libraries that support natural client code; e.g., BigInt vs int.
  • Since functions can be args of other functions, we can easily build new control constructs.
  • Traits support mixins to extend existing types w/o having to alter them or even have source code.
  • statically typed; absolutely crucial for refactoring.
  • explicit (static) types on func args; optionally on return values etc...
  • type inference supports the ease of a dynamically typed lang while maintaining static typing.
  • lambdas/code blocks; e.g., scala (x:Int)=>x+1 is func taking int return x+1
  • more immutable types like Java's String
  • pattern matching; i'd probably do a grammar | pipe thing not match case*. It's like parsing data structures. Pierre-Etienne's Tom tool is probably similar to scala.
  • list pattern matching like x::remainder is cool; can we do for strings too?
  • Patterns in var defs: val (number, string) = someTuple
  • tuples, useful for multiple return values
  • No "new" keyword: List(List(3),2)
  • Option seems better than using null/non-null, but does it make using results a hassle?
  • Huge core library that seems useful
  • Parameterized types
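On the Option point: Java later grew java.util.Optional, which makes the "does it make using results a hassle?" question concrete; callers must unwrap, which is both the hassle and the point (parse is a hypothetical helper):

```java
import java.util.Optional;

public class Opt {
    // Optional in the return type forces callers to handle the empty case
    static Optional<Integer> parse(String s) {
        try { return Optional.of(Integer.parseInt(s)); }
        catch (NumberFormatException e) { return Optional.empty(); }
    }

    public static void main(String[] args) {
        System.out.println(parse("42").map(n -> n + 1).orElse(0));   // 43
        System.out.println(parse("oops").map(n -> n + 1).orElse(0)); // 0
    }
}
```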

Negatives? Maybe:

  • I'm not sure I can deal with the complexity of the lang. recursion blows the average developer's mind; how will they process currying?
  • case classes are convenient but can we say it another way?
  • val v = 3 is like final int v = 3. Not sure I've ever used final. Immutability yes though.
  • hard to adjust to a(3) not a[3] but makes sense since we can turn any obj into array by defining apply(i) method.
  • optional parens. Bill Venners' test stuff lets you say result should not be null, which is result.should(not).be(null). Not sure I'm for this ruby style of anything goes.

Mainly I'm trying to figure out why a statically typed Smalltalk couldn't do this with much simpler syntax. Obviously, Scala is much more about immutable elements and has the mailbox/actor thread model etc...  Even the cool pattern matching can be done in smalltalk yielding reasonable syntax. Well, it couldn't do a lot of it w/o extra syntax. Smalltalk could do mixins via metaclass protocol (which I don't like).

Causes of nonreuse:

  • modal, side-effects
    • can't pass back more than one value; forces fields
  • can't pass in customization code easily; no lambdas, func ptrs
    • constant cut/paste of foreach across structure iteration
    • clumsy to separate actions from traversals. need strategy spec or alg spec applied to structure.
    • need common concept of serializing structure so we can apply code, carrying along our temp data we can modify then can copy data out and assign to locals (automatically?)
  • type incompatibility or data format incompatibility
    • types are good but we need a way to normalize similar things like C++ function style casts "anotherType(someType)"
    • sockets between processes on shell normalize but lose type info but if we know how to reparse data, it's cool. so type converter type thing might work. hmm...but that invokes code secretly
    • json and friends: twitter feed is easy with html/css
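The "can't pass in customization code easily" point above, in Java-with-lambdas terms: one traversal, many strategies, no cut/paste of the foreach (the map helper name is hypothetical):

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Strategies {
    // one traversal; the action is a parameter, so no cut/paste foreach
    static <T, R> List<R> map(List<T> data, Function<T, R> f) {
        return data.stream().map(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3);
        System.out.println(map(nums, n -> n * 2));   // [2, 4, 6]
        System.out.println(map(nums, n -> "#" + n)); // [#1, #2, #3]
    }
}
```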