I replaced the blocking queue that sits between pipeline actors (threaded producers/consumers) with my new ping-pong buffered version, and the run time of my word freq program dropped from 5.2s to 3.5s on a 5M file. That is close to the 3.0s achieved by the nonthreaded version.
// threaded, pipeline version; 3.5s (from 5.2s) on 5M file
f => Words() => { string w | ... }
// nonthreaded, nested map operation on big list; 3.0s
f.lines() => a;
a:{ string line | line.split():{ string w | ... }};
The buffer size is 1000. Dropping it to 100 slows things down by 0.2s. Increasing it to 4000 makes no difference, and 40000 slows things down by 0.5s. A size of 400 seems to be slightly faster, so the default queue size is now 400.
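For readers who have not seen the technique, here is a minimal sketch of a ping-pong buffer in Python. It is not the code behind `=>`, only an illustration under some assumptions: exactly one producer and one consumer per queue, and an end-of-stream marker. The names `PingPongBuffer`, `EOF`, and `word_freq` are made up for this example. The point is that the producer and consumer each work on their own buffer without locking, and only synchronize once per `size` items when the two buffers are swapped.

```python
import threading
from collections import Counter


class PingPongBuffer:
    """Single-producer, single-consumer queue built on two swapped buffers."""

    EOF = object()  # hypothetical end-of-stream marker

    def __init__(self, size=400):
        self.size = size
        self._write = []       # touched only by the producer between swaps
        self._read = []        # touched only by the consumer between swaps
        self._pos = 0          # consumer's index into _read
        self._pending = False  # True while a full write buffer awaits a swap
        self._cond = threading.Condition()

    def put(self, item):
        # No locking on the common path: just append to the private buffer.
        self._write.append(item)
        if len(self._write) >= self.size:
            self._request_swap()

    def close(self):
        # Flush whatever is left, followed by the end-of-stream marker.
        self._write.append(self.EOF)
        self._request_swap()

    def _request_swap(self):
        with self._cond:
            self._pending = True
            self._cond.notify_all()
            # Block until the consumer has drained its buffer and swapped.
            while self._pending:
                self._cond.wait()

    def get(self):
        if self._pos >= len(self._read):
            with self._cond:
                # Wait until the producer has a buffer ready to hand over.
                while not self._pending:
                    self._cond.wait()
                self._read.clear()  # drained buffer goes back to the producer
                self._read, self._write = self._write, self._read
                self._pos = 0
                self._pending = False
                self._cond.notify_all()
        item = self._read[self._pos]
        self._pos += 1
        return item

    def __iter__(self):
        while True:
            item = self.get()
            if item is self.EOF:
                return
            yield item


def word_freq(lines, size=400):
    """Count words with a producer thread feeding the calling thread."""
    buf = PingPongBuffer(size)

    def produce():
        for line in lines:
            for w in line.split():
                buf.put(w)
        buf.close()

    producer = threading.Thread(target=produce)
    producer.start()
    counts = Counter()
    for w in buf:
        counts[w] += 1
    producer.join()
    return counts
```

Calling `word_freq(open(path), size=400)` with some file path would count the words in that file. In this sketch a very small `size` means the threads meet at the lock more often, which matches the direction of the slowdown I saw at 100; it says nothing about why very large buffers were slower.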