Saturday, July 2, 2011

Pushing Back

Streams are Scala's flagship lazy data structure, and they let you do all the good things that lazy evaluation enables. But although laziness can have performance benefits (by not evaluating something that will never be used, or delaying a computation that would entail deep recursion if performed eagerly), it can also incur a performance cost by unnecessarily wrapping an already-evaluated value in a thunk.

If every item in a sequence is already evaluated, you'll want to use a List, not a Stream. But sometimes you have a sequence that is a mix of evaluated and unevaluated items, often because you look ahead in a stream and evaluate and then push back items that are not processed immediately. This can occur in compiling programming languages, or in normalizing the order of a prettyprinting token stream, as in S. Doaitse Swierstra and Olaf Chitil's 2009 paper Linear, Bounded, Functional Pretty-Printing.

It occurred to me that you can write a Scala-style stream that avoids the thunk-wrapping and unwrapping that occurs when you push an item onto the head of an already-evaluated stream (in Scala, only the tail of the stream is ever passed by name; the head of a stream is passed by value when the stream is created, and so is always preevaluated). Every function that constructs a stream knows whether its tail is passed by name, and could theoretically create different types of cons cell for evaluated and unevaluated streams:

abstract class Cons [+A] extends Stream[A] {
...
}

class LazyCons [+A] (val head: A, after: => Stream[A])
extends Cons[A] {
lazy val tail = after
}

case class EagerCons [+A] (head: A, tail: Stream[A])
extends Cons[A]

In SI-4698 I submitted a timing test that lets you observe the cost of LazyCons vs. EagerCons in the case where the tail has already been evaluated. On my machine it takes over 50% longer to construct or examine a lazy cell than an eager cell.

I don't know whether Scala streams users add items to the head of already-evaluated streams often enough to make it worth complicating the standard streams implementation with a change like this just for the speedup. But it's an idea.

No comments:

Post a Comment