Thursday, November 4, 2010

Don't Trust. Do Verify.

Thanks to Tateru Nino for disabusing me of the idea that a Second Life viewer actually uploads CIL code to the SL server. That bit of information certainly invalidates a lot of the musing in my previous post. As Emily Litella would say, never mind.

But my misadventure led me to Tateru's blog, which is not only a mother lode of delicious dirt on SL and Linden Lab (and some other interesting companies), but is also full of recommendations for magnificent productivity killers like Dwarf Fortress. I haven't dared try Dwarf Fortress, but I could just feel the usefulness seeping out of my body reading Tateru's description of it.

To be sure, if I were Linden Lab, I'd probably be pretty hesitant to let someone upload completely unrestricted Mono code. But the whole business of cooperating untrusted code still fascinates me. If you want to create an environment in which unknown people can load possibly malicious code of unknown quality into a running system without breaking that system, what are your options?

It turns out that more has been written on the topic than I had realized. Either I'm getting better at choosing search terms, or Google is getting smarter about figuring out what I mean, or more stuff is getting indexed (or all three). I'd never previously managed to find anything relevant, but searching last night for “cooperating untrusted code” turned up these three papers:

I haven't had time to do more than skim the three papers, but I gather that the first paper's approach is to give each untrusted program its own VM, as Second Life does, whereas the other two actually commingle objects created by different programs on a common heap. Intuitively, the latter approach sounds more flexible (and possibly more space-efficient, if some data can have multiple owners), but without having read the papers in any detail, I have trouble visualizing just how much trouble you can get into this way.
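
Without having read the papers, I can only guess at the machinery, but here's a minimal sketch (in Python, with every name and number invented for illustration, not taken from any of the papers) of what per-owner accounting on a shared heap might look like, with the cost of an object split among its owners:

# Hypothetical sketch of per-owner accounting on a shared heap. Each
# object carries the set of scripts that own it, and its size is
# charged to the owners in equal shares, so shared data costs each
# owner less than a private copy would. Nothing here comes from the
# papers; all names and numbers are invented.

class SharedHeap:
    def __init__(self, quota_per_owner):
        self.quota = quota_per_owner
        self.usage = {}       # owner -> bytes currently charged
        self.objects = []     # (size, owners) pairs

    def allocate(self, size, owners):
        share = size / len(owners)    # each owner pays an equal share
        # Refuse the allocation outright if any owner would go over quota.
        for owner in owners:
            if self.usage.get(owner, 0) + share > self.quota:
                raise MemoryError(owner + " would exceed its quota")
        for owner in owners:
            self.usage[owner] = self.usage.get(owner, 0) + share
        self.objects.append((size, set(owners)))

heap = SharedHeap(quota_per_owner=64 * 1024)               # a 64 KB quota, say
heap.allocate(40 * 1024, owners={"script_a"})              # script_a pays 40 KB
heap.allocate(40 * 1024, owners={"script_a", "script_b"})  # 20 KB apiece

Freeing an object or transferring ownership would need the inverse bookkeeping, and that's exactly where I suspect the interesting failure modes hide.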

It seems to me that in designing a system that accommodates untrusted code, you have to decide early on how and when to restrain code that threatens to crash your server by exhausting a critical resource. Do you place a small fixed upper bound on each program's consumption of a resource, as SL does with a script's memory, and hope that the total number of programs is never large enough to exhaust the resource? Do you let programs bid (in some currency or other) for resources, with prices tending towards infinity as a resource runs out? Do you apply some sort of fixed throttle to program actions, as LSL does with the energy and delay costs of each routine that manipulates the virtual world?
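
That third option is the one LSL users already live with, so here's a rough sketch of what such a throttle might look like (Python again; the energy constants and the world_call name are made up, not LSL's actual numbers):

# Hypothetical sketch of an LSL-style throttle: every world-affecting
# call burns energy, which regenerates at a fixed rate, and imposes a
# fixed delay. A script that hammers expensive calls ends up blocked,
# waiting for its energy to come back. All constants are invented.

import time

class ThrottledScript:
    MAX_ENERGY = 100.0
    REGEN_PER_SECOND = 10.0   # energy regained each second

    def __init__(self):
        self.energy = self.MAX_ENERGY
        self.last_tick = time.time()

    def _regenerate(self):
        now = time.time()
        self.energy = min(self.MAX_ENERGY,
                          self.energy
                          + (now - self.last_tick) * self.REGEN_PER_SECOND)
        self.last_tick = now

    def world_call(self, cost, delay):
        # Block until enough energy has regenerated, then charge it
        # and impose the fixed per-call delay.
        self._regenerate()
        while self.energy < cost:
            time.sleep((cost - self.energy) / self.REGEN_PER_SECOND)
            self._regenerate()
        self.energy -= cost
        time.sleep(delay)

script = ThrottledScript()
for _ in range(10):
    script.world_call(cost=25.0, delay=0.1)   # something rez-like, say

The nice property of a throttle is that it bounds rates rather than totals: a long-running script can do plenty of work overall without ever being able to do much of it at once.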

Another thing you have to decide is how to treat trusted code imported into your system. Suppose someone wants to program your system using some well-known library. You firmly believe the library is free of security holes and gross inefficiencies (yeah, right!), and you're willing to install it on your system for others to use. But once someone invokes the library, thereby loading it into memory, there's less memory available for all the other programs, even those not using it. So how do you decide how much common code to allow? Who pays for it?
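
One plausible answer to "who pays?" (and I'm guessing here, not describing anything Linden Lab actually does) is to amortize the library's footprint over its current users, rebalancing each user's charge as scripts load and unload it:

# Hypothetical sketch of cost-sharing for common code: the library's
# memory footprint is split evenly among the scripts currently using
# it, and the shares are rebalanced whenever a script loads or
# unloads it. Scripts that never touch the library are never charged.
# All names and sizes are invented.

class SharedLibrary:
    def __init__(self, name, size_bytes):
        self.name = name
        self.size = size_bytes
        self.users = set()

    def per_user_cost(self):
        return self.size / len(self.users) if self.users else 0

    def load(self, script):
        self.users.add(script)
        return self.per_user_cost()    # everyone's share just dropped

    def unload(self, script):
        self.users.discard(script)
        return self.per_user_cost()    # the survivors' shares just rose

lib = SharedLibrary("linear_algebra", size_bytes=512 * 1024)
print(lib.load("script_a"))    # script_a pays the full 512 KB
print(lib.load("script_b"))    # now 256 KB apiece
print(lib.unload("script_a"))  # script_b is back to paying 512 KB

The catch is that a script's bill can go up through no action of its own, when some other user unloads the library; whether that counts as fair is exactly the kind of policy question I mean.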

And finally, Linden Lab has apparently at least dipped its toes into the idea of a less constrained scripting interface. Tateru mentions in this post that other scripting languages have at least been considered (I gather from her heading that C# is one of them), but she believes no work is ongoing.

2 comments:

  1. Thank you for the kind words :)

  2. To the list of papers above, I should add Garbage Collector Memory Accounting in Language-Based Systems by David W. Price, Algis Rudys, and Dan S. Wallach, 2003 (http://www.cs.rice.edu/~dwallach/pub/oakland2003.pdf).
