Thursday, November 4, 2010

Don't Trust. Do Verify.

Thanks to Tateru Nino for disabusing me of the idea that a Second Life viewer actually uploads CIL code to the SL server. That bit of information certainly invalidates a lot of the musing in my previous post. As Emily Litella would say, never mind.

But my misadventure led me to Tateru's blog, which is not only a mother lode of delicious dirt on SL and Linden Lab (and some other interesting companies), but is also full of recommendations for magnificent productivity killers like Dwarf Fortress. I haven't dared try Dwarf Fortress, but I could just feel the usefulness seeping out of my body reading Tateru's description of it.

To be sure, if I were Linden Lab, I'd probably be pretty hesitant to let someone upload completely unrestricted Mono code. But the whole business of cooperating untrusted code is still one that fascinates me. If you want to create an environment in which unknown people load possibly malicious code of unknown quality into a running system without breaking that system, what are your options?

It turns out that more has been written on the topic than I had realized. Either I'm getting better at choosing search terms, or Google is getting smarter about figuring out what I mean, or more stuff is getting indexed (or all three). I'd never previously managed to find anything relevant, but searching last night for “cooperating untrusted code” turned up these three papers:

I haven't had time to do any more than skim the three papers, but I gather that the first paper's approach is to give each piece of untrusted code its own VM, as Second Life does, whereas the other two actually commingle objects created by different programs on a common heap. Intuitively, the latter approach sounds more flexible (and possibly more space-efficient, if some data can have multiple owners), but without having read the papers in any detail, I have trouble visualizing just how much trouble you can get into this way.

It seems to me that in designing a system that accommodates untrusted code, you have to decide early on how and when to restrain code that threatens to crash your server by completely exhausting a critical resource. Do you place a small fixed upper bound on each program's consumption of a resource, as SL does with a script's memory, and hope that the total number of programs is never enough that the resource runs out? Do you allow programs to bid (with some currency or other) on resources, where prices tend towards infinity as a resource runs out? Do you apply some sort of fixed throttle to program actions, as LSL does with its energy and delay costs for each routine that manipulates the virtual world?
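
To make those three options a little more concrete, here is a rough sketch in Python (not LSL, and not anything Linden Lab actually runs). The names ScriptBudget, QuotaExceeded, and scarcity_price, and all the numbers, are made up for illustration; the only point is to show a hard memory cap, a refilling energy throttle, and a price that climbs as a resource runs out.

```python
class QuotaExceeded(Exception):
    """Raised when a script asks for more memory than its fixed cap allows."""


class ScriptBudget:
    """Hypothetical per-script accounting: a hard memory cap plus an energy pool."""

    def __init__(self, memory_cap, energy_cap, refill_per_tick):
        self.memory_cap = memory_cap
        self.memory_used = 0
        self.energy_cap = energy_cap
        self.energy = energy_cap
        self.refill_per_tick = refill_per_tick

    def allocate(self, nbytes):
        # Fixed upper bound: refuse the request rather than endanger the host.
        if self.memory_used + nbytes > self.memory_cap:
            raise QuotaExceeded(f"cap of {self.memory_cap} bytes exceeded")
        self.memory_used += nbytes

    def try_action(self, cost):
        # Throttle: an action runs only if the script can pay for it right now.
        if cost > self.energy:
            return False
        self.energy -= cost
        return True

    def tick(self):
        # Energy trickles back on every simulation step, up to the cap.
        self.energy = min(self.energy_cap, self.energy + self.refill_per_tick)


def scarcity_price(base_price, capacity, used):
    """Bidding-style price that grows without bound as the resource runs out."""
    free = capacity - used
    if free <= 0:
        return float("inf")
    return base_price * capacity / free


budget = ScriptBudget(memory_cap=64 * 1024, energy_cap=10, refill_per_tick=2)
budget.allocate(60 * 1024)
first = budget.try_action(8)    # allowed: the pool starts full
second = budget.try_action(8)   # denied: only 2 units of energy remain
for _ in range(3):
    budget.tick()               # the pool refills a little on each tick
third = budget.try_action(8)    # allowed again once enough has trickled back
```

The cap and the throttle are purely local decisions, which is what makes them cheap to enforce; the pricing function needs a global view of how much of the resource is left.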

Another thing you have to decide is how to treat trusted code imported into your system. For example, assume someone wants to program your system using some well-known library. You firmly believe the library has no issues related to security or inefficiency (yeah, right!), and you're willing to install the library on your system for others to use. But once someone invokes library code, thereby loading it into memory, there's less memory available for all the other programs, even those not using the library. So how do you decide how much common code to allow? Who pays for it?

And finally, Linden Lab has apparently at least dipped its toes into the idea of a more unconstrained scripting interface. Tateru mentions in this post that the idea of other scripting languages has at least been considered (and I gather from her heading that C# is one of those languages), but believes no work is ongoing.

Tuesday, November 2, 2010

LSL and Its Discontents

Like many software-literate residents of Second Life, I haven't been very impressed with Linden Lab's scripting language, LSL.

How Second Life runs LSL programs (“scripts”) is neat: each script gets its own little 64-kbyte VM, and the state of the script persists—a script is suspended when the object it lives in is removed from the world (by being taken into your inventory), and resumed when the object is put back into the world.

But LSL itself is no gem of programming language design. It's verbose, and the only thing that looks like a data structure is the list. A list can't contain other lists, which pretty much dashes all hopes of using LSL to build any data structure you would recognize from more full-featured programming languages. The paucity of control and data structures makes code repetitious and, in many cases, inefficient.
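
To show what that limitation costs you, here is a sketch (in Python rather than LSL, so the function names are mine) of the usual workaround: a "strided" flat list, where every record occupies a fixed number of consecutive slots. The record layout of name, x, and y is an assumption chosen purely for the example.

```python
STRIDE = 3  # slots per record: name, x, y (layout assumed for this example)


def pack(records):
    """Flatten (name, x, y) tuples into one list with no nesting."""
    flat = []
    for name, x, y in records:
        flat.extend([name, x, y])
    return flat


def get_record(flat, index):
    """Slice out the index-th record by computing its offset by hand."""
    start = index * STRIDE
    return flat[start:start + STRIDE]


def find_by_name(flat, name):
    """Linear scan over the first slot of each record."""
    for start in range(0, len(flat), STRIDE):
        if flat[start] == name:
            return flat[start:start + STRIDE]
    return None


flat = pack([("door", 1, 2), ("lamp", 3, 4)])
lamp = find_by_name(flat, "lamp")
```

Every access involves offset arithmetic that a nested list or a record type would do for you, and nothing stops one stray append from silently misaligning every record after it.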

There are, sort of, ways to impose some large-scale structure on your code, in that scripts can communicate with each other. But there's no built-in protocol that lets scripts view each other as implementing any particular interface, and building such a protocol would eat a good chunk out of your 64k (and the source code for your protocol would need to be replicated in each script).

So what, exactly, would it take to make a better scripting language for Second Life? Knowing that the compiler for LSL lives in the Second Life viewer (rather than on Linden Lab's servers, which might be your first guess), I got intrigued enough by this question the other night to download the source code for the open-source Phoenix viewer and see how LSL is compiled.

As best I can figure out, the viewer compiles scripts to CIL. That makes sense, since I know the Second Life servers run Mono. For most operations that aren't compiled inline, the generated code calls routines in namespaces called [ScriptTypes]LindenLab.SecondLife, [LslLibrary]LindenLab.SecondLife, or [LslUserScript]LindenLab.SecondLife, but there are also direct calls to [mscorlib]System.Collections.ArrayList. This suggests that the generated code could refer to other things in [mscorlib] (and that the restriction that LSL lists cannot contain other lists is peculiar to LSL, because ArrayList has no such restriction).

For each script, the viewer compiles a single class that extends [LslUserScript]LindenLab.SecondLife.LslUserScript. I'm guessing that that's a fundamental characteristic of the viewer-server protocol for dealing with compiled scripts, so you can't throw in additional supporting classes, not even teeny-weeny ones.

So where do these observations get us? Well, consider three possible strategies for implementing a new scripting language:

  1. Entirely within LSL. Write a script that compiles and interprets some other language, or (more likely, because even a small interpreter will have trouble fitting in 64k) several intercommunicating scripts. The several-scripts approach would require saving the execution stack while awaiting a reply from another script—meaning you have to manage your own stack, getting continuations for free (well, not that expensive, anyway).

  2. Within the viewer. Generate CIL, but assume you have access to all of [mscorlib].

  3. In the viewer and (with the assistance of Linden Lab, should such assistance be forthcoming) the server. Generate CIL, but in addition to [mscorlib] and the LindenLab.SecondLife libraries, install and use any additional classes your language's runtime system requires.
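
The stack-management point in the first strategy is easier to see in code. Below is a minimal sketch in Python (not LSL) of an interpreter that keeps its own stack, so it can stop at an "ask" instruction, hand back its state, and pick up again when the reply arrives. The three-instruction machine and the names run, resume, and Suspended are all invented for this illustration.

```python
class Suspended:
    """Snapshot of an interpreter paused while it waits for another script."""

    def __init__(self, request, pc, stack):
        self.request = request  # what we asked the other script for
        self.pc = pc            # index of the instruction to resume at
        self.stack = stack      # operand stack at the moment of suspension


def run(program, pc=0, stack=None):
    """Run until the program finishes (returns its result) or must wait (returns Suspended)."""
    stack = [] if stack is None else list(stack)
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "ask":
            # The reply will arrive later as a message, so return everything
            # needed to continue instead of blocking here.
            return Suspended(arg, pc, stack)
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack.pop()


def resume(program, suspended, reply):
    """Continue a paused program, treating the reply as the value of the 'ask'."""
    return run(program, suspended.pc, suspended.stack + [reply])


program = [("push", 1), ("ask", "lookup-price"), ("add", None)]
paused = run(program)                  # stops at the 'ask'
result = resume(program, paused, 41)   # the other script answered 41
```

Because the saved program counter and stack are ordinary data, the Suspended object is effectively a continuation, which is the sense in which you get continuations almost for free.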

I've seen rumors that implementations of the first kind (entirely within LSL) for Lisp and Forth exist. However, I wasn't able to track down any actual code, and I suspect these are just myths (someone prove me wrong!). It's hard to imagine that a single-script LSL-based approach would leave much memory for user code, or that a multiple-script approach would run very fast.

An implementation of the second kind (entirely within the viewer) is more flexible, but [mscorlib] is not, by itself, ideally suited as a runtime library for a small scripting language. You'd probably wind up needing to generate the contents of your runtime library along with user code, and replicate it in every 64k VM that uses your language, which is not very encouraging. And I'm not sure whether the Second Life Terms of Service would allow you to run a viewer with an altered compiler without the permission of Linden Lab.

The only practical approach might be the third. I haven't talked with Linden Lab about it, and I don't know anyone who has, so I have no idea how open they'd be to the suggestion of a new scripting language not developed in-house, or to installing the language's libraries on their servers. I do imagine they'd want any such libraries to be compact, which probably rules out any of the larger existing programming languages.

One thing I don't think you'd need to do is change any of the existing LSL routines for interacting with the Second Life virtual world itself. These routines (whose names all begin with ll) appear in the generated CIL to have perfectly ordinary interfaces that should work for any language compiled for Mono (although some of the routines might have had friendlier calling conventions if more data types were available).

What would a new scripting language actually look like? The first thing that comes to mind is something Lisp-like; a common representation for code and data sidesteps the need to design serialization conventions, and makes it easy to ship little bits of code around from one place to another. Don't like the way some other Second Life object is behaving? Just send it a new behavior!
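
As a toy illustration of the code-is-data idea, here is a sketch in Python, with nested lists standing in for S-expressions. The evaluator, the ScriptedObject class, and the use of JSON as the wire format are all my own assumptions for the example; nothing like this exists in Second Life, and a real design would need the resource limits discussed elsewhere before accepting code from strangers.

```python
import json
import operator

# A deliberately tiny set of built-ins; every bare string is looked up as a symbol.
BUILTINS = {"+": operator.add, "*": operator.mul, "<": operator.lt}


def evaluate(expr, env):
    """Evaluate a nested-list expression such as ["+", "x", 1]."""
    if isinstance(expr, str):
        return env[expr]          # symbol lookup (no string literals in this toy)
    if not isinstance(expr, list):
        return expr               # numbers evaluate to themselves
    head = expr[0]
    if head == "if":
        _, test, then, otherwise = expr
        return evaluate(then if evaluate(test, env) else otherwise, env)
    func = evaluate(head, env)
    args = [evaluate(arg, env) for arg in expr[1:]]
    return func(*args)


class ScriptedObject:
    """Hypothetical in-world object whose behavior is just a piece of data."""

    def __init__(self, behavior):
        self.behavior = behavior

    def receive(self, message):
        # The message is the new behavior, serialized as plain JSON text.
        self.behavior = json.loads(message)

    def act(self, **inputs):
        return evaluate(self.behavior, {**BUILTINS, **inputs})


lamp = ScriptedObject(["*", "brightness", 2])
before = lamp.act(brightness=5)
new_behavior = ["if", ["<", "brightness", 3], 0, ["+", "brightness", 1]]
lamp.receive(json.dumps(new_behavior))
after = lamp.act(brightness=5)
```

No separate serialization format had to be designed: the same nested lists serve as the program, the message, and the stored behavior.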

I have some vague ideas for the design of a small, statically typed Lisp that (aside from its static typing) looks a little like Clojure. But that's a project I'd rather not get sucked into right now. And almost any scripting language with a little bit of thought given to its design would be an improvement over LSL.