Saturday, January 10, 2015

The Fix Is Infix

There's nothing like implementing something from scratch to make you appreciate it. In this case, I've been working on a toy logic programming language, ostensibly to investigate opportunities for parallelism in the implementation, but I haven't actually gotten around to the parallelism part—both because the talk I was supposed to give on the topic has been postponed, and because I've gotten distracted by the sheer fun of logic programming itself.

Until I started this project, I never gave much thought to the syntax of Prolog. The format of a Prolog predicate looks very much like an expression in many programming languages, that is, it looks either like a constant identifier, as in this simple rule:

    toast_is_delicious.

or like a C-style function, with a function name followed by parenthesized arguments, as in:

    grandmother(A,B) :- grandparent(A,B), female(A).

which is a rule consisting of three predicates: the head of the rule is grandmother(A,B), and the body of the rule consists of the conjunction of the two predicates grandparent(A,B) and female(A). (You can read the rule as “A is a grandmother of B if A is a grandparent of B and A is female”.)

One common problem with the C-style function syntax is that it is not always obvious which parameter is which. In the case of a two-argument predicate named with a noun, like grandmother above, the usual understanding is that the first argument will be the grandmother, and the second the grandchild; that is, the pattern:

    noun(arg1,arg2)

is to be read:

    arg1 is the noun of arg2.

Here it helps that there is a pretty obvious English translation of the Prolog predicate, and the order in which arg1 and arg2 appear in that translation is the same as in the Prolog. (You could also read the predicate as “arg1 is arg2's noun”, but that doesn't affect the order in which the arguments appear.)

However, not all relationships we want to talk about are as simple as the grandmother-grandchild relationship. Consider:

    gives(A,B,C).

Does this mean that A gives B C, or A gives B to C? In the former case, B is the recipient; in the latter, the gift. English is not helpful as a guide here, because both sentences are idiomatic English. And in general, the more arguments a predicate has, the more ways there will be translate it into natural language.

Of course, the problem of which argument is which is not limited to Prolog; most of us with some programming experience will have had to look up the parameter order to at least some of the functions we've used in other programming languages. Static typing can help with this problem, but only if different parameters have incompatible types.

Anyway, I started out basing my toy language on the Logic Programming section of Abelson and Sussman's venerable Structure and Interpretation of Computer Programs. Because SICP uses Scheme, its logic programming clauses look like Lisp rather than C:

    (rule (toast-is-delicious))
    
    (rule (grandmother ?a ?b)
          (and (grandparent ?a ?b)
               (female ?a)))

Traditional Prolog makes a distinction between predicates and lists, where lists are a built-in data structure; in the syntax, lists are surrounded by square brackets instead of parentheses. Unlike Prolog, the SICP formulation of logic programming does not make a distinction between predicates and lists; the same unification algorithm works for both. The result is a language that feels altogether freer; for example, you can create a database like:

    (rule (alice chases rabbits))
    (rule (alice has a restaurant where you can get anything you want))
    (rule (alice sends a message to bob encrypted with bobs public key))

and ask all about Alice with the query:

    (alice . ?what)

In my own toy language, I adopt a SICP-like conflation of predicates and lists. Also, I add a parser that accepts a slightly more traditional Prolog-like syntax that omits the outer parentheses on each predicate. So you might say:

    grandmother A B: grandparent A B, female A.

But why on earth would you put it that way? If you have any familiarity with English, doesn't that strike you as awfully stilted? How about:

    A is grandmother of B: A is grandparent of B, A is female.

Granted, this is still not natural language, but for me, at least, it's easier to read. And you no longer have to say this:

    gives A B C.

because you can say this:

    A gives B to C.

English, like many other languages, uses words in infix position to describe relationships between other words in a sentence; and in a context where you can arrange words in a way that exploits that natural ordering, you might as well do so.

Writing database rules in such a free, naturalistic style is not without peril. Natural languages are full of words that can be used in different ways. You might not expect a query like:

    X flies like|Y.

(where the vertical bar is the infix cons operator) to yield both:

    X = time
    Y = (an arrow)

and:

    X = fruit
    Y = (a banana)

(although then again, you might). So although at first the syntax I've presented here may make it look as if you can say anything you want, if you actually try it out, you'll realize that you have to adopt conventions as rigorous (and perhaps almost as unnatural) as in any ordinary Prolog program.

Does the sort of free-form infix notation I've presented here have any value in other sorts of programming languages than logic programming languages? One big sticking point I can see is that most programming languages offer no typographic distinction between constant and variable names—so would make a sound be a command to trigger your computer speakers, or to ensure the soundness of some assertion bound to the variable a? Although some programming languages do use sigils to distinguish different sorts of identifiers, the whole idea of nonconstant variables borders on oxymoronic in modern functional programming languages (Scala's use of an uppercase/lowercase in pattern matching notwithstanding). In a unification-based language like Prolog, however, the constant/variable distinction is not just natural, but essential. So I suspect there's nothing here that transfers easily to other contexts.

If you think the syntax I've presented is either too far from real logic programming or some sort of advance in the state of the art, I should point out that there's a perfectly straightforward transformation to genuine, unadulterated Prolog. For example, you can turn:

    (H|_) contains H.
    (_|T) contains X: T contains X.

into:

    cons(cons(H,_),cons(contains,cons(H,nil))).
    cons(cons(_,T),cons(contains,cons(X,nil))) :-
      cons(T,cons(contains,cons(X,nil))).

But are you sure you want to?