Thursday, January 10, 2013

Unknown Works of Charles Dickens

It was a little over a year ago that, having just finished Andrew Ng's machine learning course on Coursera, I was inspired to build a little neural network of my own. I designed it to predict the next character of a text from the 20 characters that precede it. Then I set it to work training itself on the text of Charles Dickens's A Christmas Carol, A Tale of Two Cities, and David Copperfield.
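
The setup above amounts to sliding a 20-character window over the text and pairing each window with the character that follows it. Below is a minimal Python sketch of that idea. It is an illustration only, not the code from the original project; the names WINDOW and make_examples are assumptions, and the sample sentence is just the opening of A Christmas Carol.

```python
WINDOW = 20  # context length mentioned in the post


def make_examples(text, window=WINDOW):
    """Return (context, next_char) pairs from a sliding window over text.

    Each context is `window` characters long; next_char is the single
    character that immediately follows it in the text.
    """
    return [
        (text[i:i + window], text[i + window])
        for i in range(len(text) - window)
    ]


# Hypothetical usage on a short sample rather than the full novels.
sample = "Marley was dead: to begin with."
examples = make_examples(sample)
for context, next_char in examples[:3]:
    print(repr(context), "->", repr(next_char))
```

A real network would still need these characters encoded as numbers (for example, one-hot vectors) before training; that step is left out here.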

I was working on this while on vacation in Morocco. It took the little Atom-powered netbook I had brought with me about 36 hours to finish the training. When it was done, I set it to work generating more characters based on its own output: given the previous 20 characters, it would generate a next character, then generate another from the trailing 19 characters of that window followed by the newly generated character, and so on.
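
The generation step described above is a loop that keeps feeding the model its own most recent output. Here is a rough Python sketch under stated assumptions: generate and toy_predictor are hypothetical names, and toy_predictor is a random stand-in for the trained network, which is not reproduced here.

```python
import random

WINDOW = 20  # context length mentioned in the post


def generate(predict_next, seed, length, window=WINDOW):
    """Extend seed by `length` characters, one at a time.

    At each step the last `window` characters of the text so far are
    passed to predict_next, and the character it returns is appended.
    """
    out = seed
    for _ in range(length):
        context = out[-window:]      # most recent characters, including generated ones
        out += predict_next(context)
    return out


def toy_predictor(context):
    """Stand-in for a trained model: ignores context and picks a random character."""
    return random.choice("abcdefghijklmnopqrstuvwxyz ,.")


# Hypothetical usage: 60 new characters after a 20-character seed.
print(generate(toy_predictor, "Marley was dead: to ", 60))
```

Swapping toy_predictor for a function that samples from a trained model's predicted character probabilities gives the kind of loop the post describes.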

The result was a series of literary gems like this paragraph:

dodod aS mading a9 bRinoiceataid me cor s8 and tie howe, and the made the for, and said of tartat on dadi beat eo fore add, bow berinod, and was mn of thou in what he Sas, in hadd he baid was withing, mavedaisRoe d a. Ce isable gore and spo~ed ,o arind was Paid sai?,o a dase of be you in for had waice a+m, and I in the keed=ratld Whe, and sbate mawer Ag is, and sNow of the so it was it a fiid and xawis.d&o, with a dowther(ave qpine the wabe do ha d sasbed was of the worbere beAd, comce it hisaid the was bead was it of Bere, I and berq, Winder deadored w!thin the fireaceo fored w; I waacaing waid wor to in the be bor, ?

Has my netbook unearthed what Dickens really meant to say? And should Tolstoy be worried?