Fun with reservoir computing
By now you’ve probably read Andrej Karpathy’s blog post The Unreasonable Effectiveness of Recurrent Neural Networks, and if you haven’t, you definitely should. Andrej’s RNN examples are the inspiration for many of the RNNs doing* silly things that get picked up by the lay press. The impressive part is how these artificial texts land squarely in the uncanny valley solely by predicting the next character one by one. The results almost, but not quite, read like perfectly reasonable nonsense written by a human.
In fact, we can approach similar tasks with a healthy injection of chaos. Reservoir computing has many variants, but the basic premise is always the same: data is fed into a complex chaotic system called a reservoir, which, among other things, gives rise to long-lived dynamic states. By feeding the states of the system into a simple linear regression model, it's possible to accomplish reasonably complicated tasks without training the dynamic reservoir at all. It's like analyzing wingbeat patterns of Australian butterflies by observing hurricanes in Florida.
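To make the premise concrete, here is a minimal sketch of the whole pipeline on a toy task, assuming a plain tanh reservoir rather than the Schmitt trigger nodes used in this post. The sizes, the sparsity, the scaling factors and the "recall the input from five steps ago" task are all made up for illustration. The only thing that gets trained is the linear readout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for a toy task; none of these come from the post's code.
n_nodes, n_steps, delay = 100, 2000, 5

# Fixed random reservoir: sparse recurrent weights that are never trained.
mask = rng.random((n_nodes, n_nodes)) < 0.1
W = rng.normal(size=(n_nodes, n_nodes)) * mask
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9
w_in = rng.uniform(-0.5, 0.5, size=n_nodes)

# Drive the reservoir with random input and record every state.
u = rng.uniform(-1.0, 1.0, size=n_steps)
states = np.zeros((n_steps, n_nodes))
x = np.zeros(n_nodes)
for t in range(n_steps):
    x = np.tanh(W @ x + w_in * u[t])
    states[t] = x

# Task: recover the input from `delay` steps ago using only the current state.
X, y = states[delay:], u[:-delay]

# The only trained part: a ridge-regularised linear regression readout.
ridge = 1e-4
w_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_nodes), X.T @ y)

mse = np.mean((X @ w_out - y) ** 2)
baseline = np.mean(y ** 2)  # error of always predicting zero
print(f"readout MSE {mse:.4f} vs. predict-zero baseline {baseline:.4f}")
```

The readout never sees the input history directly. Whatever it recovers about the past has to come from the echoes still sloshing around in the reservoir state.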
A computing reservoir can be made out of a wide variety of chaotic systems (e.g. a jointed pendulum), as long as they can take some sort of input. The ones most akin to neural networks consist of a big network of sparsely connected nodes with random weights and a non-linear activation function. In this example I'll use a Schmitt trigger relaxation oscillator. If you have trained a few neural networks yourself, you can think of this as simply a tanh activation function with hysteresis. If you build (or used to build) BEAM robots, you can think of the activation function as one neuron in the 74HC14-based microcore you would use to control a walking robot.
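One possible reading of "tanh with hysteresis" is sketched below. This is an assumption on my part, not the code behind this post: the node remembers the sign of its previous output, and that memory shifts the point at which the tanh switches. The names `hysteretic_tanh`, `gain` and `width` are hypothetical, and the sketch is non-inverting, unlike a real 74HC14 gate.

```python
import numpy as np

def hysteretic_tanh(x, state, gain=10.0, width=0.3):
    """tanh whose switching point depends on the sign of the previous output.

    `state` is +1 or -1 (the previous output sign). It pushes the switching
    threshold to -width or +width respectively, like a Schmitt trigger.
    """
    y = np.tanh(gain * (x + width * state))
    new_state = np.where(y >= 0.0, 1.0, -1.0)
    return y, new_state

def sweep(xs, state):
    """Feed a sequence of inputs through one node, carrying its state along."""
    outputs = []
    for x in xs:
        y, state = hysteretic_tanh(x, state)
        outputs.append(float(y))
    return np.array(outputs), state

xs_up = np.linspace(-1.0, 1.0, 201)
xs_down = xs_up[::-1]

up_out, state = sweep(xs_up, -1.0)       # start in the "low" state, ramp up
down_out, _ = sweep(xs_down, state)      # then ramp back down

up_switch = xs_up[np.argmax(up_out >= 0.0)]        # first input giving a high output
down_switch = xs_down[np.argmax(down_out < 0.0)]   # first input giving a low output
print(f"switches high near {up_switch:.2f}, switches low near {down_switch:.2f}")
```

The two switching points differ by roughly twice `width`, so the output for a given input depends on where the node has been. That memory is what separates this from an ordinary tanh.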
By connecting a large vector of nodes both to itself and to an input, it's possible to get a complex response from very simple input. The response to a step function in this example is long-lived, but it does die out eventually.
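Here is a rough sketch of that kind of step response, again assuming plain tanh nodes and made-up sizes rather than the reservoir used for the figures. I scale the recurrent weights so their largest singular value is 0.95. That is a conservative choice of mine: it guarantees the transient shrinks at every step, at the cost of a shorter-lived response than a more aggressively scaled reservoir would give.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, n_steps, step_at = 200, 400, 10  # hypothetical sizes

# Sparse random recurrent weights, scaled so the largest singular value is 0.95.
mask = rng.random((n_nodes, n_nodes)) < 0.05
W = rng.normal(size=(n_nodes, n_nodes)) * mask
W *= 0.95 / np.linalg.norm(W, 2)
w_in = rng.normal(size=n_nodes)

# Step input: zero until `step_at`, then held at one.
u = np.ones(n_steps)
u[:step_at] = 0.0

x = np.zeros(n_nodes)
change = np.zeros(n_steps)  # how much the state moves at each time step
for t in range(n_steps):
    x_new = np.tanh(W @ x + w_in * u[t])
    change[t] = np.linalg.norm(x_new - x)
    x = x_new

# Count the steps until the activity falls below 1% of its initial kick.
quiet = np.argmax(change[step_at:] < 0.01 * change[step_at])
print(f"state still moving noticeably for about {quiet} steps after the step input")
```

Nothing moves before the step arrives, the state is kicked hard when it does, and the ripples then shrink until the reservoir settles into a new fixed state.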
The activity in the reservoir looks a little like wave action in a liquid, as there tends to be a similar amount of energy in the system at any given time. With very little damping this can go on for a long time. The gif below demonstrates long-lived oscillations in a reservoir after a single unit impulse input. After a few thousand iterations of training, what we get is starting to look a little less like gibberish. What’s particularly interesting is that
But what about the funny text? Can reservoir computing make us laugh? Let's start by testing whether reservoir computing can match the writing proficiency of the famous fictional writer Jack Torrance. The animations below demonstrate the learning process: the dynamic reservoir carries a chaotic memory of past input characters, and a linear classifier predicts each next character as a probability. At first the combined system outputs nonsense, and we can see that the character predictions are very dynamic and random. Then the system gets very confident, but very stupid.
Later the system begins to learn words, and the character probabilities adapt to the previous characters in a sensible manner, all without back-propagating into the reservoir.
After a while the system learns to reliably produce the phrase “All work and no play makes Jack a dull boy.”
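The learning process described above can be sketched roughly as follows. This is not the gist itself: it assumes plain tanh nodes, one-hot character input, and a softmax readout trained by ordinary gradient descent, and every size and scaling factor is a guess made for illustration. Gradients only ever touch `W_out` and `b_out`; the reservoir weights stay fixed.

```python
import numpy as np

rng = np.random.default_rng(2)

text = "All work and no play makes Jack a dull boy. " * 10
chars = sorted(set(text))
index = {c: i for i, c in enumerate(chars)}
ids = np.array([index[c] for c in text])
n_chars, n_nodes = len(chars), 100  # reservoir size is a made-up choice

# Fixed random reservoir (never trained) and fixed random input weights.
mask = rng.random((n_nodes, n_nodes)) < 0.1
W = rng.normal(size=(n_nodes, n_nodes)) * mask
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, size=(n_nodes, n_chars))

def step(x, char_id):
    """Advance the reservoir by one character (one-hot input picks a column)."""
    return np.tanh(W @ x + W_in[:, char_id])

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Run the text through the reservoir once and store the states.
x = np.zeros(n_nodes)
states = np.zeros((len(ids) - 1, n_nodes))
for t, cid in enumerate(ids[:-1]):
    x = step(x, cid)
    states[t] = x
next_ids = ids[1:]
targets = np.eye(n_chars)[next_ids]

# Train only the linear readout: softmax regression by gradient descent.
W_out = np.zeros((n_nodes, n_chars))
b_out = np.zeros(n_chars)
# Conservative step size based on the mean squared size of the state vectors.
lr = 1.0 / (0.5 * (np.mean(np.sum(states ** 2, axis=1)) + 1.0))
losses = []
for _ in range(1000):
    probs = softmax(states @ W_out + b_out)
    losses.append(-np.mean(np.log(probs[np.arange(len(next_ids)), next_ids] + 1e-12)))
    grad = (probs - targets) / len(next_ids)
    W_out -= lr * (states.T @ grad)
    b_out -= lr * grad.sum(axis=0)

def generate(seed_text, n_out):
    """Prime the reservoir with seed_text, then feed back the likeliest character."""
    x = np.zeros(n_nodes)
    for c in seed_text:
        x = step(x, index[c])
    out = []
    for _ in range(n_out):
        cid = int(np.argmax(softmax(x @ W_out + b_out)))
        out.append(chars[cid])
        x = step(x, cid)
    return "".join(out)

print(f"loss went from {losses[0]:.3f} to {losses[-1]:.3f}")
print(generate("All work and no ", 30))
```

With all-zero readout weights, every character starts out equally likely, which is the "dynamic and random" stage. How quickly the generated text settles into the phrase depends on the reservoir size, the step size and the number of iterations, so expect to fiddle with them.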
If you want to try it yourself, this gist depends only on numpy. I made a normal GitHub repository for my code for generating figures and text here. There may be a TensorFlow version for training faster/on more complicated texts in the works (running it with numpy is no way to write a thesis).
*The torch-rnn repository used by Elle O’Brien for romance novel titles was written by Justin Johnson.