Three DBM experiments

Well that was a bit tedious. OK maybe more than a little bit. But now we can get back to tormenting Cid. I have in mind three experiments.

In order to modify Gregoire’s DBM code for these experiments, we only need to make some very small changes. Here is what we need to change:

1. The data we’re training over is different. We’ll have to create a new data array X to learn over. What this will look like depends on our choice of External Universe.
2. The size of the network (his brain) is much smaller.
3. The visualizations have to change because of 1. and 2.

Other than that everything is the same!

Experiment #1. Our Original Grumpy Universe

Here’s the setup for this one. Imagine we have a brand new creature we’ve Intelligently Designed. That’s Cid. He has a brain capable of building a model of his Universe. That’s the DBM with 1 visible unit, 4 hidden units in the first hidden layer, and 4 hidden units in the second hidden layer. His visible unit is the interface between his internal model and the External Universe (EU). In this first experiment, we design the EU so that when Cid is observing it, he sees either a 0 or a 1 enter into his visible unit.

Look me in the eye. Compound insect eyes are formed of lots of ommatidia.

You can think of his visible unit as being like a simple thresholded photodetector, which either doesn’t fire (is zero) or fires (is one), with the firing being triggered if the surrounding light is bright enough. In biological creatures, this type of vision unit might be similar to an ommatidium, which is a structure found in the vision systems of insects.

The EU we subject Cid to has the following properties. The ‘light’ entering into Cid’s visible unit changes 30 times per second, and is either 0 (i.e. dark) or 1 (light). The probability of the light being off we’ve arbitrarily set to 13.5%. By design, there are no correlations between subsequent events in this first experiment — there are no patterns in these sequences of light and dark.
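Here is a minimal sketch of how this EU could be sampled to build the data array X. It uses only Python's standard `random` module rather than the NumPy arrays the DBM code works with. The names `make_grumpy_universe`, `P_DARK`, and the fixed seed are illustrative choices, not taken from the original code.

```python
import random

P_DARK = 0.135           # probability the light is off (0), from the text
RATE_HZ = 30             # the light changes 30 times per second
SECONDS_AWAKE = 60 * 60  # Cid watches for one hour

def make_grumpy_universe(n_obs, p_dark=P_DARK, seed=0):
    """Return n_obs independent observations, one visible unit each.

    Each row is [0] (dark) with probability p_dark, else [1] (light).
    Draws are independent, so there are no patterns in time.
    """
    rng = random.Random(seed)  # the seed is an assumption, for repeatability
    return [[0 if rng.random() < p_dark else 1] for _ in range(n_obs)]

n_obs = RATE_HZ * SECONDS_AWAKE      # 108000 observations per waking period
X = make_grumpy_universe(n_obs)
frac_dark = sum(1 for row in X if row[0] == 0) / len(X)
print(n_obs, round(frac_dark, 4))
```

The printed fraction of dark observations should land close to 0.135, with small deviations from sampling noise.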

The experiment we set up is as follows. We let Cid watch his EU for 1 hour, which is 30*60*60=108,000 observations. During this time, his internal model is being trained. After this time is up, Cid ‘goes to sleep’ and dreams about his EU: we generate 108,000 samples from the internal model he’s learned up to that point, and record the value of the visible unit for each. If his internal model is an accurate model of the EU, the probability of dreaming of dark generated by the internal model should be about 13.5%. We call one cycle of learning & sleep an epoch. We repeat this sequence for 14 epochs, which simulates the beginning of life for our new creature. For each sleeping period, we track the probability of the internal model dreaming of dark, and compute Cid’s $S_0$ number. As there is only one number that characterizes the EU, if the model can learn this number it has done its job, and this should be reflected in a large value for $S_0$.
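The wake/sleep cycle can be sketched as a loop. To be clear, the `FrequencyModel` below is NOT a DBM and is not Gregoire's code. It is a hypothetical stand-in that just counts how often it has seen dark, so that the shape of the protocol (observe, then dream, then record the dreamed probability of dark) can be run end to end.

```python
import random

class FrequencyModel:
    """Stand-in for Cid's brain: NOT a DBM, just a running count of darks."""
    def __init__(self):
        self.n_dark = 0
        self.n_seen = 0

    def observe(self, bit):
        self.n_dark += (bit == 0)
        self.n_seen += 1

    def dream(self, rng):
        # Sample a visible-unit value from the model's current belief.
        p_dark = self.n_dark / self.n_seen
        return 0 if rng.random() < p_dark else 1

def run_epochs(n_epochs=14, n_obs=108000, p_dark=0.135, seed=0):
    """Return the dreamed probability of dark after each epoch."""
    rng = random.Random(seed)
    model = FrequencyModel()
    dream_p_dark = []
    for _ in range(n_epochs):
        # Awake: observe the EU and update the internal model.
        for _ in range(n_obs):
            model.observe(0 if rng.random() < p_dark else 1)
        # Asleep: sample from the model and record the visible unit.
        dreams = [model.dream(rng) for _ in range(n_obs)]
        dream_p_dark.append(dreams.count(0) / n_obs)
    return dream_p_dark

history = run_epochs(n_epochs=3, n_obs=10000)  # small run to keep it quick
print([round(p, 3) for p in history])
```

Swapping the stand-in for a real DBM would change `observe` and `dream`, but the bookkeeping around them would look much the same.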

Experiment #1: Results

The network performs well at all epochs in this case.

Here is a plot of the $S_0$ number for Cid over a period of 14 epochs. You can see that it’s very large and that the value jumps around quite a bit. Judging by the actual probabilities his internal model generates, any $S_0$ number greater than about 100,000 is equivalent within statistical noise. So Cid’s brain is able to learn an excellent model of this EU, even after only one epoch of training. More training changes his brain configuration but doesn’t help much, as all we see then are the effects of statistical fluctuations in the input data he’s seeing.
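A rough back-of-the-envelope check on that noise level, as a sketch under a few assumptions: $S_0$ is taken to be the inverse of the KL-divergence, the KL-divergence is measured in nats (the log base is my assumption), and only the sampling noise in the 108,000 dream samples is counted.

```python
import math

def bernoulli_kl(p, q):
    """KL(p || q) in nats between two Bernoulli distributions."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

p_dark = 0.135   # true probability of dark in the EU
n = 108000       # dream samples per sleep period

# One standard error of a frequency estimated from n independent samples.
std_err = math.sqrt(p_dark * (1 - p_dark) / n)

# KL and S_0 when the dreamed frequency is off by exactly one standard error.
kl_one_sigma = bernoulli_kl(p_dark, p_dark + std_err)
s0_one_sigma = 1.0 / kl_one_sigma

# Second-order approximation: KL ~ delta^2 / (2 p (1 - p)) = 1 / (2 n).
s0_approx = 2 * n

print(round(std_err, 5), round(s0_one_sigma), s0_approx)
```

Under these assumptions, a dreamed frequency that is one standard error away from 13.5% already gives an $S_0$ of roughly 2 × 108,000, which is the same order of magnitude as the 100,000 threshold quoted above.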

Here are the actual values of Cid’s network parameters after 14 epochs have concluded. This DBM has learned an excellent model for the EU in this experiment.

```
weights = [array([[ 0.01813747,  0.00530222,  0.00424123,  0.00458312]], dtype=float32), array([[-0.00190055,  0.01029627,  0.00203666, 0.00552391], [ 0.00306078,  0.01077824,  0.00614936,  0.00418225], [ 0.00407983,  0.01392216,  0.02022623, -0.00406721], [ 0.00017785,  0.00020632,  0.009114  , -0.00081153]], dtype=float32)]
biases = [array([ 1.86096434]), array([-0.98480152, -1.03069905, -1.01625627, -0.97723272]), array([-1.01399916, -1.00536941, -1.01015504, -0.98526637])]
offsets = [array([ 0.86534613]), array([ 0.2712875 ,  0.26286839,  0.26594376,  0.27397069]), array([ 0.26666885,  0.26782288,  0.26658711,  0.27174567])]
```

Experiment #2. Adding motor output

Experiment #1 was great for getting a good understanding of how to train a DBM, and to start thinking about what it all means. But we want more! Here’s a very slight extension of Experiment #1, where we add a single new visible unit.

This new visible unit will represent a different type of thing. It will represent the direction of motion of Cid. Specifically, the new visible unit will have value 0 if Cid is moving to the left, and 1 if Cid is moving to the right.

With this new visible unit, Cid now has two visible units (one representing a ‘vision neuron’, and one representing a ‘motor neuron’). We’ll keep the hidden layers the same as before.

The network setup for experiment #2. It’s the same as before, except now there are two visible units. One is a vision neuron and the other is a motor neuron.

We’ll repeat the same basic experimental setup as the first one, where Cid first looks around at his EU for a while, then dreams about it for a while, and repeats this for a few epochs. However, because we’ve introduced a new type of visible unit, some interesting issues arise.

Thinking of the motor neuron as a sensor AKA Avatar Mode

The first involves the training data. Now a piece of training data includes two bits: one is the visual input, and the other is a motor input. Understanding what a visual input is is pretty easy, but what’s a motor input? We usually think of motor as being an output, something we do rather than something that’s done to us. But here we’re going to imagine that during the time when Cid is awake, he’s actually being moved around by an external force. Suzanne calls this Avatar Mode: imagine you are controlling Cid’s motor output (say, by remote control), and the direction of his motor becomes the training data for the motor input.

For experiment #2, let’s assume that when Cid is learning and being ‘shown what to do’, he tends to be moved left when the light is off, and moved right when the light is on. What this means is that input data objects will tend to favor the bit states 00 (light is off, moving left) and 11 (light is on, moving right) over 01 (light is off, moving right) and 10 (light is on, moving left). If we again assume that there is no correlation in time between the states coming into the visible units, then this EU is fully characterized by three numbers — the probability of the vision neuron observing 0 and the motor neuron moving left; the probability of the vision neuron observing 0 and the motor neuron moving right; and the probability of the vision neuron observing 1 and the motor neuron moving right. The last possibility’s probability is fixed because these probabilities need to sum to one.

To track the learning we’ll again plot the $S_0$ number as a function of epoch. Recall that the $S_0$ number is the inverse of the KL-divergence, which compares the ‘true’ probability distribution found in the EU with the probability distribution generated by Cid’s freely running brain.

In our experiment, we’ll (again, arbitrarily) set $P_{00}=0.431, P_{11}=0.292, P_{01}=0.145, P_{10}=0.132$. These numbers fully characterize this EU.
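Here is a sketch of how the $S_0$ number could be computed for this EU from a batch of samples, taking $S_0$ to be the inverse of the KL-divergence as described above. No trained DBM is available in this snippet, so samples drawn from the EU itself stand in for Cid's dreams, which means the result only shows the ceiling set by sampling noise. The function names and the use of the natural log are my assumptions.

```python
import math
import random
from collections import Counter

# Joint probabilities from the text; key = vision bit followed by motor bit.
EU = {"00": 0.431, "11": 0.292, "01": 0.145, "10": 0.132}

def sample_states(dist, n, seed=0):
    """Draw n independent states from a dict mapping state -> probability."""
    rng = random.Random(seed)
    states = list(dist)
    return rng.choices(states, weights=[dist[s] for s in states], k=n)

def empirical(samples, states):
    """Relative frequency of each state in a list of samples."""
    counts = Counter(samples)
    return {s: counts[s] / len(samples) for s in states}

def s0_number(true_dist, model_dist):
    """Inverse of KL(true || model); natural log is an assumption here.

    Assumes the model gives nonzero probability to every true state.
    """
    kl = sum(p * math.log(p / model_dist[s])
             for s, p in true_dist.items() if p > 0)
    return math.inf if kl == 0 else 1.0 / kl

dreams = sample_states(EU, 108000)       # stand-in for Cid's dream samples
dream_dist = empirical(dreams, EU)
print(dream_dist, s0_number(EU, dream_dist))
```

The same `sample_states` call can also produce the training array X by converting each two-character state into a row of two bits.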

Sampling with some subset of the visible units clamped

Now that we have visible units representing different types of thing (one vision, one movement), some new possibilities arise for investigating Cid’s behavior once his internal model has been trained. We now have the option of clamping some of the visible units and then asking what the remaining visible units are, by drawing samples from the internal model with some of the visible units clamped.

There are three different modes we can look at in experiment #2. The first is when we don’t clamp either visible unit, and we let the entire network run freely. Now the interpretation of the state of the motor neuron, when the network is run freely, is that the motor neuron is now an output — Cid is moving autonomously while his vision system is dreaming of dark and light.

The second mode is when we clamp the vision neuron to either light or dark and draw samples from the network to determine the state of the motor neuron. This is like Cid being awake, and we set the lights to whatever we like, and Cid moves autonomously based on what he’s learned about the correlations of light and movement during the training phase. This type of behavior is probably pretty similar to what we’ll want to do once we have an embodied version of Cid, and we want him to move around autonomously. Note that functionally this is the sort of behavior a lot of insects display — either attraction to or aversion to light.

The third mode is when we clamp the motor neuron and draw samples from the network to determine what Cid’s internal model ‘sees’ on his vision neuron based on the movement we’re forcing on him.
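For the two clamped modes, we can work out what a perfect internal model should report by conditioning the EU's joint distribution on the clamped bit. This sketch uses the probabilities set for this experiment; the helper `clamp` is a hypothetical name, and it operates on the EU's probabilities directly rather than on a trained network.

```python
# Joint probabilities from the text; key = vision bit followed by motor bit.
EU = {"00": 0.431, "11": 0.292, "01": 0.145, "10": 0.132}

def clamp(dist, position, value):
    """Condition the joint on one bit and renormalize.

    position: 0 for the vision bit, 1 for the motor bit.
    Returns a dict over the states consistent with the clamped bit.
    """
    kept = {s: p for s, p in dist.items() if s[position] == value}
    total = sum(kept.values())
    return {s: p / total for s, p in kept.items()}

# Mode 2: clamp vision, read off the motor neuron.
p_right_given_light = clamp(EU, 0, "1")["11"]
p_left_given_dark = clamp(EU, 0, "0")["00"]

# Mode 3: clamp motor, read off the vision neuron.
p_light_given_right = clamp(EU, 1, "1")["11"]
p_dark_given_left = clamp(EU, 1, "0")["00"]

print(round(p_right_given_light, 3), round(p_left_given_dark, 3))
print(round(p_light_given_right, 3), round(p_dark_given_left, 3))
```

With these numbers, a well-trained Cid should move right about 69% of the time when shown light and left about 75% of the time when shown dark. Samples drawn from the real network with a unit clamped can be compared against these targets.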

Experiment #2: Results

Here it doesn’t start out so well, but by the end the model is pretty good!

Here we see that the $S_0$ number doesn’t start very high, like it did in experiment #1. In fact after the first 5 epochs or so, it looks like Cid wasn’t getting a lot smarter. But all of a sudden around epoch #7 it looks like he finally started to ‘get it’! His internal model started effectively modeling what he was seeing. By the end of his training his understanding of this EU was comparable to his understanding of the simpler EU in experiment #1.

We can interpret this success as Cid’s having learned about the correlations between his visual input and his motor input — he was able to reach a full understanding of all there is to know about the EU in experiment #2.

Here are the network parameters obtained after epoch #14.

```
weights = [array([[-0.33172947,2.06676388,0.90290803], [ 1.94313753, -0.34757519,  2.05962396,  0.91246027]], dtype=float32), array([[-0.01521079,  0.007444  , -0.01788999,  0.01016755], [ 0.01918931,  0.00635643,  0.02085816, -0.01936374], [-0.00448515, -0.01009895, -0.00765508,  0.01153829], [-0.00094647, -0.00817871, -0.01026069, -0.0089241 ]], dtype=float32)]
biases = [array([-0.39935381, -0.29827521]), array([-0.72053314, -0.93864307, -0.67654053, -0.93797132]), array([-0.98044517, -1.01085684, -1.006881  , -0.98185932])]
offsets = [array([ 0.42313661,  0.43706216]), array([ 0.38207002,  0.28493565,  0.39176196,  0.30359983]), array([ 0.27248023,  0.26672524,  0.26748523,  0.27268372])]
```

Experiment #3. Explicitly adding time

In both of the previous experiments, the EU was chosen so that there were no correlations between subsequent inputs — there were no time-dependent patterns in the input bit strings Cid was being shown. Of course, in the EU humans are exposed to, there are such patterns. Let’s see if we can modify our DBM to be able to learn time-dependent patterns.

The way we’ll do this is pretty simple. Instead of having a single vision neuron and a single motor neuron, we’ll have two of each, where one pair represents the current observation and the other pair represents the immediately previous observation.

Here we still have two neurons — one vision and one motor — but we have two different times (here labeled t and t+1), and therefore the network has four visible units. We’ve grouped the vision and motor neurons together for a reason we’ll get to later!
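One way to picture the four visible units is as a sliding window over a time series of (vision, motor) observations. The sketch below pairs each observation with the one that follows it. The function name `pair_up`, the example series, and the column order (vision then motor at the earlier time, followed by vision then motor at the later time) are my assumptions for illustration.

```python
def pair_up(sequence):
    """Turn a time series of (vision, motor) bits into 4-bit rows.

    Row i is [vision_i, motor_i, vision_i+1, motor_i+1]; this column
    order is an assumption made for illustration.
    """
    return [list(prev) + list(curr)
            for prev, curr in zip(sequence, sequence[1:])]

series = [(0, 0), (0, 0), (1, 1), (1, 0)]   # made-up example observations
rows = pair_up(series)
print(rows)   # [[0, 0, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0]]
```

A series of N observations yields N-1 rows, each of which is one training example for the four visible units.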

An EU with time-dependent correlations

To test this architecture, let’s create an EU where there now are correlations in time between subsequent inputs. For a general 4 bit input, there are $2^4-1$ independent probabilities. Let’s set the probabilities of this EU to be:

$P_{0000}=0.30, P_{0001}=0.004, P_{0010}=0.008, P_{0011}=0.012,$
$P_{0100}=0.016, P_{0101}=0.020, P_{0110}=0.024, P_{0111}=0.028,$
$P_{1000}=0.032, P_{1001}=0.036, P_{1010}=0.040, P_{1011}=0.044,$
$P_{1100}=0.048, P_{1101}=0.052, P_{1110}=0.056, P_{1111}=0.28.$

These numbers are all pretty much arbitrary, although I made them different just to make sure I could tell if the model was capturing those differences, and I made the two states $0000$ and $1111$ most likely. These represent (a) the situation where the vision neuron is reading dark and the motor is going left at time t, and the same thing happens at t+1, and (b) the situation where the vision neuron is reading light and the motor is going right at time t, and the same thing happens at t+1. So this EU favors dark/left and light/right and things staying the way they are in time. Again we’ll plot the $S_0$ number as a metric for how our learning is progressing.
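As a sanity check on this EU, here is a sketch that stores the sixteen probabilities, confirms they sum to one, and draws a data array X with four columns. Reusing 108,000 samples per batch is my assumption, as is the bit order (vision and motor at time t, then vision and motor at t+1). The text only pins down the meaning of $0000$ and $1111$, which read the same under either grouping.

```python
import random

# Probabilities from the text, indexed by the 4-bit state string.
EU4 = {
    "0000": 0.30,  "0001": 0.004, "0010": 0.008, "0011": 0.012,
    "0100": 0.016, "0101": 0.020, "0110": 0.024, "0111": 0.028,
    "1000": 0.032, "1001": 0.036, "1010": 0.040, "1011": 0.044,
    "1100": 0.048, "1101": 0.052, "1110": 0.056, "1111": 0.28,
}

assert abs(sum(EU4.values()) - 1.0) < 1e-9   # a valid distribution
n_free = len(EU4) - 1                        # 2**4 - 1 independent numbers

def make_data(dist, n, seed=0):
    """Return an n x 4 list of bits drawn independently from dist."""
    rng = random.Random(seed)
    states = list(dist)
    draws = rng.choices(states, weights=[dist[s] for s in states], k=n)
    return [[int(bit) for bit in s] for s in draws]

X = make_data(EU4, 108000)

# ASSUMED bit order: (vision_t, motor_t, vision_t+1, motor_t+1).
# Probability that the (vision, motor) pair is unchanged between t and t+1.
p_unchanged = sum(p for s, p in EU4.items() if s[:2] == s[2:])
print(n_free, len(X), round(p_unchanged, 2))
```

Under that bit order, 64% of the probability mass sits on states where nothing changes from one time step to the next, which is the "things staying the way they are" bias in numbers.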

A new type of thing we get from clamping some of the visible units — prediction

Imagine Cid takes in an observation of the current state of both his vision and motor neurons. Now if we draw a sample from his network with these clamped to the observed values, we obtain states for both at the next time step. These states are predictions about what Cid thinks should happen next, conditioned on his current observations. Isn’t that cool? His predictions are based on what he’s learned about the time dependence of his EU during the training phase.

Another interesting thing we can do is clamp only the vision neuron to the currently observed light, and draw samples from the network. Cid will then move autonomously based on the current state of his motor neuron, and will also get a prediction about both what he should see and where he should move next.
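To see what these predictions should look like for a perfectly trained Cid, we can condition the EU's own probabilities on the clamped bits. This is a sketch on the target distribution, not on the network. The helper `condition` is a hypothetical name, and the bit order (vision and motor at time t, then vision and motor at t+1) is again my assumption.

```python
# Probabilities from the text, indexed by the 4-bit state string.
# ASSUMED bit order: (vision_t, motor_t, vision_t+1, motor_t+1).
EU4 = {
    "0000": 0.30,  "0001": 0.004, "0010": 0.008, "0011": 0.012,
    "0100": 0.016, "0101": 0.020, "0110": 0.024, "0111": 0.028,
    "1000": 0.032, "1001": 0.036, "1010": 0.040, "1011": 0.044,
    "1100": 0.048, "1101": 0.052, "1110": 0.056, "1111": 0.28,
}

def condition(dist, clamped):
    """Condition the joint on clamped bits, given as {position: bit}.

    Returns a dict over the full 4-bit states consistent with the clamp,
    renormalized to sum to one.
    """
    kept = {s: p for s, p in dist.items()
            if all(s[i] == b for i, b in clamped.items())}
    total = sum(kept.values())
    return {s: p / total for s, p in kept.items()}

# Prediction: clamp both current bits to dark/left, read the next pair.
after_dark_left = condition(EU4, {0: "0", 1: "0"})
p_stay_dark_left = after_dark_left["0000"]

# Clamp both current bits to light/right instead.
p_stay_light_right = condition(EU4, {0: "1", 1: "1"})["1111"]

# Clamp only the current vision bit to light; motor and future run free.
light_only = condition(EU4, {0: "1"})
p_move_right_now = sum(p for s, p in light_only.items() if s[1] == "1")

print(round(p_stay_dark_left, 3), round(p_stay_light_right, 3),
      round(p_move_right_now, 3))
```

With these numbers, an ideal Cid who is currently in dark/left predicts dark/left again about 93% of the time, one in light/right predicts light/right again about 64% of the time, and one shown only light moves right about 74% of the time.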

Experiment #3: Results

Here the learning looks very linear.

As in the previous two experiments, we see that the $S_0$ number is increasing over the training period. Interestingly the rate of increase here looks linear, whereas in the previous two that was not the case. This could have something to do with the learning rate hyperparameter in the algorithm. It could be that the learning rate is too large for the first two and about right for this one.

Here are the final network parameters for Cid’s brain after epoch #14 for experiment #3.

```
weights = [array([[ 0.68729705,  3.25865531,  1.52063966,  0.06205666], [ 1.29153585,  2.89045095,  1.39506388,  0.0642081 ], [ 1.45107603,  2.90390301,  1.613253  ,  0.0708109 ],  [ 1.65337193,  2.9249208 ,  1.90962994,  0.07118615]], dtype=float32), array([[-0.01320299,  0.0017458 ,  0.00971262, -0.01334796],  [ 0.00066732,  0.01005933,  0.02247284, -0.00862475],  [-0.00398598,  0.0030924 ,  0.02086396, -0.01616782], [-0.00087102, -0.00204386,  0.01797051,  0.00104328]], dtype=float32)]
biases = [array([ 0.29037783, -0.17516954, -0.47229764, -0.6564995 ]), array([-0.91211764,  3.56308915,  1.31385738, -1.03494297]), array([-1.01374428, -0.99906059, -0.95210618, -0.9959591 ])]
offsets = [array([ 0.5877514 ,  0.52328677,  0.49066822,  0.476114  ]), array([ 0.39119173,  0.68874638,  0.63992906,  0.26271362]), array([ 0.26620444,  0.2689028 ,  0.27874745,  0.27008831])]
```

Alright what have WE learned?

Cid has learned some things over the past few days. What have we learned?

Well it’s pretty clear that for the types of EU we’ve designed for Cid, even a very small DBM brain seems capable of reaching Enlightenment. This is kind of neat, especially when you consider that even for the more or less trivial cases we’ve been looking at, you can see how both sensor and actuator signals from a real embodied creature can be handled by the same framework. There is a clear way to enable autonomous behavior, where the machine entity makes its own decisions about what to do based on what it’s learned in the past. In addition, there is also a mechanism for ‘modeling the future’ which many folks believe (rightly, I think) is a key idea for understanding cognition.

Alright so next time we’ll take a look at how we might do the same types of learning, but using a Vesuvius processor… mmm quantum brains.

4 thoughts on “Three DBM experiments”

1. LOL not yet. Although you can use the same principles to learn to play other games — the folks at Deep Mind http://deepmind.com/ have done some cool stuff along these lines.

2. I like the time dependence trick, I haven’t seen that before. Very cool! I agree your learning rate in exp 1 and 2 is probably too large; it looks like it’s oscillating around a minimum, and exp 3 suggests that the statistical fluctuations aren’t that significant. Maybe you can treat this hyperparameter as a model parameter and learn it from the data like the rest?