There are many excellent overviews of Boltzmann Machines. Here’s one I particularly enjoyed — you should read it!

In this overview an important concept is raised. I’d like to talk about it a bit, as I think it’s quite important to understand before we jump into describing BMs. It’s related to a bunch of interesting problems in creating intelligent machines also.

**Distributions of patterns from the real world**

Imagine there is a strange creature that has evolved in a particularly weird environment. Let’s call it The Chortler in Darkness, or Cid for short.

Cid has evolved to open his eyes exactly once a second for 12 hours, and then sleep for exactly 12 hours. For reasons that are probably perfectly reasonable but beyond the ken of our feeble human brains, every time he opens his eyes he sees one of exactly two images: Grumpy Cat or Creepy Manbaby. This is Cid’s Universe. For him, there is nothing but this.

Cid is born into his Grumpy world having no knowledge, context or understanding. However, he has eyes (he’s able to see the above images), and an instinctual need to open them to look once a second.

Now let’s say he does this for the very first time, one second after he’s born. He opens his eyes, and he sees Grumpy Cat. He closes them, and then one second later, opens them again, and sees Creepy Manbaby. He keeps doing this, and after 12 hours, he’s seen Grumpy Cat 14,567 times and Creepy Manbaby 28,633 times.

Cid has a rudimentary sort of brain. The way this brain works is that when Cid is sleeping, its job is to **generate** what Cid sees when Cid is awake. So instead of the outside world feeding information into Cid’s brain, via his eyes, when Cid is asleep his brain generates the exact same type of information and pushes this out from his brain onto his eyes, one image per second. You can think of this as a kind of **dreaming**.

If Cid’s brain can generate the same distribution of patterns that Cid’s eyes see when he’s awake, Cid’s brain has built an effective model of the Universe in which he lives, and we can say that he understands his Universe. It’s important to understand that for Cid there is no physics, chemistry, biology, language or anything else — just two images appearing randomly with some probability.

When Cid opens his eyes once a second to look out at the Universe, what he is doing is **sampling from a probability distribution over all possible things that he can see**. In his case, there are only two possible things he can see, and since he always sees one of them, the two probabilities must sum to 1. That leaves only one unknown: the probability of seeing one of the two (the other probability is just 1 minus whatever that is). As the number of samples from the underlying probability distribution grows, we get more and more information about the ‘true’ probability. After 12 hours, Cid saw Grumpy Cat 14,567 times out of 43,200 looks, or about 33.7% of the time, so his brain, when it’s generating these patterns, should contain a probabilistic model that spews out a picture of Grumpy Cat about 33.7% of the time, and Creepy Manbaby about 66.3% of the time.
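Here is a minimal Python sketch of that idea, using the counts from Cid’s first day. The function name `dream`, the variable names and the fixed random seed are my own choices for illustration; this is not a Boltzmann Machine, just the simplest possible one-parameter model that counts while awake and samples while asleep.

```python
import random

# Counts from Cid's first 12 hours awake (taken from the text above).
grumpy_count = 14_567
manbaby_count = 28_633
total_looks = grumpy_count + manbaby_count  # 12 hours * 3600 looks per hour

# The single unknown: the estimated probability of seeing Grumpy Cat.
p_grumpy = grumpy_count / total_looks


def dream(n_images, p, rng):
    """Generate n_images dream images from the one-parameter model."""
    return ["Grumpy Cat" if rng.random() < p else "Creepy Manbaby"
            for _ in range(n_images)]


rng = random.Random(0)  # fixed seed (an assumption) so runs are repeatable
night = dream(total_looks, p_grumpy, rng)
dream_fraction = night.count("Grumpy Cat") / len(night)

print(f"awake:    {p_grumpy:.3f}")
print(f"dreaming: {dream_fraction:.3f}")
```

The two printed fractions should land close to each other, which is all that "understanding the Grumpy Universe" amounts to here.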

The role of Cid’s brain is to learn a model of his peculiar world. If we draw samples from Cid’s internal representation of the Universe (residing in his brain), we hope to get the same answers as if we were to draw samples from the real world. If we can achieve this, then Cid’s brain’s model of the world — his internal representation of his Universe — is giving the exact same behavior as if he were looking out at the real world, and his internal representation is as powerful as the ‘real world’. He no longer needs to open his eyes — it’s all the same to him to just dream all the time.

**Stepping Beyond the Grumpy Universe**

We might feel sorry for Cid, because we’re pretty sure that the ‘real’ Universe is much more complicated than the Grumpy Universe, and Cid is missing out. Because we are Benevolent Gods, we might extract Cid from his comfortable eternal dreaming and plop him down in a Strange New Land. In the SNL Universe, instead of just two possible images, there are many more — say a thousand. But he still works the way he’s always worked — once a second for 12 hours he opens his eyes, and then sleeps for 12 hours. When he’s awake, he gathers information about how many times he sees each of the images. This information is used to create an internal probabilistic model in his brain that attempts to match the probability distribution he sees in the SNL. When he’s asleep, this model generates images, and the closer this model gets to the true distribution of the SNL Universe, the closer he gets to Enlightenment and full comprehension.

Now in this case, it will take longer to get there: not only do we need to see all of the patterns at least once, we need to see them enough times to get a pretty good statistical measure of their likelihoods. But with a thousand possible patterns, it’s likely that Cid will eventually reach the point where dreaming and the SNL real world are indistinguishable. In this Enlightened state, Cid will have transcended the need to open his eyes.
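To get a feel for how much longer, here is a rough Python sketch. Everything specific in it is an assumption made for illustration: the thousand image probabilities are made up at random, Cid’s "model" is nothing more than a table of observed frequencies, and I measure closeness with the total variation distance (0 means the dream and the world are identical, 1 means they share nothing).

```python
import random


def total_variation(p, q):
    """Half the sum of absolute differences between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def learn_by_counting(n_looks, true_probs, rng):
    """Watch the world n_looks times and return the observed frequencies."""
    counts = [0] * len(true_probs)
    images = range(len(true_probs))
    for image in rng.choices(images, weights=true_probs, k=n_looks):
        counts[image] += 1
    return [c / n_looks for c in counts]


rng = random.Random(0)  # fixed seed (an assumption) so runs are repeatable

# A made-up SNL Universe: 1,000 images with arbitrary, unequal probabilities.
raw = [rng.random() for _ in range(1000)]
raw_total = sum(raw)
true_probs = [r / raw_total for r in raw]

# 1,000 looks, one 12-hour day (43,200 looks), and ten days.
for n_looks in (1_000, 43_200, 432_000):
    model = learn_by_counting(n_looks, true_probs, rng)
    unseen = sum(1 for f in model if f == 0)
    print(n_looks, "looks:", unseen, "images never seen,",
          f"distance {total_variation(model, true_probs):.3f}")
```

The distance should shrink as the number of looks grows, but a single day in the SNL leaves Cid much further from the truth than a single day in the Grumpy Universe did, and rare images can stay unseen for a long time.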

**The Human Experience**

So far Cid has lived a pretty silly existence, and in fact (I forgot to mention this) he actually looks fairly silly also.

Now let’s say that instead of just a thousand images, every time Cid opens his eyes he could see any possible natural image — that is, any image that a human eye could see. He still does the same thing — every second for 12 hours he opens his eyes and looks at one of these, and then for 12 hours generates one every second from his internal model.

He keeps doing this until his internal model matches what he sees in the Real World. If he can do that, then he’s developed an internal representation of images in the Real World, and can generate them in a way that’s indistinguishable from actually opening his eyes and looking at natural images.

Interestingly, there are Cid-like creatures in the world already. Unfortunately, just being able to understand a Universe of natural images isn’t nearly enough to create a creature with human-like cognition. But the progress in understanding how creatures can develop internal representations of parts of our Real World is real progress towards that objective, I think.

Whole-heartedly agreed!

Really nice write-up Geordie, absolutely love all the posts on this site. It keeps me inspired to continue diving deeper and deeper down this very path. There aren’t too many fields of work out there as exciting as this IMO :-)

Cheers!

Thanks!!! I agree this stuff is very exciting… hopefully we can build some real cool stuff in the not too distant future!

Really enjoyed this post – you’ve made the concept accessible to a layman.

Thanks Mike! This stuff gets complicated fast and often some of the ‘foundational concepts’ get passed over pretty quick. Part of the issue with machine learning vs. human learning is that the Universes of all the machines that have ever been built are exceedingly tiny and peculiar. Even though the Grumpy Universe is silly, even the most ambitious learning projects aren’t that far from it. We’ve only shown these new machine creatures a tiny sliver of our ‘Real’ Universe. If we want them to be like us, they need to see the whole thing through senses like ours, and they need to be able to act on it via actuators like ours.

By predicting that the next image will be whatever the last image was, Cid can reproduce any distribution essentially perfectly, yet even in the simplest case, with a 2 to 1 ratio of the two images presented in a repeating pattern ABBABB*, Cid would still be wrong in his predictions 2/3rds of the time. It’s not just overall distributions that matter but “phase”.

*Something construction workers used to say a lot, for some reason.

Hi EH! The world we’ve constructed for Cid has no such correlations by construction. If there were correlations, then those patterns would have to be learnable by Cid’s brain in order for him to understand his universe, but they are absent at the current time. Yes, you could do something like what you’re suggesting (memorize all the bits you see during the day and then just play them back). However, as the distributions become more complex this strategy will fail. For a variety of reasons, learning only concrete examples is not a robust strategy (I’m going to go on at length about this at some point).