In the last post, we thought a bit about machine creatures, and in particular Cid, an unfortunate who we are going to torment quite a bit over the next few posts.

Today we’ll do a little bit of construction and deconstruction of Cid. We’re going to build him a brain, and try to see how it works, and whether we can get his brain to do what we want.

**Thinking about our system architecture**

To proceed, we’re going to separate out the entirety of Cid’s Universe into three distinct parts. The first is the External Universe, which consists of everything outside of Cid. The second is Cid’s brain, which will attempt to build a model of the External Universe, and which resides entirely within Cid. The third is the interface layer between the two, which in this context you can think of as an eye. This interface layer can accept information from the External Universe, whereas his brain cannot. The brain accepts information from the interface layer, and can also send information back to it. Take a look at this picture. Hopefully the idea is clear!

Here’s another way of looking at the same thing that highlights the separation between Cid and the External Universe.

This segregation is really important and is tied to some real meaty issues. If you think of your own body and how it lives in your Universe, we have the same type of architecture. Your External Universe is roughly everything outside your skin; your interface layer is roughly everything on the outside of your body; and your internal model of the world is roughly everything inside your skin (probably mostly what’s inside your skull).

In the picture there is a small orange circle. We’ll call this a visible unit. (Now we’re starting to connect to a real Boltzmann Machine. Exciting!). You can think of the visible units as vertices in a graph. They are special in our architecture, in that they are able to ‘see’ into the External Universe, and are connected into Cid’s brain. Whenever you read ‘visible units’ in the context of Boltzmann Machines, think interface layer between the External Universe and the creature’s internal representation of it. It’s the layer that separates ‘outside the creature’ from ‘inside the creature’.

**Inside Cid’s brain**

So far we haven’t talked at all about what might be going on inside Cid’s brain. Let’s fix that, and build an actual brain that allows Cid to understand the Grumpy Universe.

Recall that the Grumpy Universe is a very silly place, where the External Universe consists of only two possible inputs (those being Grumpy Cat and Creepy Manbaby). Now instead of actually using the images themselves, let’s simplify things a bit and represent these by a zero (for Grumpy Cat) and a one (for Creepy Manbaby). So Cid’s interface layer will only ever see a zero (our stand-in for Grumpy Cat) or a one (for Creepy Manbaby).

To build Cid a brain, let’s do the following. Let’s set up a number of nodes, like the visible unit, but hidden. We’ll call these hidden units. Here’s a picture of what a possible Cid brain could look like.

From now on, we’ll just focus on the visible and hidden units to simplify things. Here they are.

Here we’ve added a couple of things. Each of the nodes now has a label. The visible units (of which there is now only one) we’ll label $v_j$, where $j$ is an integer denoting which visible unit we’re referring to. The hidden nodes are labeled $h_k$, where $k$ is again an integer referring to a specific node. We’ve (arbitrarily) chosen eight hidden nodes.

We’ve also added some black lines that connect some, but not all, of the nodes together. The connectivity pattern shown above is just one of many different ones we could pick. This particular one will turn out to be quite useful for some things I want to show you, but we could just as well have allowed all to all connectivity.

Wherever there is a black line, we introduce a real number which we call a **weight**. In the proto-brain above, there are four of these between the visible unit and the hidden units, and 16 of them between the different hidden units. We’ll write the weights between the visible and hidden units as $w_{jk}$, where $j$ and $k$ depend on which visible and hidden units the weight connects. We’ll write the weights between hidden units as $u_{km}$, where $k$ and $m$ are the indices of the hidden units the weight connects. Here’s a picture to help make this clearer.
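As a sketch, we can represent this proto-brain as a handful of Python data structures: a list of edges plus a weight for each edge. The specific connectivity pattern and the random weight values below are illustrative assumptions on my part; the post’s picture fixes the actual pattern, which isn’t reproduced here.

```python
import random

random.seed(0)  # make the illustrative weights reproducible

# One visible unit (index 0) and eight hidden units (indices 0..7),
# matching the proto-brain described in the post.
NUM_HIDDEN = 8

# Hypothetical connectivity: the post describes 4 visible-hidden edges
# and 16 hidden-hidden edges, but the exact pattern comes from the
# picture, so these particular edges are illustrative only.
visible_hidden_edges = [(0, k) for k in range(4)]                      # (j, k) pairs
hidden_hidden_edges = [(k, m) for k in range(4) for m in range(4, 8)]  # 16 (k, m) pairs

# Each edge carries a real-valued weight w_jk or u_km; values unknown,
# so we just draw some at random for illustration.
w = {edge: random.uniform(-1, 1) for edge in visible_hidden_edges}
u = {edge: random.uniform(-1, 1) for edge in hidden_hidden_edges}

print(len(w), len(u))  # → 4 16
```

The dictionary-of-edges layout makes the “only sum over pairs that share an edge” rule later in the post trivial to implement: you simply iterate over the dictionary keys.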

Now let’s assume that each of the nodes can take on one of two values, say either zero or one (it could be -1 and +1 as well; any two values will do). The total number of nodes in the current architecture is 1 (visible) + 8 (hidden) = 9. Since each of these nodes can have value 0 or 1, all nine of them together can be specified with nine bits. We’ll use the convention that the leftmost bit is the visible unit, and the rightmost eight bits are the hidden units. Let’s call the value of the visible unit $v$, and the values of the hidden units $h_k$, where $k = 1, \ldots, 8$ refers to each of the eight hidden units.
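The nine-bit convention can be made concrete with a tiny encoder/decoder sketch. The helper names `encode` and `decode` are mine, not from the post:

```python
# Encode a 9-bit network state using the post's convention:
# leftmost bit = visible unit v, rightmost eight bits = hidden units h_1..h_8.

def encode(v, h):
    """Pack the visible value and the list of 8 hidden values into a bit string."""
    assert v in (0, 1) and len(h) == 8
    return str(v) + "".join(str(bit) for bit in h)

def decode(bits):
    """Unpack a 9-character bit string back into (v, [h_1..h_8])."""
    return int(bits[0]), [int(b) for b in bits[1:]]

state = encode(1, [0, 1, 0, 0, 1, 1, 0, 1])
print(state)          # → 101001101
print(decode(state))  # → (1, [0, 1, 0, 0, 1, 1, 0, 1])
```

Nine bits means the whole network has only $2^9 = 512$ possible states, which is why we’ll be able to get away with brute force on this toy example.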

We now define the probability of any particular state $(v, h)$ of our network to be

$$P(v, h) = \frac{e^{-E(v, h)/T}}{Z}$$

where the energy of the state is

$$E(v, h) = -\sum_j b_j v_j - \sum_k c_k h_k - \sum_{\langle jk \rangle} w_{jk} v_j h_k - \sum_{\langle km \rangle} u_{km} h_k h_m$$

The probability distribution $P$ is called a **Boltzmann distribution** (ergo the term ‘Boltzmann Machine’). The variable $T$ is the temperature of the distribution. The quantity

$$Z = \sum_{v, h} e^{-E(v, h)/T}$$

is called the partition function, and for networks of any appreciable size it’s pretty much impossible to calculate (it will turn out we don’t need to!).

I’ve introduced some parameters here: $b_j$ and $c_k$ are local biases on each of the nodes. They are (as yet unknown) real numbers, just like the $w_{jk}$ and $u_{km}$ weights. The notation $\langle jk \rangle$ just means we only sum over the pairs of nodes that have an edge between them.
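To see these definitions in action, here is a brute-force sketch that enumerates all 512 states of the 1-visible/8-hidden network, computes the energy and partition function, and checks that the probabilities sum to one. The bias and weight values, and the connectivity, are made-up assumptions for illustration, since the post leaves them unknown:

```python
import itertools
import math

T = 1.0  # temperature of the Boltzmann distribution

NUM_HIDDEN = 8

# Hypothetical parameters (the post treats these as unknowns to be learned).
b = 0.1                                    # bias on the single visible unit
c = [0.05 * k for k in range(NUM_HIDDEN)]  # biases c_k on the hidden units
vh_edges = {(0, k): 0.2 for k in range(4)}                         # weights w_jk
hh_edges = {(k, m): -0.1 for k in range(4) for m in range(4, 8)}   # weights u_km

def energy(v, h):
    """E(v,h) = -b*v - sum_k c_k h_k - sum_<jk> w_jk v h_k - sum_<km> u_km h_k h_m."""
    e = -b * v - sum(c[k] * h[k] for k in range(NUM_HIDDEN))
    e -= sum(wt * v * h[k] for (_, k), wt in vh_edges.items())
    e -= sum(wt * h[k] * h[m] for (k, m), wt in hh_edges.items())
    return e

# Brute force over all 2^9 = 512 states: feasible for this toy network,
# hopeless for anything big -- which is why Z is "impossible" in general.
states = [(v, h) for v in (0, 1)
          for h in itertools.product((0, 1), repeat=NUM_HIDDEN)]
Z = sum(math.exp(-energy(v, h) / T) for v, h in states)

def prob(v, h):
    """Boltzmann probability of one particular state."""
    return math.exp(-energy(v, h) / T) / Z

total = sum(prob(v, h) for v, h in states)
print(round(total, 10))  # → 1.0
```

Note that raising $T$ flattens the distribution (all states become nearly equally likely), while lowering it concentrates probability on the lowest-energy states; that knob will matter later in the series.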

OK that’s enough for today. Next post we’re going to start exercising that brain!

This is great! Although, poor old Cid. You could have given him something better to look at than Creepy Manbaby!

Hi Geordie, thanks for the great post! But I’m not sure I get why you need the bias terms in the energy equation. If they’re there to tune the importance of a given node, it seems like the weights already take care of this. For example, if all of the weights to a given node are zero, then wouldn’t this be equivalent to setting the bias term for that node to zero? In fact they seem somewhat undesirable, as you could have unconnected nodes with non-zero bias terms that will affect your probability distribution…

Hi Jimmy! I’m not sure! I have a feeling that the representational power of an energy term with both biases and coupling terms is greater than one with just coupling terms. We can always drop them later if they are causing issues. I don’t think it’s a problem if unconnected nodes affect the probability distribution? It might be a little un-brain-y but the basic idea is the same.

As Geordie pointed out, it’s a case of higher representational power. The bias term encodes the affinity of a single node to be a particular type. But there are competing pressures from neighboring nodes.