# Reaching Enlightenment I.

Cid’s brain is an instance of a Deep Boltzmann Machine (DBM). In a DBM, the connections amongst the visible and hidden units have a particular structure. This structure removes connections from a fully connected model such that layers in the network can be naturally defined. In a DBM, each layer has no connections amongst its units. Each unit in a layer is connected to every unit in both the layers immediately above and immediately below. The DBM type structure is the middle one in the picture here.

Different kinds of Boltzmann Machines. The visible units are grey, the hidden units are white. Picture from this paper.

There are at least three good reasons for imposing this type of restriction, instead of just allowing all units to talk to all other units (this variant is called a fully connected Boltzmann Machine — that’s the one on the left in the picture):

1. Computational tractability. In the layered version, you can re-draw the graph by grouping all vertices from the odd layers on one side, and all the vertices from the even layers on the other side, creating a bipartite graph (no interconnections exist on either side). This means that the vertices on each side are conditionally independent of one another given the values of the vertices on the other side, so a whole side can be sampled in one step.
2. In the layered version, it is reasonable to expect that hierarchical patterns will emerge, where local properties of the data are captured at low levels, and more global properties are captured at higher levels.
3. Mammalian brains have highly restricted connectivities, and DBMs could be a plausible model of part of what’s going on in our neocortex.

Here some of the weights are explicitly shown: all four U weights (connecting the visible unit to the first-layer hidden units) and three of the W weights (the bold lines with the W next to them).

The connectivity structure for Cid’s brain is the one in the picture to the right. In this architecture, there are two separate hidden layers of the sort described above — the set $\{h_0, h_1, h_2, h_3\}$ comprises the first hidden layer, and the set $\{h_4, h_5, h_6, h_7\}$ comprises the second (and in this case, top) hidden layer. Each set has no internal connections, and each variable in each set is connected to all variables immediately above and immediately below, as required for a DBM. There is only one visible unit.
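If you'd like to check this wiring for yourself, here is a minimal Python sketch. The unit names (`y`, `h0` through `h7`) and the variable names are just labels chosen for illustration. It builds the edge list implied by the DBM rule (connect every unit to every unit in the adjacent layers, and nothing else) and then confirms that putting the even-numbered layers on one side and the odd-numbered layers on the other gives a bipartite graph.

```python
from itertools import product

# Illustrative labels: "y" is the visible unit, h0..h3 the first hidden
# layer, h4..h7 the second (top) hidden layer.
visible = ["y"]
hidden_1 = [f"h{i}" for i in range(4)]
hidden_2 = [f"h{i}" for i in range(4, 8)]
layers = [visible, hidden_1, hidden_2]

# DBM rule: each unit connects to every unit in the adjacent layers only.
edges = [
    (a, b)
    for lower, upper in zip(layers, layers[1:])
    for a, b in product(lower, upper)
]

# Put even-indexed layers on side 0 and odd-indexed layers on side 1.
side = {unit: idx % 2 for idx, layer in enumerate(layers) for unit in layer}

# Bipartite means that no edge joins two units on the same side.
is_bipartite = all(side[a] != side[b] for a, b in edges)

print(len(edges))    # 4 visible-to-hidden edges plus 16 hidden-to-hidden edges
print(is_bipartite)
```

Here the first hidden layer ends up alone on one side, with the visible unit and the top hidden layer together on the other.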

The External Universe (EU) Cid will be allowed to see consists only of a random uncorrelated sequence of bits, where the probability of each bit being zero is set by the gods of this EU (that means us). This EU is as simple as it gets. If Cid can learn what that probability is, just by observation, he will have learned everything there is to know about his EU.
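To make this EU concrete, here is a small sketch of how we gods might generate it. The function name `sample_universe`, the seed, and the number of bits are assumptions made for illustration; the value 0.135 is the probability used for the Grumpy Universe example later in this post. The frequency estimate at the end is only a sanity check on the generator, not the way Cid will learn.

```python
import random

def sample_universe(p_zero, n_bits, seed=0):
    """Return n_bits independent bits, each equal to 0 with probability p_zero."""
    rng = random.Random(seed)
    return [0 if rng.random() < p_zero else 1 for _ in range(n_bits)]

# Generate a long stream of bits and check the fraction of zeros.
bits = sample_universe(p_zero=0.135, n_bits=100_000)
estimate = bits.count(0) / len(bits)
print(estimate)  # should land close to 0.135
```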

Cid’s Brain is Horribly Over-Parametrized, But That’s OK

The DBM we’re going to use has a total of 9 biases, 4 connections between the visible and first hidden layer, and 16 connections between the first and second hidden layers, for a total of 29 free parameters. In the EU we’re going to show him first, there is only one parameter to learn (the probability of the input bit being zero). So our model is Horribly Over-Parametrized.
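The parameter count follows directly from the layer sizes. As a quick check, here is the arithmetic in Python (the variable names are illustrative only):

```python
n_visible, n_hidden_1, n_hidden_2 = 1, 4, 4

n_biases = n_visible + n_hidden_1 + n_hidden_2  # one bias per unit
n_u_weights = n_visible * n_hidden_1            # visible to first hidden layer
n_w_weights = n_hidden_1 * n_hidden_2           # first to second hidden layer

n_parameters = n_biases + n_u_weights + n_w_weights
print(n_biases, n_u_weights, n_w_weights, n_parameters)  # 9 4 16 29
```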

One consequence of this is that I can, as the omnipotent god that defined and set in motion this EU, figure out just by inspection what an Enlightened Cid could look like. I can just set all the parameters except the bias on the visible unit to zero, and set the bias on the visible unit to generate the desired probability. If I do this, then our full-blown energy function

$E(y, x_0, x_1, \ldots, x_7) = y a_0 + \sum_{k=0}^7 b_k x_k + \sum_{k=0}^3 y U_k x_k + \sum_{(k,p) \in \mathcal{E}} x_k W_{k,p} x_p$

becomes

$E(y, x_0, x_1, \ldots, x_7) = y a_0$

and the probability of the visible unit being zero is

$P(y=0) = \frac{\exp(-E(y=0)/T)}{\exp(-E(y=0)/T) + \exp(-E(y=1)/T)} = \frac{1}{1 + \exp(-a_0/T)}$

So if we set $a_0 = -T \ln (1/P(y=0) - 1)$, Cid’s brain will perfectly understand the EU, and he will be Enlightened. In the specific example I introduced earlier where $P(y=0) = 0.135$, this would be $a_0 = -T \ln(1/0.135 - 1) = -1.857 T$. If we (arbitrarily) set the temperature $T = 1$, then $a_0 = -1.857$ with all the other parameters set to zero is an Enlightened Boltzmann Brain for the Grumpy Universe.
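The cheat is easy to check numerically. The sketch below is a brute-force verification under a few assumptions: the function names are invented for illustration, and the edge set $\mathcal{E}$ is taken to be every (first hidden layer, second hidden layer) pair, which matches the 16 W weights counted earlier. It computes $a_0$ from the target probability, then sums the Boltzmann weights over all $2^9$ joint states of the full energy function to recover $P(y=0)$.

```python
import math
from itertools import product

def bias_for(p_zero, T=1.0):
    """Visible bias a_0 giving P(y=0) = p_zero when all other parameters are zero."""
    return -T * math.log(1.0 / p_zero - 1.0)

def energy(y, x, a0, b, U, W):
    """Full energy function; W maps (k, p) index pairs to weights (assumed edge set)."""
    e = y * a0
    e += sum(b[k] * x[k] for k in range(8))
    e += sum(y * U[k] * x[k] for k in range(4))
    e += sum(x[k] * w * x[p] for (k, p), w in W.items())
    return e

def prob_visible_zero(a0, b, U, W, T=1.0):
    """P(y=0), computed by summing Boltzmann weights over all 2**9 joint states."""
    weight = {0: 0.0, 1: 0.0}
    for y in (0, 1):
        for x in product((0, 1), repeat=8):
            weight[y] += math.exp(-energy(y, x, a0, b, U, W) / T)
    return weight[0] / (weight[0] + weight[1])

T = 1.0
a0 = bias_for(0.135, T)
b = [0.0] * 8                                             # hidden biases
U = [0.0] * 4                                             # visible-to-hidden weights
W = {(k, p): 0.0 for k in range(4) for p in range(4, 8)}  # hidden-to-hidden weights

print(round(a0, 3))
print(round(prob_visible_zero(a0, b, U, W, T), 3))
```

With every hidden parameter at zero, the hidden units contribute the same factor of $2^8$ to both the $y=0$ and $y=1$ sums, so they cancel and only $a_0$ matters. Exhaustive enumeration is only practical because Cid's brain is tiny.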

That Was Cheating

The procedure I outlined above, even though it leads to a perfectly acceptable final solution, is not acceptable as a process. The reason for this is that we need to look at things from Cid’s perspective, not ours. We have knowledge and context that he doesn’t. Cid doesn’t know about the structure of his brain, or that there’s a number called $b_4$ that represents the bias on his hidden node $h_4$. He doesn’t know about Boltzmann statistics, or the fundamental nature of his External Universe. All he has is the contraption we gave him in his head, a rudimentary vision system, and some signals coming into his vision system.

This point is important when thinking about building intelligent machines. It’s easy to fall into a ‘functionalist’ pattern, where we do our best to solve a machine learning or artificial intelligence problem by piling as much of our expert knowledge and context into the system as possible, and somehow assuming that the system should be able to infer other properties of the world from what we’ve shown it, when it’s pretty clear it can’t. Most of historical AI is like this. You can’t show a machine a bunch of text or still images and then have a human-like creature emerge on the other side. All you will get is a creature that understands text or images (maybe better than any human). That’s not the same as understanding the world that generated the text/images, like humans do. The view of the folks on our side is that if you want to build creatures like humans, they need to be able to create models of the same type of EU we have, which means closely replicating the particular senses we have, and allowing them to move around this EU like we do.

The most you can ever hope to achieve when you build a machine creature of Cid’s type is Enlightenment in the context of the EU you show the machine. If you want to build a machine that’s human-like, you need to provide it with the same, or very similar, EU we see. The only way to do that is to embody your creature in a physical body that has sensors and actuators that closely match ours.

Now on the other hand, if you want an alien intelligence that is Enlightened with respect to some other EU (like the Grumpy Universe), then you can get away with doing everything in software. That will lead to intelligent creatures, but with a different sort of intelligence than we have.

OK, so we cheated when we created an Enlightened Cid. How do we do it without cheating?

What I Mean by Not Cheating

What I want to be able to do is have Cid learn about his EU solely from observations coming in through his visible unit, without any divine intervention. At this stage, we will need more compute resources than simply the DBM we designed for him. At some point it will be possible to remove these external resources if we want. But for now we’ll live with them, and be explicit about where they are and what they are being used to do.

What We’ll Do Next Time

I’d planned on running through the whole training exercise in this post, but I’ve decided to split it in two because going through how the training works is a little lengthy. The two references I’ll be working from are this and this if you’d like to do some reading in advance.