# Defining a Measure of Intelligence for Cid

This is what I imagine Miss Polly’s dolly would look like.

I have a childhood memory. I remember picking dandelions, and singing “Mommy had a baby and her head popped off”, while thumb-flick-decapitating the dandelion.

Yesterday this little slice of life came up in conversation with Suzanne. She had similar memories but thought it went “Miss Polly had a dolly and her head popped off.”

For the first time in about 35 years it occurred to me that what I remembered makes no sense. Why would mommy’s (or for that matter the baby’s, if you parse the sentence differently) head pop off? However a dolly’s head popping off is entirely sensible, and it rhymes!

Then I started to wonder if the elaborate network of memories I have around this important childhood memory might all just be fabricated. So I looked for the Miss Polly version, and lo and behold, in the UK, it’s exactly like she remembered. However where I grew up, it’s the Mommy version that dominates. I tried to find the origins of this and failed. It seems no-one knows where it came from. I’m speculating, but I suspect that the Miss Polly nursery rhyme is Very Old, probably originating in the UK sometime in the middle ages, and the version I remember is a mutation arising in North America. How it mutated into Mommy I could not source.

Isn’t it interesting to think that there are some True Facts About the World (such as the existence of both versions of this dandelion-unfriendly activity) that neither I nor Suzanne knew about, even though we knew enough to know one of them? In the earlier posts about Cid, I discussed the concept that we wanted his internal model of the universe (the one in his brain) to match as closely as possible his external universe. In this case, neither my nor Suzanne’s internal models matched the reality of the external universe accurately. In doing this research I feel that I’ve augmented my internal model a little bit. Now you can ask me anything about Miss Polly and her dolly and quickly regret doing so.

How intelligent is a thing?

When you try to build a thing, you need measures to determine how well you’re doing. This is really important. Often you need to choose between multiple paths forward, and being able to assign a set of numbers to ‘how good’ each design is allows you to make reasonable decisions about which paths are better. If we want to build intelligent machines, we need to reduce what we mean by ‘intelligence’ to a set of numbers. This means having a formal mathematical definition of what we mean when we say that word.

People who study intelligence have come up with large numbers of definitions of what the word means. Here’s a review paper from 2007 that contains about 70 of these. If you ask ‘by how much has my intelligence increased, now that I know a little bit more about Miss Polly?’, exactly zero of these answer this question. None of them are capable of producing numbers.

With biological entities living in the ‘real world’, it makes sense that precisely defining what we mean by intelligence would be very difficult. It’s just all so complicated! But we might be able to do this for Cid, because we have the power to vastly simplify his Universe. And in any case, if we want to build intelligent machines, a mathematical definition of intelligence is a necessity. So let’s take a cut at this and see if we can come up with something sensible.

How Intelligent is Cid?

Let’s say we build two versions of Cid, both of which are exposed to exact copies of the same External Universe (we’ll use EU for short — this is the full and complete extent of the Universe that they can measure using their sensors) and both are doing something. We watch them doing whatever they are doing. Can we then measure which is more intelligent? How could we do this?

We could in principle do whatever we wanted to build Cid’s brain. However for now we’re going to restrict the type of brain we’re going to build to be a Boltzmann Machine of the sort we’ve been discussing in the previous posts. Boltzmann Machines are a type of generative model, which work by trying to match the probability distribution over states of the EU to the probability distribution over these states generated by the entity’s brain.

Here’s how we are going to quantify the idea of two probability distributions being ‘similar’. We’re going to use something called the Kullback–Leibler (or KL-) divergence. It is a measure of the information lost when one probability distribution (say the one inside Cid’s brain) is used to approximate another (say the real probability distribution of the EU). The KL-divergence can be used to define a quantitative intelligence measure for Cid.

Let’s define the probability distribution coming from Cid’s brain to be $P_B$, and the probability distribution from the EU to be $P_E$. Then the KL-divergence is

$D_{KL}(P_E || P_B) = \sum_m \log({{P_E(m)}\over{P_B(m)}}) P_E(m)$

where $m$ are the possible states of Cid’s Visible Units. We can formally define Cid’s intelligence to be the inverse of the KL-divergence, so as his model gets better his intelligence will increase, and will go to infinity as it nears perfect understanding of his EU.
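As a quick sanity check, the definition above can be sketched in a few lines of Python. The function names and the choice of natural logarithm are my own (the post does not fix a log base); the distribution values are placeholders:

```python
import math

def kl_divergence(p_e, p_b):
    """D_KL(P_E || P_B): information lost when P_B is used to
    approximate P_E. Both arguments are lists of probabilities over
    the same states m. Natural log; any base works if used consistently."""
    return sum(pe * math.log(pe / pb) for pe, pb in zip(p_e, p_b) if pe > 0)

def s0(p_e, p_b):
    """Cid's intelligence measure: the reciprocal of the KL-divergence."""
    return 1.0 / kl_divergence(p_e, p_b)

# A perfect internal model gives zero divergence (intelligence diverges);
# a mismatched model gives a finite S_0.
print(kl_divergence([0.135, 0.865], [0.135, 0.865]))  # 0.0
print(s0([0.135, 0.865], [0.5, 0.5]))
```

Note that `s0` blows up as the model approaches the true distribution, which is exactly the intended behavior: Enlightenment corresponds to infinite $S_0$.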

It’s important that this definition of intelligence is explicitly defined relative to the entity’s EU, and in fact only means something when you keep that in mind. It’s a measure of how well the entity has been able to build an internal representation of what he’s capable of observing. Two entities can only be directly compared to each other using this metric if they have identical EUs. [As an aside, you can also use this to compare two different internal representations -- how 'similar' two Cid brains are to each other, which is very interesting in its own right].

Every jellyfish alive today has an unbroken line of successful lineage tracing all the way back to the primordial ooze. That specific fly you swatted today had parents, and they had parents, and so on, back for eons. All living creatures on the planet share this feature, and by any measure have been incredibly successful making a living doing something. Pretty mind blowing.

Generally when we think about intelligence, we have a prejudice that it’s something absolute, and clearly humans have more of whatever it is. The definition above challenges this position somewhat, in that we really need to take into account that different creatures can have dramatically different Universes in which they are submerged and dramatically different sensors that give them information about it. The EU of a 24-eyed jellyfish is very different from that of a two-eyed land-dwelling omnivorous hairless ape. Our prejudice is that our EU and our models of it constitute some kind of superior thing to the jellyfish’s — presumably the jellyfish’s are just a tiny subset of ours. Maybe this is true. But maybe not.

We’re going to refer to the variety of numerical quantifications of intelligence we’ll come up with as S-numbers. This is because the idea of coming up with a series of numbers to quantify the intelligence of the machines we’re building comes from Suzanne. This particular one we’ll call $S_0$.

KL-divergence and $S_0$ in the Grumpy Universe

The first EU we will show to Cid will be the Grumpy Universe. Recall that this Universe can be thought of as comprising a single bit $x$, where we as omnipotent gods get to arbitrarily set the probability of seeing the states of that bit. Let’s say that we choose the probability of the bit being zero (Cid opens his eyes and sees Grumpy Cat) to be $P_E(x = 0) = 0.135$. This of course fixes the only other possibility (the bit is one — Cid opens his eyes and sees Creepy Manbaby) to be $P_E(x=1) = 1- P_E(x=0) = 0.865$.

Once we have fixed these, we can write out the KL-divergence explicitly as

$D_{KL}(P_E || P_B) = \sum_m \log({{P_E(m)}\over{P_B(m)}}) P_E(m) = \log({{P_E(x=0)}\over{P_B(x=0)}}) P_E(x=0) + \log({{P_E(x=1)}\over{P_B(x=1)}}) P_E(x=1) =0.135 \log({{0.135}\over{P_B(x=0)}})+0.865 \log({{0.865}\over{1 - P_B(x=0)}})$

and the $S_0$ number is the reciprocal of this

$S_0 = {{1}\over{D_{KL}(P_E || P_B)}}$

The $S_0$ number diverges when the entity’s model of the EU is perfect. We’re going to call this state Enlightenment. The state of Enlightenment is always defined relative to a specific EU. Our objective will be to allow Cid to become Enlightened, in a series of increasingly complex EUs.

The entirety of Cid’s intelligence comes down to a single number — the probability of his internal model generating a zero when Cid is dreaming. Let’s see what the KL-divergence function looks like.

You can see that it goes to zero around the ‘correct’ value of 0.135, and is convex.
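The shape of that curve is easy to reproduce. Here is a small sketch (assuming natural logarithms) that scans Cid’s single model parameter $P_B(x=0)$ and finds where the divergence bottoms out:

```python
import math

def d_kl(pb0, pe0=0.135):
    """Grumpy-Universe KL-divergence as a function of Cid's single
    model parameter P_B(x=0). Natural log."""
    pe1, pb1 = 1 - pe0, 1 - pb0
    return pe0 * math.log(pe0 / pb0) + pe1 * math.log(pe1 / pb1)

# Scan the model probability over (0, 1); the divergence should be
# smallest at the 'correct' value P_B(x=0) = 0.135.
grid = [i / 1000 for i in range(1, 1000)]
best = min(grid, key=d_kl)
print(best)  # 0.135
```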

Cid’s Boltzmann Machine Brain

Recall that we chose a specific architecture for Cid’s brain, which consisted of eight Hidden Units and one Visible Unit. Here it is.

Here some of the weights are explicitly shown — all four U weights (connecting the visible unit to the hidden units) and three of the W weights are explicitly shown (the bold lines with the W next to them).

Cid starts by not having any way to know what any of the free parameters in this model should be. If we were to just randomly set all of them, and allow his brain to reach thermal equilibrium at a fairly low temperature, and we were to draw samples from the resultant probability distribution, the probability of the Visible Unit being zero will just be some random number between zero and one — he’s completely disconnected from his EU. So let’s say we were to do this and then measured this probability to be, say, 0.645. Looking at the chart for KL-divergence, this gives about 0.3, the inverse of which is about 3. So the $S_0$ number — the intelligence — of this random creature would be about 3.

Of course we don’t want to build creatures that don’t interact with their environments. We want them to learn from them. We want them to become Enlightened. And thankfully the Boltzmann Machine comes with a prescription for adjusting all of its parameters to decrease the KL-divergence (and thereby increase Cid’s intelligence). By following this prescription, Cid can become smarter by looking around at his world and increasingly understanding it.

In the next post, we’ll actually do the training! If we can succeed, Cid’s $S_0$ number will diverge and he’ll have complete and utter understanding of the Grumpy Universe.

# Sampling from a probability distribution

Roundabouts have also been scientifically proven to be too complex for the human mind to understand.

I find lots of things confusing. For example, I don’t understand Two and a Half Men. I tried watching it a few times. I don’t get it.

Another thing I find confusing is sampling from a probability distribution. I have always had trouble with probabilities. That whole Monty Hall thing really did me in for a while. But this is a really important concept, both for quantum computers and for the future cognitive power of the critters we’re trying to build. Because it’s confusing I’d like to talk about it a bit.

Let’s start by thinking about flipping a coin. Each time we flip the coin we either get heads or tails. (If something else happens, like it lands on its edge or something, we’ll just try again). We can think of the coin flip as being for all practical purposes random, and the probability of either outcome being 50% — half the time it’s heads, the other half tails.

Let’s try to build a mathematical model of this, and let’s try to do this using a Boltzmann distribution. Let’s call the value of the coin we see when we flip it x, and let’s say x=0 corresponds to heads, and x=1 corresponds to tails. Let’s call the ‘energy’ of each outcome $E(x)$. For reasons that hopefully will become clear, let’s make the energies of both scenarios the same — $E(0) = E(1)$.

Now let’s write down the mathematical equation for the Boltzmann distribution. It is

$P(x) = {{1}\over{\cal Z}} \exp(-E(x)/T)$

where ${\cal Z} = \exp(-E(0)/T) + \exp(-E(1)/T)$ is the partition function and $T$ is the temperature of the distribution.

If we write this out explicitly, we get

$P(x) = {{\exp(-E(x)/T)}\over{\exp(-E(0)/T) + \exp(-E(1)/T)}}$

A few things you can check at this point. One is that $P(0) + P(1) = 1$, which means the probability of seeing zero (heads) plus the probability of seeing one (tails) is equal to one. The other is that in the case where $E(0) = E(1)$ the probabilities of the two outcomes are equal — $P(0) = P(1) = 0.5$ regardless of the actual values of $E(0)$ and $E(1)$.

The value $P(x)$ is the probability distribution over the variable $x$, and is defined by providing the energies of all of the possible values of $x$ (there are only two for a coin flip), and also a temperature $T$.

Drawing a sample from $P(x)$ means creating a random number, and assigning a value to $x$ (either zero or one) depending on the value of the random number. In the case of flipping a fair coin, a sample is the result of a coin toss and is either 0 (heads) or 1 (tails), and the probability of each outcome is 50%.
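That sampling procedure can be written out directly. The helper name and the cumulative-weight scan are my own choices, not anything fixed by the post:

```python
import math
import random

random.seed(0)

def boltzmann_sample(energies, T=1.0):
    """Draw one sample from a Boltzmann distribution defined by a list
    of energies, one per state. Returns the index of the sampled state."""
    weights = [math.exp(-e / T) for e in energies]
    z = sum(weights)                     # the partition function
    r = random.random() * z              # uniform random number in [0, z)
    for state, w in enumerate(weights):  # cumulative scan picks the state
        r -= w
        if r < 0:
            return state
    return len(energies) - 1             # guard against rounding

# A fair coin: equal energies, so heads (0) and tails (1) are equally likely.
flips = [boltzmann_sample([1.0, 1.0]) for _ in range(100_000)]
print(sum(flips) / len(flips))  # close to 0.5
```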

A more complicated situation

Let’s say that we now have two coins, and for some strange reason when we do a coin toss (now using both), we find that they more often have the same value (both heads or both tails) than opposite values (one heads, the other tails). This would be very peculiar to see with actual coins, but we’re just doing a thought experiment so bear with me. Let’s call the values of the coins $x_1$ and $x_2$.

What it means in our model for something to be more likely is that it has a lower energy. So what we need in our model is for the energies of the two outcomes where the variables are the same to be lower than the energies when they are different. In other words,

$E(x_1=0, x_2=0) = E(x_1=1, x_2=1) < E(x_1=0, x_2=1) = E(x_1=1, x_2=0)$.

If we write down our Boltzmann probability distribution, we get

$P(x_1, x_2) = {{\exp(-E(x_1, x_2)/T)}\over{\exp(-E(0, 0)/T) + \exp(-E(1, 0)/T) + \exp(-E(0, 1)/T) + \exp(-E(1, 1)/T)}}$

Now a sample from this probability distribution is a pair of bits $x_1, x_2$ that occur with probability $P(x_1, x_2)$. If you plug in the relations between these energies in this example, you should be able to convince yourself that $P(0,0) = P(1,1) > P(0,1) = P(1,0)$.
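To convince yourself of that ordering, here is a short sketch with made-up energy values (any values satisfying the inequality above would do):

```python
import math
from itertools import product

T = 1.0

def energy(x1, x2):
    """Correlated coins: equal values get a lower (made-up) energy,
    so they are more probable."""
    return -1.0 if x1 == x2 else +1.0

# Enumerate the four outcomes, compute Z, then the four probabilities.
outcomes = list(product([0, 1], repeat=2))
z = sum(math.exp(-energy(a, b) / T) for a, b in outcomes)
p = {(a, b): math.exp(-energy(a, b) / T) / z for a, b in outcomes}

print(p[(0, 0)], p[(0, 1)])
# P(0,0) = P(1,1) > P(0,1) = P(1,0), and the four probabilities sum to one.
```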

Now let’s connect to Cid’s proto-brain

In the previous post, we wrote down a similar probability distribution, except it was defined over nine ‘coins’ (that is, variables that could be either heads (zero) or tails (one)). Each of these lives on its own node — one on a Visible Unit, and eight on Hidden Units.

Exactly like in the previous simple examples, we start by defining the energies of all of the possible $2^9$ outcomes of tossing these nine coins. There are $2^9$ possibilities because any outcome from nine heads (the coins reading all zeros) to nine tails (the coins reading all ones) can occur. As before, lower energies will mean more probable outcomes.

We defined the energies of all $2^9$ states to be

$E(y, x_0, x_1, ..., x_7) = y a_0 + \sum_{k=0}^7 b_k x_k + \sum_{k=0}^3 y U_k x_k + \sum_{k,p \in {\cal E}} x_k W_{k,p} x_p$

where $a_0, b_k, U_k$ and $W_{k,p}$ were as yet to be determined real numbers. Given all of these, we can plug in any combination of our nine variables and get out a real number for $E$. Lower numbers mean that combination of variables is more probable.

Here’s where we are going with this idea. Cid’s External Universe consists of a single input, which can either be zero (he sees Grumpy Cat) or one (he sees Creepy Manbaby). The entirety of this Universe is captured by the probability of seeing each of these (this is the way we set it up). In order for Cid to ‘understand’ his Universe, we need to find settings of the parameters $a_0, b_k, U_k$ and $W_{k,p}$ such that the probability distribution of Cid’s visible unit when a sample is drawn from his internal representation matches the probability distribution Cid sees when he looks out into his External Universe. If we can achieve this, Cid has created an internal representation of his External Universe that is equivalent to actually looking out into the External Universe. He will have reached Enlightenment, and will no longer need to open his eyes.
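A brute-force sketch of this idea: enumerate all $2^9$ states, compute the Boltzmann probabilities, and read off the marginal over the visible unit. The parameter values and the connectivity pattern among the hidden units below are placeholders I made up (a bipartite pattern that happens to give four $U$ and sixteen $W$ weights), not the ones from the post’s figure:

```python
import math
import random
from itertools import product

random.seed(42)
T = 1.0

# Randomly initialized parameters: Cid before any learning.
a0 = random.uniform(-1, 1)
b = [random.uniform(-1, 1) for _ in range(8)]
U = [random.uniform(-1, 1) for _ in range(4)]   # visible <-> hidden 0..3
# Assumed edge set among hidden units (hidden 0..3 to hidden 4..7).
edges = [(k, p) for k in range(4) for p in range(4, 8)]
W = {e: random.uniform(-1, 1) for e in edges}

def energy(y, x):
    """The post's energy function for visible unit y and hidden units x."""
    return (y * a0
            + sum(b[k] * x[k] for k in range(8))
            + sum(y * U[k] * x[k] for k in range(4))
            + sum(x[k] * W[(k, p)] * x[p] for (k, p) in edges))

# Brute-force the 2^9 states to get the marginal over the visible unit.
z = 0.0
p_visible_0 = 0.0
for y in (0, 1):
    for x in product((0, 1), repeat=8):
        w = math.exp(-energy(y, x) / T)
        z += w
        if y == 0:
            p_visible_0 += w
p_visible_0 /= z
print(p_visible_0)  # some value in (0, 1), unrelated to the EU's 0.135
```

Training will consist of nudging $a_0, b_k, U_k, W_{k,p}$ until this marginal matches $P_E(x=0)$.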

In the next post, we’ll work through how we can make this happen, solely by Cid learning about his Universe.

# Boltzmann Machines for the Grumpy Universe

In the last post, we thought a bit about machine creatures, and in particular Cid, an unfortunate who we are going to torment quite a bit over the next few posts.

Today we’ll do a little bit of construction and deconstruction of Cid. We’re going to build him a brain, and try to see how it works, and whether we can get his brain to do what we want.

To proceed, we’re going to separate out the entirety of Cid’s Universe into three distinct parts. The first is the External Universe. This will consist of everything outside of Cid. The third is Cid’s brain, which will attempt to build a model of the External Universe, and will reside entirely within Cid. The second is the interface layer between the two, which in this context you can think of as an eye. This interface layer can accept information from the External Universe, whereas his brain cannot. The brain accepts information from the interface layer, and can also send information to the interface layer. Take a look at this picture. Hopefully the idea is clear!

Cid’s high level architecture.

Here’s another way of looking at the same thing that highlights the separation between Cid and the External Universe.

I think I like this one better as it emphasizes that Cid is contained and separate from the External Universe.

This segregation is really important and is tied to some real meaty issues. If you think of your own body and how it lives in your Universe, we have the same type of architecture. Your External Universe is roughly everything outside your skin; your interface layer is roughly everything on the outside of your body; and your internal model of the world is roughly everything inside your skin (probably mostly what’s inside your skull).

In the picture there is a small orange circle. We’ll call this a visible unit. (Now we’re starting to connect to a real Boltzmann Machine. Exciting!). You can think of the visible units as vertices in a graph. They are special in our architecture, in that they are able to ‘see’ into the External Universe, and are connected into Cid’s brain. Whenever you read ‘visible units’ in the context of Boltzmann Machines, think interface layer between the External Universe and the creature’s internal representation of it. It’s the layer that separates ‘outside the creature’ from ‘inside the creature’.

Inside Cid’s brain

So far we haven’t talked at all about what might be going on inside Cid’s brain. Let’s fix that, and build an actual brain that allows Cid to understand the Grumpy Universe.

Recall that the Grumpy Universe is a very silly place, where the External Universe consists of only two possible inputs (those being Grumpy Cat and Creepy Manbaby). Now instead of actually using the images themselves, let’s simplify things a bit and represent these by a zero (for Grumpy Cat) and a one (for Creepy Manbaby). So Cid’s interface layer will only ever see a zero (our stand-in for Grumpy Cat) or a one (for Creepy Manbaby).

To build Cid a brain, let’s do the following. Let’s set up a number of nodes, like the visible unit, but hidden. We’ll call these ones Hidden Units. Here’s a picture of what a possible Cid brain could look like.

Here we have one visible unit (the orange circle) and eight hidden units (the yellow circles).

From now on, we’ll just focus on the visible and hidden units to simplify things. Here they are.

A proto-brain for Cid.

Here we’ve added a couple of things. Each of the nodes now has a label. The visible units (of which now there is only one) we’ll label $v_k$ where $k$ is an integer denoting which visible unit we’re referring to. The hidden nodes are labeled $h_k$ where $k$ is again an integer referring to a specific node. We’ve (arbitrarily) chosen eight hidden nodes.

We’ve also added some black lines that connect some, but not all, of the nodes together. The connectivity pattern shown above is just one of many different ones we could pick. This particular one will turn out to be quite useful for some things I want to show you, but we could just as well have allowed all to all connectivity.

Wherever there is a black line, we introduce a real number which we call a weight. In the proto-brain above, there are four of these between the visible unit and the hidden units, and 16 of them between the different hidden units. We’ll write the weights between the visible and hidden units as $U_k$, where $k = 0, 1, 2, 3$ depending on which hidden unit is connected to. We’ll write the weights between hidden units as $W_{k, p}$ where $k$ and $p$ are the indices of the hidden units the weight connects. Here’s a picture to help make this clearer.

Here some of the weights are explicitly shown — all four U weights (connecting the visible unit to the hidden units) and three of the W weights are explicitly shown (the bold lines with the W next to them).

Now let’s assume that each of the nodes can take on one of two values — say either zero or one (it could be -1 and +1 also — any two values will do). The total number of nodes in the current architecture is 1 (visible) + 8 (hidden) = 9. Since each of these nodes can have value 0 or 1, all nine of them together can be specified with nine bits. We’ll use the convention that the leftmost bit is the visible unit, and the rightmost eight bits are the hidden units. Let’s call the value of the visible unit $y$, and the values of the hidden units $x_k$, where $k=0..7$ refers to each of the eight hidden units.

We now define the probability of any particular state of our network to be

$P(y, x_0, x_1, ..., x_7) = {{1}\over{\cal Z}} \exp(-E(y, x_0, x_1, ..., x_7) / T)$

Where

$E(y, x_0, x_1, ..., x_7) = y a_0 + \sum_{k=0}^7 b_k x_k + \sum_{k=0}^3 y U_k x_k + \sum_{k,p \in {\cal E}} x_k W_{k,p} x_p$

The probability distribution $P$ is called a Boltzmann distribution (ergo the term ‘Boltzmann Machine’). The variable $T$ is the temperature of the distribution. The quantity

${\cal Z} = \sum_{\text{all possible states}} \exp(-E(y, x_0, x_1, ..., x_7)/T)$

is called the partition function, and it’s pretty much impossible to calculate (it will turn out we don’t need to!).

I’ve introduced some parameters here — $a_0$ and $b_k$ are local biases on each of the nodes. They are (as yet unknown) real numbers, just like the $U_k$ and $W_{k,p}$ weights. The notation $\sum_{k,p \in {\cal E}}$ just means only sum over the $k, p$ pairs that have an edge between them.
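Since there are only $2^9 = 512$ states here, we can actually compute ${\cal Z}$ by brute force; it only becomes intractable for large networks, where the number of states grows exponentially. A sketch with made-up parameter values and an assumed edge set:

```python
import math
from itertools import product

T = 1.0
# Placeholder parameter values; the real ones get learned later.
a0, b, U = 0.1, [0.2] * 8, [-0.3] * 4
edges = [(k, p) for k in range(4) for p in range(4, 8)]  # assumed pattern
W = {e: 0.05 for e in edges}

def energy(y, x):
    """Energy of one state: visible unit y plus hidden units x."""
    return (y * a0
            + sum(b[k] * x[k] for k in range(8))
            + sum(y * U[k] * x[k] for k in range(4))
            + sum(x[k] * W[(k, p)] * x[p] for (k, p) in edges))

# Sum over every one of the 2^9 states to get the partition function,
# then normalize to get a genuine probability distribution.
states = [(y, x) for y in (0, 1) for x in product((0, 1), repeat=8)]
z = sum(math.exp(-energy(y, x) / T) for y, x in states)
probs = [math.exp(-energy(y, x) / T) / z for y, x in states]

print(len(states), sum(probs))  # 512 states; probabilities sum to 1
```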

OK that’s enough for today. Next post we’re going to start exercising that brain!

# Boltzmann Machines & distributions of patterns from the real world

There are many excellent overviews of Boltzmann Machines. Here’s one I particularly enjoyed — you should read it!

In this overview an important concept is raised. I’d like to talk about it a bit, as I think it’s quite important to understand before we jump into describing BMs. It’s related to a bunch of interesting problems in creating intelligent machines also.

Distributions of patterns from the real world

Imagine there is a strange creature that has evolved in a particularly weird environment. Let’s call it The Chortler in Darkness, or Cid for short.

Cid has evolved to open his eyes exactly once a second for 12 hours, and then sleep for exactly 12 hours. For reasons that are probably perfectly reasonable but beyond the ken of our feeble human brains, every time he opens his eyes he sees either the image on the left, or the image on the right. This is Cid’s Universe. For him, there is nothing but this.

Cid is born into his Grumpy world having no knowledge, context or understanding. However he has eyes (he’s able to see the above images), and an instinctual need to open them to look once a second.

Now let’s say he does this for the very first time, one second after he’s born. He opens his eyes, and he sees Grumpy Cat. He closes them, and then one second later, opens them again, and sees Creepy Manbaby. He keeps doing this, and after 12 hours, he’s seen Grumpy Cat 14,567 times and Creepy Manbaby 28,633 times.

Cid has a rudimentary sort of brain. The way this brain works is that when Cid is sleeping, its job is to generate what Cid sees when Cid is awake. So instead of the outside world feeding information into Cid’s brain, via his eyes, when Cid is asleep his brain generates the exact same type of information and pushes this out from his brain onto his eyes, one image per second. You can think of this as a kind of dreaming.

If Cid’s brain can generate the same distribution of patterns that Cid’s eyes see when he’s awake, Cid’s brain has built an effective model of the Universe in which he lives, and we can say that he understands his Universe. It’s important to understand that for Cid there is no physics, chemistry, biology, language or anything else — just two images appearing randomly with some probability.

When Cid opens his eyes once a second to look out at the Universe, what he is doing is sampling from a probability distribution over all possible things that he can see. In his case, there are only two possible things he can see, and since the two probabilities must sum to one, there is only one unknown: the probability of seeing one of the two (the other probability is just 1 minus whatever that is). As the number of samples from the underlying probability distribution grows, we get more and more information about the ‘true’ probability. After 12 hours, Cid saw Grumpy Cat 33.7% of the time, so his brain, when it’s generating these patterns, should contain a probabilistic model that spews out a picture of Grumpy Cat about 33.7% of the time, and Creepy Manbaby 66.3% of the time.
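The arithmetic above checks out: 12 hours of one glimpse per second gives 43,200 samples, and the counts from the post give the 33.7% figure directly.

```python
# Cid's 12 awake hours give 12 * 3600 = 43,200 one-second glimpses.
grumpy_cat, creepy_manbaby = 14_567, 28_633
total = grumpy_cat + creepy_manbaby

# The empirical frequency is Cid's best estimate of the 'true' probability.
p_grumpy = grumpy_cat / total
print(total, round(100 * p_grumpy, 1))  # 43200 glimpses, 33.7 percent
```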

The role of Cid’s brain is to learn a model of his peculiar world. If we draw samples from Cid’s internal representation of the Universe (residing in his brain), we hope to get the same answers as if we were to draw samples from the real world. If we can achieve this, then Cid’s brain’s model of the world — his internal representation of his Universe — is giving the exact same behavior as if he were looking out at the real world, and his internal representation is as powerful as the ‘real world’. He no longer needs to open his eyes — it’s all the same to him to just dream all the time.

Stepping Beyond the Grumpy Universe

The other SNL Universe.

We might feel sorry for Cid, because we’re pretty sure that the ‘real’ Universe is much more complicated than the Grumpy Universe, and Cid is missing out. Because we are Benevolent Gods, we might extract Cid from his comfortable eternal dreaming and plop him down in a Strange New Land. In the SNL Universe, instead of just two possible images, there are many more — say a thousand. But he still works the way he’s always worked — once a second for 12 hours he opens his eyes, and then sleeps for 12 hours. When he’s awake, he gathers information about how many times he sees each of the images. This information is used to create an internal probabilistic model in his brain that attempts to match the probability distribution he sees in the SNL. When he’s asleep, this model generates images, and the closer this model gets to the true distribution of the SNL Universe, the closer he gets to Enlightenment and full comprehension.

Now in this case, it will take longer to get there — not only do we need to see all of the patterns at least once, we need to see them enough times to get a pretty good statistical measure of their likelihoods. But with a thousand possible patterns, it’s likely that Cid will eventually reach the point where dreaming and the SNL real world are indistinguishable. In this Enlightened state, Cid will have transcended the need to open his eyes.

The Human Experience

Some say this is Cid.

So far Cid has lived a pretty silly existence, and in fact (I forgot to mention this) he actually looks fairly silly also.

Now let’s say that instead of just a thousand images, every time Cid opens his eyes he could see any possible natural image — that is, any image that a human eye could see. He still does the same thing — every second for 12 hours he opens his eyes and looks at one of these, and then for 12 hours generates one every second from his internal model.

He keeps doing this until his internal model matches what he sees in the Real World. If he can do that, then he’s developed an internal representation of images in the Real World, and can generate them in a way that’s indistinguishable from actually opening his eyes and looking at natural images.

Interestingly, there are Cid-like creatures in the world already. Unfortunately, just being able to understand a Universe of natural images isn’t nearly enough to create a creature with human-like cognition. But the progress in understanding how creatures can develop internal representations of parts of our Real World is, I think, real progress towards that objective.

# Quantum Boltzmann Machines

Sometimes I think about top ten lists. Like what my top ten favorite songs of all time would be, or my top ten favorite books. You can probably tell a lot about someone by what would be on those lists. I once did a personality profiling procedure, which took the answers to about one hundred multiple choice questions and put the respondent into one of 25 bins. [Note: I just found the document; there were actually 72 bins. It was called the Insights Discovery Profile. I was in Bin #22, AKA "Reforming Director"]. The bin I was in was an eerily accurate description of me. I think of this procedure as ‘human dimensionality reduction’. It’s like PCA!

One of my top ten books is On Intelligence by Jeff Hawkins. Jeff was the founder of Palm and Handspring. If you’re not in a hardware company, there is an important fact I will share with you of which you may be unaware. Building hardware that works in the real world is a special kind of hell. I suspect Dante originally had a level of the Inferno where you had to just make carts or butter churns or whatever, but it was so painful to think about he took it out. So I feel a sort of esprit de corps with anyone who has suffered through that special kind of torture.

I first read On Intelligence in 2008, and it was my first exposure to two important ideas. They are:

1. Intelligence is related to, and can even be defined to be, an entity’s ability to build an internal representation of the world, and correctly predict the outcomes of its possible actions within that model.
2. Mammalian brains contain a structure, called the neocortex, that allows mammals to build models of the world out of the same repeating physical structure, tiled out a huge number of times. This repeating structure is hierarchical, and allows mammals to efficiently build representations of the world that are also hierarchical.

In my next series of posts I want to show you something related to both of these points. It’s a fascinating way to connect what D-Wave hardware does to the bleeding edge of machine learning.

# Lockheed Martin Tweet Chat: #QuantumChat

Lockheed Martin is hosting an interesting event, which is linked to here. It’s an opportunity to talk to people who are working with actual D-Wave quantum computers. If you have questions, now you can have them answered by people who actually know what they are talking about. What a concept! Exciting! Here’s a brief summary, from the event page linked to above:

Join quantum computing experts from Lockheed Martin, the University of Southern California and D-Wave Systems as they “borrow” their companies’ Twitter accounts to discuss the latest in speedy qubits and the quantum evolution.

Tweet your questions to @LockheedMartin, @USCViterbi or @dwavesys with the #QuantumChat hashtag starting Nov. 7. @LockheedMartin will moderate the chat and pose questions beginning at 1 p.m. EDT on Thursday, Nov. 14. Questions will be selected from those tweeted with the #QuantumChat hashtag between now and the end of the chat.

Can’t follow along with the Tweet Chat live? Watch for the full chat transcript on our Storify page.

# D-Wave on NOVA

We were on NOVA’s Making Stuff program last week. Here’s the segment where we appeared. It’s pretty … cool.

# D-Wave Systems and Cypress Announce Partnership

Cypress Semiconductor Corporation is a semiconductor design and manufacturing company founded in 1982 by T.J. Rodgers and others from AMD. We’re working with them to build bigger and better superconducting circuits — ultimately millions of qubits and billions of devices per chip. The biggest problem we (or any QC effort that follows us) face is the manufacturability of designs, and Cypress has one of the most incredible fabrication operations I have ever seen. You can see an overhead shot of the Minnesota facility to the right. This doesn’t do it justice. Acres of fab machines. And the people are top rate. Very exciting.

From HPCWire:

SAN JOSE, Calif. & BURNABY, British Columbia — Cypress Semiconductor Corp. and D-Wave Systems Inc., the world’s first commercial quantum computing company, today announced that D-Wave has successfully transferred its proprietary process technology for building quantum computing microprocessors to Cypress’s Wafer Foundry located in Bloomington, Minnesota. D-Wave selected Cypress as its foundry and started the site change in January of 2013, and Cypress delivered first silicon on June 26. Results from this lot indicate better yields than D-Wave has achieved in the past, validating the quality of Cypress’s production-scale environment.

# The Creative Destruction Lab at the University of Toronto

Ajay Agrawal is the Peter Munk Professor of Entrepreneurship at the Rotman School of Management at the University of Toronto. He is also an old friend. We both went through Haig Farris’ High Tech Entrepreneurship course at UBC. We stayed in touch through the years. Ajay even wrote a Harvard Business School case study on D-Wave when he was there.

In 2012 Ajay started a business incubator at U of T called the Creative Destruction Lab (CDL). You can read about it here. They have been very successful. One of the companies that went through the process is Thalmic Labs. Their initial product, the Myo, sold 25,000 units in the first month after its release.

The CDL runs a model where a group of seven successful entrepreneurs mentors and guides the participants through a very competitive process that focuses resources on companies that look like winners. This group is called the G7. It is a little like the Dragons on Dragons’ Den.

A couple of months ago, Ajay asked if I would be interested in being a member of the G7 this year. Of course I said yes. It’s a very cool opportunity to help along the next Elon Musk or Marissa Mayer. As the year progresses, I’ll let y’all know about some of the interesting companies in this year’s cohort.

# We do it because we must

Google produced a most excellent video introducing some of the folks working at the Quantum Artificial Intelligence Lab. Here it is!

You can see some cool shots of our new facility in the piece, like this one.

There are more than a dozen of these machines doing interesting things now. They are crunching away on everything from basic physics experiments, probing entanglement at scales humans have never been able to reach before, to commercial applications of machine learning — like the wink detector in the Google Glass product.

There are some great memes in the video. One of my favorites was raised by Sergio Boixo. He says at 4:25, ‘… [this machine] teaches us that we shouldn’t be naive about the world, and we shouldn’t think about the world as a simple machine. It forces us to consider more sophisticated notions of how the reality around us is actually shaped.’

This reminded me of a great bit that Neil deGrasse Tyson did. Here he discusses what he calls ‘A fascinatingly disturbing thought’. Here’s a quote (the full version is here):

I lay awake at nights wondering whether we as a species are simply too stupid to figure out the Universe that we are investigating, and maybe we need some other species one percent smarter than we are, for which string theory would be intuitive, for which all the greatest mysteries of the Universe, from dark matter, dark energy, the origins of life, and all the frontiers of our thought, would be something that they would just self-intuit. I’m jealous of that possibility. Because I want to be around for those discoveries.

I feel a lot of sympathy for this position. It got me thinking that waiting for the “real one percenters” to show up might be dangerous. Maybe it’s a better strategy to try to create them. If only we had some type of quantum artificial intelligence…