# Defining a Measure of Intelligence for Cid

This is what I imagine Miss Polly’s dolly would look like.

I have a childhood memory. I remember picking dandelions, and singing “Mommy had a baby and her head popped off”, while thumb-flick-decapitating the dandelion.

Yesterday this little slice of life came up in conversation with Suzanne. She had similar memories but thought it went “Miss Polly had a dolly and her head popped off.”

For the first time in about 35 years it occurred to me that what I remembered makes no sense. Why would mommy’s (or for that matter the baby’s, if you parse the sentence differently) head pop off? However a dolly’s head popping off is entirely sensible, and it rhymes!

Then I started to wonder if the elaborate network of memories I have around this important childhood memory might all just be fabricated. So I looked for the Miss Polly version, and lo and behold, in the UK, it’s exactly like she remembered. However where I grew up, it’s the Mommy version that dominates. I tried to find the origins of this and failed. It seems no-one knows where it came from. I’m speculating, but I suspect that the Miss Polly nursery rhyme is Very Old, probably originating in the UK sometime in the middle ages, and the version I remember is a mutation arising in North America. How it mutated into Mommy I could not source.

Isn’t it interesting to think that there are some True Facts About the World (such as the existence of both versions of this dandelion-unfriendly activity) that neither I nor Suzanne knew about, even though each of us knew enough to know one of them? In the earlier posts about Cid, I discussed the idea that we want his internal model of the universe (the one in his brain) to match his external universe as closely as possible. In this case, neither my internal model nor Suzanne’s matched the reality of the external universe accurately. In doing this research I feel that I’ve augmented my internal model a little bit. Now you can ask me anything about Miss Polly and her dolly and quickly regret doing so.

How intelligent is a thing?

When you try to build a thing, you need measures to determine how well you’re doing. This is really important. Often you need to choose between multiple paths forward, and being able to assign a set of numbers to ‘how good’ each design is allows you to make reasonable decisions about which paths are better. If we want to build intelligent machines, we need to reduce what we mean by ‘intelligence’ to a set of numbers. This means having a formal mathematical definition of what we mean when we say that word.

People who study intelligence have come up with large numbers of definitions of what the word means. Here’s a review paper from 2007 that contains about 70 of these. If you ask ‘by how much has my intelligence increased, now that I know a little bit more about Miss Polly?’, exactly zero of these answer this question. None of them are capable of producing numbers.

With biological entities living in the ‘real world’, it’s sensible that it would be very difficult to precisely define what we mean by intelligence. It’s just all so complicated! But we might be able to do this for Cid, and the reason is that we have the power to vastly simplify his Universe. And anyway, it’s a necessary condition of trying to build intelligent machines that we need to have a mathematical definition of intelligence. So let’s take a cut at this and see if we can come up with something sensible.

How Intelligent is Cid?

Let’s say we build two versions of Cid, both of which are exposed to exact copies of the same External Universe (we’ll use EU for short — this is the full and complete extent of the Universe that they can measure using their sensors) and both are doing something. We watch them doing whatever they are doing. Can we then measure which is more intelligent? How could we do this?

We could in principle do whatever we wanted to build Cid’s brain. However for now we’re going to restrict the type of brain we’re going to build to be a Boltzmann Machine of the sort we’ve been discussing in the previous posts. Boltzmann Machines are a type of generative model, which work by trying to match the probability distribution over states of the EU to the probability distribution over these states generated by the entity’s brain.

Here’s how we are going to quantify the idea of two probability distributions being ‘similar’. We’re going to use something called the Kullback–Leibler (or KL-) divergence. It is a measure of the information lost when one probability distribution (say the one inside Cid’s brain) is used to approximate another (say the real probability distribution of the EU). The KL-divergence can be used to define a quantitative intelligence measure for Cid.

Let’s define the probability distribution coming from Cid’s brain to be $P_B$, and the probability distribution from the EU to be $P_E$. Then the KL-divergence is

$D_{KL}(P_E || P_B) = \sum_m \log({{P_E(m)}\over{P_B(m)}}) P_E(m)$

where $m$ ranges over the possible states of Cid’s Visible Units. The KL-divergence is never negative, and it is zero only when the two distributions agree. We can formally define Cid’s intelligence to be the inverse of the KL-divergence, so as his model gets better his intelligence will increase, and will go to infinity as his model approaches a perfect match to his EU.
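As a minimal sketch of this definition in Python, distributions can be represented as dictionaries mapping each state $m$ to its probability. The function names `kl_divergence` and `s0` and the example numbers are illustrative assumptions, not something taken from the earlier posts, and natural logarithms are assumed, so the divergence comes out in nats.

```python
import math


def kl_divergence(p_e, p_b):
    """D_KL(P_E || P_B) in nats for two distributions over the same states."""
    total = 0.0
    for state, pe in p_e.items():
        if pe == 0.0:
            continue  # states the EU never produces contribute nothing
        pb = p_b.get(state, 0.0)
        if pb == 0.0:
            return math.inf  # the model rules out a state the EU does produce
        total += pe * math.log(pe / pb)
    return total


def s0(p_e, p_b):
    """Inverse of the KL-divergence; infinite when the model is perfect."""
    d = kl_divergence(p_e, p_b)
    return math.inf if d == 0.0 else 1.0 / d


# Hypothetical two-state universe and an imperfect model of it.
universe = {0: 0.5, 1: 0.5}
model = {0: 0.25, 1: 0.75}
print(kl_divergence(universe, model), s0(universe, model))
print(s0(universe, universe))  # a perfect model has infinite S_0
```

Returning infinity for a perfect model is a design choice made here so that the divergence of the measure shows up as a value rather than as a division-by-zero error.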

It’s important that this definition of intelligence is explicitly defined relative to the entity’s EU, and in fact only means something when you keep that in mind. It’s a measure of how well the entity has been able to build an internal representation of what he’s capable of observing. Two entities can only be directly compared to each other using this metric if they have identical EUs. [As an aside, you can also use this to compare two different internal representations -- how 'similar' two Cid brains are to each other, which is very interesting in its own right].

Every jellyfish alive today has an unbroken lineage of successful ancestors tracing all the way back to the primordial ooze. That specific fly you swatted today had parents, and they had parents, and so on, back for eons. All living creatures on the planet share this feature, and by any measure have been incredibly successful at making a living doing something. Pretty mind-blowing.

Generally when we think about intelligence, we have a prejudice that it’s something absolute, and clearly humans have more of whatever it is. The definition above challenges this position somewhat, in that we really need to take into account that different creatures can have dramatically different Universes in which they are submerged and dramatically different sensors that give them information about it. The EU of a 24-eyed jellyfish is very different from that of a two-eyed land-dwelling omnivorous hairless ape. Our prejudice is that our EU and our models of it constitute some kind of superior thing to the jellyfish’s — presumably the jellyfish’s are just a tiny subset of ours. Maybe this is true. But maybe not.

We’re going to refer to the variety of numerical quantifications of intelligence we’ll come up with as S-numbers. This is because the idea of coming up with a series of numbers to quantify the intelligence of the machines we’re building comes from Suzanne. This particular one we’ll call $S_0$.

KL-divergence and $S_0$ in the Grumpy Universe

The first EU we will show to Cid will be the Grumpy Universe. Recall that this Universe can be thought of as comprising a single bit $x$, where we as omnipotent gods get to arbitrarily set the probability of seeing the states of that bit. Let’s say that we choose the probability of the bit being zero (Cid opens his eyes and sees Grumpy Cat) to be $P_E(x = 0) = 0.135$. This of course fixes the only other possibility (the bit is one — Cid opens his eyes and sees Creepy Manbaby) to be $P_E(x=1) = 1- P_E(x=0) = 0.865$.

Once we have fixed these, we can write out the KL-divergence explicitly as

$D_{KL}(P_E || P_B) = \sum_m \log({{P_E(m)}\over{P_B(m)}}) P_E(m) = \log({{P_E(x=0)}\over{P_B(x=0)}}) P_E(x=0) + \log({{P_E(x=1)}\over{P_B(x=1)}}) P_E(x=1) =0.135 \log({{0.135}\over{P_B(x=0)}})+0.865 \log({{0.865}\over{1 - P_B(x=0)}})$

and the $S_0$ number is the reciprocal of this

$S_0 = {{1}\over{D_{KL}(P_E || P_B)}}$

The $S_0$ number diverges when the entity’s model of the EU is perfect. We’re going to call this state Enlightenment. The state of Enlightenment is always defined relative to a specific EU. Our objective will be to allow Cid to become Enlightened, in a series of increasingly complex EUs.

The entirety of Cid’s intelligence comes down to a single number — the probability of his internal model generating a zero when Cid is dreaming. Let’s see what the KL-divergence function looks like.

You can see that it goes to zero at the ‘correct’ value of 0.135, and is convex.
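The shape of this function can also be checked numerically. The sketch below sweeps the model probability $P_B(x=0)$ over a grid, finds where the KL-divergence is smallest, and tests convexity using discrete second differences. The helper name `grumpy_kl`, the grid spacing, and the use of natural logarithms are assumptions made for illustration.

```python
import math

P_E_ZERO = 0.135  # P_E(x = 0), as chosen for the Grumpy Universe


def grumpy_kl(q, p=P_E_ZERO):
    """KL-divergence in nats when the model gives P_B(x = 0) = q, for 0 < q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


# Sweep q over 0.001, 0.002, ..., 0.999 (the end points are excluded
# because the divergence is infinite there).
qs = [i / 1000 for i in range(1, 1000)]
values = [grumpy_kl(q) for q in qs]

# Location of the minimum on the grid.
best_q = qs[values.index(min(values))]

# A function is convex on the grid if every second difference is positive.
second_diffs = [values[i - 1] - 2 * values[i] + values[i + 1]
                for i in range(1, len(values) - 1)]
is_convex = all(d > 0 for d in second_diffs)

print("minimum at q =", best_q, "convex on grid:", is_convex)
for q in (0.05, 0.135, 0.3, 0.5, 0.9):
    print(f"q = {q:5.3f}  KL = {grumpy_kl(q):.4f}")
```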

Cid’s Boltzmann Machine Brain

Recall that we chose a specific architecture for Cid’s brain, which consisted of four Hidden Units and one Visible Unit. Here it is.

Here some of the weights are explicitly shown: all four U weights (connecting the visible unit to the hidden units) and three of the W weights (the bold lines with the W next to them).

Cid starts with no way of knowing what any of the free parameters in this model should be. Suppose we set all of them randomly, allow his brain to reach thermal equilibrium at a fairly low temperature, and draw samples from the resulting probability distribution. The probability of the Visible Unit being zero would then just be some random number between zero and one, because he’s completely disconnected from his EU. Let’s say we did this and measured this probability to be, say, 0.5. Looking at the chart for KL-divergence (taking the logarithm to be the natural log), this gives about 0.3, the inverse of which is about 3. So the $S_0$ number (the intelligence) of this random creature would be about 3.
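Here is a rough sketch of that thought experiment in Python. Several things in it are assumptions rather than details from the earlier posts: the energy convention (binary 0/1 units, with biases and pairwise weights), the range the random parameters are drawn from, the temperature, and the number of hidden units (set to four, one per U weight). Instead of drawing samples, it enumerates every state and computes the equilibrium (Boltzmann) probabilities exactly, which is feasible only because the brain is tiny.

```python
import itertools
import math
import random

N_HIDDEN = 4        # assumption: one hidden unit per U weight mentioned in the text
TEMPERATURE = 0.5   # assumption: an arbitrary "fairly low" temperature
P_E_ZERO = 0.135    # P_E(x = 0) for the Grumpy Universe


def random_brain(rng, n_hidden=N_HIDDEN):
    """Random biases and pairwise weights; unit 0 is the Visible Unit."""
    n_units = 1 + n_hidden
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    # Pairs (0, j) play the role of U weights; pairs (i, j) with i >= 1, W weights.
    weights = {(i, j): rng.uniform(-1.0, 1.0)
               for i in range(n_units) for j in range(i + 1, n_units)}
    return biases, weights


def energy(state, biases, weights):
    """Assumed convention: E(s) = -sum_i b_i s_i - sum_{i<j} w_ij s_i s_j."""
    linear = sum(b * s for b, s in zip(biases, state))
    pairwise = sum(w * state[i] * state[j] for (i, j), w in weights.items())
    return -(linear + pairwise)


def prob_visible_zero(biases, weights, temperature=TEMPERATURE):
    """Exact equilibrium probability that the Visible Unit is 0."""
    total = 0.0
    total_zero = 0.0
    for state in itertools.product((0, 1), repeat=len(biases)):
        boltzmann_factor = math.exp(-energy(state, biases, weights) / temperature)
        total += boltzmann_factor
        if state[0] == 0:
            total_zero += boltzmann_factor
    return total_zero / total


def grumpy_kl(q, p=P_E_ZERO):
    """KL-divergence in nats when the model gives P_B(x = 0) = q."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


rng = random.Random(0)  # fixed seed so the run is repeatable
q = prob_visible_zero(*random_brain(rng))
d = grumpy_kl(q)
print(f"P_B(x=0) = {q:.3f}, KL = {d:.3f}, S_0 = {1 / d:.2f}")
```

Changing the seed gives a different random creature, each with its own $S_0$ relative to the same Grumpy Universe.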

Of course we don’t want to build creatures that don’t interact with their environments. We want them to learn from them. We want them to become Enlightened. And thankfully the Boltzmann Machine comes with a prescription for adjusting all of its parameters to decrease the KL-divergence (and thereby increase Cid’s intelligence). By following this prescription, Cid can become smarter by looking around at his world and increasingly understanding it.
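To give a feel for what such a prescription does, here is a deliberately stripped-down sketch. It is not Cid's actual brain: it assumes a model with a single Visible Unit, no Hidden Units, and one bias parameter $b$, so that $P_B(x=1) = 1/(1+e^{-b})$. In that special case the gradient of the KL-divergence with respect to $b$ is $P_B(x=1) - P_E(x=1)$, so each update nudges the bias by the difference between how often the EU shows a one and how often the model dreams one. The learning rate, step count, and function names are illustrative assumptions.

```python
import math

P_E_ZERO = 0.135            # P_E(x = 0) for the Grumpy Universe
P_E_ONE = 1.0 - P_E_ZERO    # P_E(x = 1)
LEARNING_RATE = 0.5         # assumption: illustrative step size
N_STEPS = 2000              # assumption: illustrative number of updates


def sigmoid(b):
    return 1.0 / (1.0 + math.exp(-b))


def grumpy_kl(q, p=P_E_ZERO):
    """KL-divergence in nats when the model gives P_B(x = 0) = q."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def train(bias=0.0, learning_rate=LEARNING_RATE, n_steps=N_STEPS):
    """Gradient descent on the KL-divergence for the one-parameter toy model."""
    history = []
    for _ in range(n_steps):
        p_model_one = sigmoid(bias)
        history.append(grumpy_kl(1.0 - p_model_one))
        # Move the bias by (data statistic) minus (model statistic).
        bias += learning_rate * (P_E_ONE - p_model_one)
    return bias, history


bias, history = train()
print(f"initial KL = {history[0]:.4f}, final KL = {history[-1]:.2e}")
print(f"learned P_B(x=0) = {1.0 - sigmoid(bias):.4f}")
```

Starting from a bias of zero, the toy model begins at $P_B(x=0) = 0.5$ and ends up close to the EU's value of 0.135. A full Boltzmann Machine with Hidden Units needs its statistics estimated by sampling, which this sketch sidesteps.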

In the next post, we’ll actually do the training! If we can succeed, Cid’s $S_0$ number will diverge and he’ll have complete and utter understanding of the Grumpy Universe.

## 7 thoughts on “Defining a Measure of Intelligence for Cid”

• Thanks!! Although that specific number won’t work as things get more complex — there will be many S-numbers. But it should be fine for simple things!

1. “As a boy Kepler had been captured by a vision of cosmic splendor, a harmony of the worlds which he sought so tirelessly all his life. Harmony in this world eluded him. His three laws of planetary motion represent, we now know, a real harmony of the worlds, but to Kepler they were only incidental to his quest for a cosmic system based on the Perfect Solids, a system which, it turns out, existed only in his mind. Yet from his work, we have found that scientific laws pervade all of nature, that the same rules apply on Earth as in the skies, that we can find a resonance, a harmony, between the way we think and the way the world works.”

“When he found that his long cherished beliefs did not agree with the most precise observations, he accepted the uncomfortable facts, he preferred the hard truth to his dearest illusions. That is the heart of science.”

-Carl Sagan, Cosmos: The Harmony of the Worlds

2. A related area worth looking at is Rasch measures. In any test-question / test-taker interaction, the probability of getting the answer correct is proportional to the product of the ability of the test taker and the easiness (inverse difficulty) of the question on whatever dimension of ability/difficulty the question measures. This fact leads to the ability to create a ratio-scale measure (equal-interval scale with an absolute zero) of intelligence rather than the usual rarity measure (standard deviations). It allows norming and validating the question pool while at the same time measuring the ability of the test takers and the uncertainty of that measurement. The question difficulty and test-taker ability are measured on the same scale, a scale which is fixed by the interactions between questions and test-takers. (Up to a multiplicative constant.)

This type of scale is used in the Stanford-Binet 5’s “Change Sensitive Scale”, which sets the average 10-year-old’s score to 500, the one arbitrary choice. An average 2.25-year-old scores 435, those aged 16 and up score 510, and the highest score in their norming group was 592. Since this is a ratio scale, one could say the high scorer is only about 15% above the average person, but also that there is a bigger difference in ability between the two than there is between a 2-year-old and an average adult. Intelligence varies little among humans in an absolute sense, but far more in a relative sense than most people realize.

The question of what universe or test dimension to measure is less difficult in human intelligence tests than in comparing different species or AIs. The measures of human intelligence we have correlate with each other 0.6-0.7, and with all sorts of positive outcomes that should be affected by intelligence, usually at nearly as high a level, seldom less than 0.3.

Your KL divergence of Boltzmann brain from universe is a fascinating one, though dealing with megamolar quantities and far-from-equilibrium states would seem to outweigh the conceptual simplicity in the case of natural intelligences. The math has some similarities to Rasch measures, but this is likely due to common properties of any proper (interval or ratio-scale) measurement.

3. Hi Geordie,

At this stage, I probably understand ~ 40% of the “Math section” of your post. So please forgive me if my question/comment sounds stupid.

I believe that CiD can be used to test and rank Complex Event Processing Engines; particularly those that are used in the realm of finance for Market Surveillance and Algorithmic Trading. Is my assessment correct?

A special request: Could you create web-based tools, like the Orion platform, that would allow “us civilians” to tinker with versions of the experiments / tests that you are conducting? I know that not too many people would use it, and it would be hard to justify expending resources on such a platform. However, you never know what can come out of that endeavour!