Fusing sensor modalities using an LC-DBM

Our different senses differ in the details of how they work. The features that allow effective representation of audio signals are different from those that work for vision. More generally, we'd expect that any particular sensor type we provide to a new Creature we're designing would need some new and specific way to effectively represent the types of data it receives from the world, and these representations would differ from sensor to sensor.

The different actuators in our Creature should also get different representations. For example, the motor signals that drive the bulk movement of the Creature (say its wheels) probably have a very different character from those that drive fine motor skills (such as the movement of fingers).

In the DBM framework, there is a natural way to handle this. The approach is called a Locally Connected Deep Boltzmann Machine, or LC-DBM. We briefly encountered this concept in an earlier post, where it made an appearance in this figure (the picture on the right).

Different kinds of Boltzmann Machines. The visible units are grey, the hidden units are white. Picture from this paper.

Let’s see if we can build an interesting LC-DBM that we can run in hardware.

Embodying Cid in a robot

Imagine we have a robot that has two motors. One of these controls the movement of the back left wheel, and one controls the movement of the back right wheel. The robot will have a castor in front that’s free to rotate, so we get back wheel drive. We’ll assume for this first experiment that these motors only have two settings — off (0) and forward (1). This is not all that restrictive. While there are some things this type of Creature can’t do, he will be able to get to most places using just these movements.

We’ll give each of these motors visible units corresponding to two successive times. Now that we’re starting to think about embodiments, what these times are in actual physical units becomes important. In this case, we’ll set the interval between times to be much longer than the response time of the motors — say 450 ms.
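To make this concrete, here's a minimal sketch of how the motor sector's visible units could be packed. The names (MOTOR_INTERVAL_MS, motor_visible_units) are mine, purely for illustration:

```python
MOTOR_INTERVAL_MS = 450  # interval between successive times, well above the motor response time

def motor_visible_units(left_t, right_t, left_t1, right_t1):
    """Pack the two motors' states at times t and t+1 into four
    binary visible units: 0 = off, 1 = forward."""
    for s in (left_t, right_t, left_t1, right_t1):
        assert s in (0, 1), "each motor has only two settings: off (0) or forward (1)"
    return [left_t, right_t, left_t1, right_t1]
```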

The DBM we’ll start with to represent the two motors at two times will look the same as the one we used in experiment #3. Here it is.

For the motor sector, we use two visible units corresponding to whether a motor is off (0) or moving forward (1) at two successive times; t and t+1 differ by 450 ms.

This Creature is also going to be equipped with a camera, so we can also have vision neurons. A typical camera that you'd mount on a robot provides a huge amount of information, but we're going to start off by using only a tiny fraction of it, and in a particularly dumb way. We'll take the images coming in from the camera and separate them into two regions: the left and right halves of the full image. We'll average all of the pixels in each half and threshold the result, so that an average intensity of 128 or higher means 1 (bright) and anything lower means 0 (dark). This mimics the thresholded photodetector ommatidia idea we discussed a couple of posts back, although now we have two of them: one for the left side of the Creature's vision, and one for the right side.
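As a rough sketch of that spatial step, assuming a grayscale frame stored as a NumPy array (vision_units is a name I've made up):

```python
import numpy as np

def vision_units(frame, threshold=128):
    """Two binary vision neurons: average the pixels in each half of
    the image and threshold at 128 (bright = 1, dark = 0)."""
    height, width = frame.shape
    left = int(frame[:, : width // 2].mean() >= threshold)
    right = int(frame[:, width // 2 :].mean() >= threshold)
    return left, right
```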

Again we'll have two successive times. Typical cameras provide around 30 frames per second, which is a lot faster than the time we set for the motor response. So what we'll do is average the camera results over each 450 ms window, which at 30 frames per second is about 13-14 frames, so that the difference in time stays the same as the difference we chose for the motors (a sketch of this averaging step follows the figure below). Again, this is not the smartest thing we could do, but we can improve it later! With these choices, here's the DBM we will use for the vision system.

Note that the unit labels are each shifted by 4 relative to the motor system.
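The temporal averaging step might look something like this, reusing the hypothetical vision_units helper from the sketch above:

```python
import numpy as np

def averaged_vision_units(frames, threshold=128):
    """Pixel-wise average of one 450 ms window of frames (about 13-14
    frames at 30 fps), then threshold the two halves as before."""
    mean_frame = np.mean(frames, axis=0)
    return vision_units(mean_frame, threshold)
```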

Now let's equip our Creature with a speaker / microphone. As with the vision system, an audio system we can mount on a robot can provide us with very rich data, but we'll ignore most of it for the time being. Analogously to the simple system we put in place for vision, let's again choose two audio neurons, but this time, instead of thresholding the intensity of the visual input on the left/right halves of the incoming images, we'll threshold the intensity of two different frequencies, one low and one high, corresponding to 100 Hz and 1,000 Hz. Each input will be 0 if the magnitude of the Fourier component of the signal over a total of 450 ms is less than a threshold, and 1 if it's greater. The idea is that if these frequencies are present, the corresponding audio neuron will be on; otherwise it will be off.
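Here's a minimal sketch of that thresholding, assuming raw samples in a NumPy array; the function name and the threshold value are illustrative, not from the post:

```python
import numpy as np

def audio_units(samples, sample_rate, freqs=(100.0, 1000.0), threshold=1.0):
    """Two binary audio neurons: 1 if the magnitude of the Fourier
    component at 100 Hz (resp. 1,000 Hz) over the ~450 ms window
    exceeds the threshold, else 0."""
    spectrum = np.abs(np.fft.rfft(samples))
    bin_hz = sample_rate / len(samples)  # frequency resolution of the window
    return [int(spectrum[int(round(f / bin_hz))] >= threshold) for f in freqs]
```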

Here’s the DBM for the audio system.

Here's the audio DBM. We've set it up so that there are two audio neurons: one fires if it senses 100 Hz, and the other fires if it senses 1,000 Hz.

Finally, let’s add a weapons system. We’ll mount a missile launcher on the robot. Because firing a missile is serious business, we’ll make it so that both weapons neurons have to be on simultaneously for a firing event, so 00, 01 and 10 mean ‘don’t fire’, and 11 means ‘fire’. Again we’ll have two times, separated by 450 ms. Here’s the weapons system DBM.

For the weapons system, we fire only if the most significant bit (MSB) and least significant bit (LSB) are both 1, else we don’t.
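In code, the firing rule is just a logical AND of the two weapons neurons at a given time; a one-line sketch (should_fire is my name for it):

```python
def should_fire(msb, lsb):
    """Fire only on 11; 00, 01 and 10 all mean 'don't fire'."""
    return bool(msb and lsb)

assert should_fire(1, 1) and not should_fire(1, 0)
```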

Connecting sensors and actuators at higher levels of the LC-DBM

OK so we have built four different DBMs for audio, vision, motor and weapons. But at the moment they are completely separate. Let’s fix that!

Here is an architecture that brings all four together, by combining the different modalities higher up in the LC-DBM.

An example of how we can connect up all four modalities. The orange level connects audio/weapons, audio/vision and motor/vision; the green level connects vision/audio/weapons and motor/vision/audio; and the top blue level connects all four.
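One way to write down the grouping in the figure, purely for bookkeeping (the level names and the dictionary are mine, not from the post):

```python
# Which modalities each group of hidden units fuses, level by level.
LCDBM_LEVELS = {
    "bottom": [("audio",), ("vision",), ("motor",), ("weapons",)],  # the four separate DBMs
    "orange": [("audio", "weapons"), ("audio", "vision"), ("motor", "vision")],
    "green":  [("vision", "audio", "weapons"), ("motor", "vision", "audio")],
    "blue":   [("motor", "vision", "audio", "weapons")],  # all four modalities
}
```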

This network can be embedded in hardware. I created this embedding by staring at it for a few minutes. There are probably much better ways to do it. But this one should work. Here it is!

This embedding uses a 4 by 6 block of unit cells, and should allow for all the connections in the network.
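For comparison, here's a hedged sketch of how one could search for such an embedding automatically, assuming the hardware graph is the Chimera layout and using the open-source minorminer and dwave_networkx packages (neither is mentioned in the post). The complete graph is a stand-in source, since the full LC-DBM edge list isn't reproduced here:

```python
import networkx as nx
import dwave_networkx as dnx
import minorminer

source = nx.complete_graph(8)      # stand-in for the LC-DBM's coupling graph
target = dnx.chimera_graph(4, 6)   # a 4 by 6 block of Chimera unit cells
embedding = minorminer.find_embedding(source.edges, target.edges)
print(embedding)                   # maps each logical unit to a chain of physical qubits
```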

So that’s cool! Alright, that’s enough for now. Next time we’ll think about different experiments we can subject this New and Enhanced Cid to.

4 thoughts on “Fusing sensor modalities using an LC-DBM”

  1. I wonder, does the order of the DBMs in the LC-DBM matter? I mean, since you have audio beside weapons, it will learn audio/weapons features at the second level but not motor/weapons features. I know at the top level they’re all connected, but it seems like adjacent DBMs will be more correlated since they have more levels to learn combined features.

    Also, I think at this point you should start calling Cid Skynet!

    • Hi JimmyD! Yes the order does matter. In the network above, there are fused features it won’t learn at the orange level with this architecture — there should really be six groups of four orange nodes, not three.

      Re. Skynet — the type of network we can set up in practice is probably powerful enough to emulate the intelligence of very primitive organisms, but lacks necessary characteristics to do the sorts of things necessary for full AGI. For now we’re going to focus on embodied cognition for robots that behave more like bugs than humans.
