First ever DBM trained using a quantum computer

In Terminator 2, Arnold reveals that his CPU is a neural net processor, a learning computer. Of course it is! What else would it be? Interestingly, there are real neural net processors in the world. D-Wave makes the only superconducting version, but there are other types out there also. Today we’ll use one of our superconducting neural nets to re-run the three experiments we did last time.

I believe this is the first time quantum hardware has been used to train a DBM, although there have been some theoretical investigations.

Embedding into hardware

Recall that the network we were training in the previous post had one visible layer with up to four units, and two hidden layers each with four units. For what follows we’re going to associate each of these units with specific qubits in a Vesuvius processor. The way we’re going to do this is to use a total of 16 qubits in two unit cells to represent the 12 units in the DBM: each of the four units in the first hidden layer gets two strongly coupled qubits, and every other unit gets a single qubit.

All D-Wave processors can be thought of as hardware neural nets, where the qubits are the neurons and the physical couplers between pairs of qubits are the edges between neurons. Specifically you should think of them as a type of Deep Boltzmann Machine (DBM), where specifying the biases and weights in a DBM is exactly like specifying the biases and coupling strengths in a D-Wave processor. As in a DBM, what you get out are samples from a probability distribution, which are the (binary) states of the DBM’s units (both visible and hidden).

In the Vesuvius design, there is an 8×8 tile of eight-qubit unit cells, for a total of 512 ‘neurons’. Each neuron is connected to at most 6 other neurons in Vesuvius. To do the experiments we want to do, we only need two of the 64 unit cells. For the experts out there, we could use the rest to do some interesting tricks to use more of the chip, such as gauge transformations and simple classical parallelism, but for now we’ll just stick to the most basic implementation.
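To make the connectivity numbers concrete, here is a small Python sketch that builds an idealized version of the layout described above and counts qubits and couplers. The labeling scheme and the function names `chimera_edges` and `degrees` are made up for illustration. The rule that vertical qubits couple to the cell below while horizontal qubits couple to the cell to the right is my assumption about the layout, and a real chip may have a few inoperable qubits, which this ignores.

```python
from itertools import product

def chimera_edges(rows=8, cols=8, shore=4):
    """Build the coupler set of an idealized grid of eight-qubit unit cells.

    A qubit is labeled (row, col, orientation, k), where orientation is
    'v' (vertical) or 'h' (horizontal) and k indexes the qubit in its group.
    """
    edges = set()
    for r, c in product(range(rows), range(cols)):
        # Inside a unit cell, every vertical qubit couples to every horizontal one.
        for kv, kh in product(range(shore), repeat=2):
            edges.add(((r, c, 'v', kv), (r, c, 'h', kh)))
        for k in range(shore):
            # Assumed: vertical qubits couple to their partner in the cell below.
            if r + 1 < rows:
                edges.add(((r, c, 'v', k), (r + 1, c, 'v', k)))
            # Assumed: horizontal qubits couple to their partner in the cell to the right.
            if c + 1 < cols:
                edges.add(((r, c, 'h', k), (r, c + 1, 'h', k)))
    return edges

def degrees(edges):
    """Count how many couplers touch each qubit."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

edges = chimera_edges()
deg = degrees(edges)
print(len(deg), "qubits,", len(edges), "couplers, max degree", max(deg.values()))
# prints: 512 qubits, 1472 couplers, max degree 6
```

Qubits on the edge of the grid have only five neighbors in this model, which is why the text says "at most 6".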

Here is a presentation containing some information about Vesuvius and its design. Take a look at slides 11-17 to get a high level overview of what’s going on.

Here is a picture of the DBM we set up in the last post.

Here we still have two neurons -- one vision and one motor -- but we have two different times (here labeled t and t+1).

Here is the embedding into hardware we’ll use. Hopefully this is clear! Each of the blue lines is a qubit. The horizontal qubits in unit cell #1 are strongly coupled to the horizontal qubits in unit cell #2 (represented by the red circles). We do this so that the variables in the first hidden layer can talk to all four variables in the second hidden layer (these are the four vertical qubits in unit cell #1) and all four visible units (these are the vertical qubits in unit cell #2).

The embedding into hardware we’ll use here. We use two unit cells from the top left-hand corner of the chip. The red circles indicate strong ferromagnetic coupling between the horizontal qubits in the two unit cells, which represent the four variables in the first hidden layer. The leftmost four vertical qubits represent the variables in the second hidden layer, while the rightmost four qubits represent the visible units.
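The embedding can be written down as a small lookup table and checked in a few lines. This is only a sketch: the qubit labels `(cell, orientation, k)` and the helper names are hypothetical, with cell 1 on the left and cell 2 on the right, and the coupler layout follows the description above rather than any real hardware map.

```python
from itertools import product

def two_cell_couplers():
    """Couplers available in two side-by-side unit cells (1 = left, 2 = right)."""
    couplers = set()
    for cell in (1, 2):
        # Inside each cell, every vertical qubit couples to every horizontal one.
        for kv, kh in product(range(4), repeat=2):
            couplers.add(frozenset({(cell, "v", kv), (cell, "h", kh)}))
    for k in range(4):
        # The red circles: horizontal qubit k in cell 1 couples to its partner in cell 2.
        couplers.add(frozenset({(1, "h", k), (2, "h", k)}))
    return couplers

def build_embedding():
    """Map each DBM unit to the list of qubits that represents it."""
    emb = {}
    for k in range(4):
        emb[("hidden2", k)] = [(1, "v", k)]              # leftmost vertical qubits
        emb[("visible", k)] = [(2, "v", k)]              # rightmost vertical qubits
        emb[("hidden1", k)] = [(1, "h", k), (2, "h", k)] # strongly coupled pair
    return emb

def coupled(u, v, emb, couplers):
    """True if some qubit of unit u shares a coupler with some qubit of unit v."""
    return any(frozenset({a, b}) in couplers for a in emb[u] for b in emb[v])

couplers = two_cell_couplers()
emb = build_embedding()
units = list(emb)
logical_edges = {frozenset({u, v}) for u in units for v in units
                 if u != v and coupled(u, v, emb, couplers)}
n_qubits = sum(len(chain) for chain in emb.values())
print(n_qubits, "qubits represent", len(emb), "units with",
      len(logical_edges), "logical edges")
# prints: 16 qubits represent 12 units with 32 logical edges
```

The 32 logical edges are the 16 between the first hidden layer and the visible units plus the 16 between the two hidden layers. There are no connections inside a layer, and none directly between the visible units and the second hidden layer, which is the DBM structure we want.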

Using the chip to replace the alternating Gibbs sampling step

Recall that the algorithm we used for training the DBM required drawing samples from two different distributions — the ‘freely running’ network, and a network with inputs clamped to values set by the data we are learning over. So now we have a hardware neural net. Can we do these two things directly?

The way the chip works is that we first program in a set of biases and weights, and then draw a bunch of samples from the probability distribution they create. So we should be able to do this by following a very simple prescription — do everything exactly the same as before, except replace the alternating Gibbs sampler with samples drawn from the hardware with its machine language parameters set to the current bias, offset and weight values.
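Here is a rough sketch of what "swap the sampler, keep everything else" can look like in code. It is not the training code from the last post: the function names are hypothetical, the units are assumed to be 0/1 valued, the offsets are left out, and `exact_sampler` is a brute-force stand-in that only works for tiny models. The point is just that `train_step` takes the sampler as an argument, so an alternating Gibbs sampler or a hardware-backed sampler with the same signature could be dropped in.

```python
import itertools
import numpy as np

def exact_sampler(bias, weights, n_samples, rng, clamp=None):
    """Stand-in sampler: draws 0/1 states with probability proportional to exp(-E).

    Assumed energy: E(x) = -bias.x - sum_{i<j} W_ij x_i x_j, with weights
    symmetric and zero on the diagonal. clamp maps unit index -> fixed value.
    """
    n = len(bias)
    clamp = clamp or {}
    free = [i for i in range(n) if i not in clamp]
    states = []
    for bits in itertools.product([0, 1], repeat=len(free)):
        x = np.zeros(n)
        for i, v in clamp.items():
            x[i] = v
        x[free] = bits
        states.append(x)
    states = np.array(states)
    energy = -(states @ bias) - 0.5 * np.einsum("si,ij,sj->s", states, weights, states)
    p = np.exp(-(energy - energy.min()))
    p /= p.sum()
    return states[rng.choice(len(states), size=n_samples, p=p)]

def train_step(bias, weights, mask, data_batch, visible, sampler, lr, rng, n_free=100):
    """One learning step; sampler can be any function with exact_sampler's signature."""
    # Clamped network: visible units fixed to each data vector in the minibatch.
    clamped = np.array([
        sampler(bias, weights, 1, rng, clamp=dict(zip(visible, v)))[0]
        for v in data_batch
    ])
    # Freely running network: nothing clamped.
    free = sampler(bias, weights, n_free, rng)
    bias = bias + lr * (clamped.mean(axis=0) - free.mean(axis=0))
    grad_w = clamped.T @ clamped / len(clamped) - free.T @ free / len(free)
    weights = weights + lr * grad_w * mask  # mask keeps only the allowed edges
    return bias, weights

# Toy usage: 2 visible units (0, 1) and 2 hidden units (2, 3).
rng = np.random.default_rng(0)
visible, hidden = [0, 1], [2, 3]
mask = np.zeros((4, 4))
for i in visible:
    for j in hidden:
        mask[i, j] = mask[j, i] = 1.0
bias, weights = np.zeros(4), np.zeros((4, 4))
data = np.array([[0, 0], [1, 1]] * 5)  # toy data: the two visible units agree
for _ in range(20):
    bias, weights = train_step(bias, weights, mask, data, visible,
                               exact_sampler, 0.005, rng)
```

Passing the sampler in as a plain function keeps the learning rule identical across the software and hardware runs, so any difference in the results comes from the samples themselves.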

The only tricky part of this (and it’s not really all that tricky) is to create the map from the biases, weights and offsets in the software model to the biases and weights in the hardware.
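As an illustration of what such a map can look like, here is a sketch under stated assumptions. I'm assuming the software model uses 0/1 units with a centered energy (the offsets are subtracted from the unit states), and that the hardware uses ±1 spins with energy sum_i h_i s_i + sum_{i<j} J_ij s_i s_j. Both conventions and all function names are assumptions for this example. Rescaling to the hardware's allowed parameter ranges, the hardware's effective temperature, and splitting parameters across the strongly coupled qubit pairs are all left out.

```python
import itertools
import numpy as np

def binary_to_ising(bias, weights, offsets=None):
    """Convert a 0/1-unit model into Ising h and J, using s = 2x - 1.

    Assumed software energy:
        E(x) = -sum_i b_i (x_i - o_i) - sum_{i<j} W_ij (x_i - o_i)(x_j - o_j)
    Assumed hardware energy:
        E(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j
    The two agree up to an additive constant, so they define the same distribution.
    """
    b = np.asarray(bias, float)
    W = np.asarray(weights, float)  # symmetric, zero diagonal
    o = np.zeros_like(b) if offsets is None else np.asarray(offsets, float)
    eff_bias = b - W @ o            # fold the offsets into plain biases
    h = -(eff_bias / 2 + W.sum(axis=1) / 4)
    J = -W / 4
    return h, J

def software_energy(x, b, W, o):
    d = x - o
    return -b @ d - 0.5 * d @ W @ d

def ising_energy(s, h, J):
    return h @ s + 0.5 * s @ J @ s

# Check on a small random model: the energy difference must not depend on the state.
rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))
W = np.triu(A, 1) + np.triu(A, 1).T
b = rng.normal(size=n)
o = rng.uniform(size=n)
h, J = binary_to_ising(b, W, o)
diffs = []
for bits in itertools.product([0, 1], repeat=n):
    x = np.array(bits, float)
    diffs.append(software_energy(x, b, W, o) - ising_energy(2 * x - 1, h, J))
spread = max(diffs) - min(diffs)
print("energy difference is constant:", spread < 1e-9)
```

Because the two energies differ only by a constant, the probabilities of all states are the same in both descriptions, which is what lets hardware samples stand in for software ones.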

Experimental results: Running a real quantum brain

Here are the results of doing this for the three experiments we set up last time, but now comparing training the DBM using alternating Gibbs sampling in software to training the DBM by drawing samples from a Vesuvius 6 chip. The parameters of the run were 100 problems per minibatch, 100 epochs, 1000 learning steps per epoch, learning rate = 0.005 and reparametrization rate = 0 (I set it to zero just to make everything simpler for debugging — we could make it non-zero if we want).

Comparing Alternating Gibbs Sampling in software (blue) to drawing samples from Vesuvius (red). Both do great!

Same comparison, but for Experiment #2. Here we see something very interesting -- the quantum version learns faster and gets a lot smarter!

Same but for experiment #3. Again the quantum version learns faster and gets smarter.

This is just so freaking cool.

A recap

So for the first time ever, a quantum computer has been used to train a DBM. We did this for three different experiments, and plotted the S_0 number as a function of epoch for 100 epochs. We compared the results of the DBM training on a Vesuvius chip to the same results using the standard alternating Gibbs sampling approach, and found that for experiments 2 and 3 the quantum version trained faster and obtained better scores.

This better performance is due to replacing the approximate alternating Gibbs sampling (AGS) step with samples that Vesuvius draws from the full probability distribution.

6 thoughts on “First ever DBM trained using a quantum computer”

  1. I get that with Gibbs sampling you need the distributions to “factor”, but is there anything stopping you from putting some intra-layer connections between neurons here?
