In Terminator 2, Arnold reveals that his CPU is a neural net processor, a learning computer. Of course it is! What else would it be? Interestingly, there are real neural net processors in the world. D-Wave makes the only superconducting version, but there are other types out there also. Today we’ll use one of our superconducting neural nets to re-run the three experiments we did last time.
Embedding into hardware
Recall that the network we were training in the previous post had one visible layer with up to four units, and two hidden layers each with four units. For what follows we’re going to associate each of these units with a specific qubit in a Vesuvius processor. The way we’re going to do this is to use a total of 16 qubits in two unit cells to represent the 12 units in the DBM.
All D-Wave processors can be thought of as hardware neural nets, where the qubits are the neurons and the physical couplers between pairs of qubits are edges between qubits. Specifically you should think of them as a type of Deep Boltzmann Machine (DBM), where specifying the biases and weights in a DBM is exactly like specifying the biases and coupling strengths in a D-Wave processor. As in a DBM, what you get out are samples from a probability distribution, which are the (binary) states of the DBM’s units (both visible and hidden).
In the Vesuvius design, there is an 8×8 tile of eight-qubit unit cells, for a total of 512 ‘neurons’. Each neuron is connected to at most 6 other neurons in Vesuvius. To do the experiments we want to do, we only need two of the 64 unit cells. For the experts out there, we could use the rest to do some interesting tricks to use more of the chip, such as gauge transformations and simple classical parallelism, but for now we’ll just stick to the most basic implementation.
Here is a presentation containing some information about Vesuvius and its design. Take a look at slides 11-17 to get a high level overview of what’s going on.
Here is a picture of the DBM we set up in the last post.
Here is the embedding into hardware we’ll use. Hopefully this is clear! Each of the blue lines is a qubit. The horizontal qubits in unit cell #1 are strongly coupled to the horizontal qubits in unit cell #2 (represented by the red circles). We do this so that the variables in the first hidden layer can talk to all four variables in the second hidden layer (these are the four vertical qubits in unit cell #1) and all four visible units (these are the vertical qubits in unit cell #2).
Using the chip to replace the alternating Gibbs sampling step
Recall that the algorithm we used for training the DBM required drawing samples from two different distributions — the ‘freely running’ network, and a network with inputs clamped to values set by the data we are learning over. So now we have a hardware neural net. Can we do these two things directly?
The way the chip works is that we first program in a set of biases and weights, and then draw a bunch of samples from the probability distribution they create. So we should be able to do this by following a very simple prescription — do everything exactly the same as before, except replace the alternating Gibbs sampler with samples drawn from the hardware with its machine language parameters set to the current bias, offset and weight values.
The only tricky part of this (and it’s not really all that tricky) is to create the map between the biases, weights and offsets in the software model to the biases and weights in the hardware.
Experimental results: Running a real quantum brain
Here are the results of doing this for the three experiments we set up last time, but now comparing training the DBM using alternating Gibbs sampling in software to training the DBM by drawing samples from a Vesuvius 6 chip. The parameters of the run were 100 problems per minibatch, 100 epochs, 1000 learning steps per epoch, learning rate = 0.005 and reparametrization rate = 0 (I set it to zero just to make everything simpler for debugging — we could make it non-zero if we want).
This is just so freaking cool.
So for the first time ever, a quantum computer has been used to train a DBM. We did this for three different experiments, and plotted the number as a function of epoch for 100 epochs. We compared the results of the DBM training on a Vesuvius chip to the same results using the standard alternating Gibbs sampling approach, and found that for experiments 2 and 3 the quantum version trained faster and obtained better scores.
This better performance is due to the replacement of the approximate AGS step with the correct sampling from the full probability distribution obtained from using Vesuvius.