Interesting progress on improving quantum annealing

Degeneracy, degree, and heavy tails in quantum annealing

Both simulated quantum annealing and physical quantum annealing have shown the emergence of “heavy tails” in their performance as optimizers: The total time needed to solve a set of random input instances is dominated by a small number of very hard instances. Classical simulated annealing, in contrast, does not show such heavy tails. Here we explore the origin of these heavy tails, which appear for inputs with high local degeneracy—large isoenergetic clusters of states in Hamming space. This category includes the low-precision Chimera-structured problems studied in recent benchmarking work comparing the D-Wave Two quantum annealing processor with simulated annealing. On similar inputs designed to suppress local degeneracy, performance of a quantum annealing processor on hard instances improves by orders of magnitude at the 512-qubit scale, while classical performance remains relatively unchanged. Simulations indicate that perturbative crossings are the primary factor contributing to these heavy tails, while sensitivity to Hamiltonian misspecification error plays a less significant role in this particular setting.

Six interesting findings from recent benchmarking results

Around May 15th of 2013 Google acquired a system built around a 509-qubit Vesuvius 6 (V6) chip. Since it went online, they have been running it 24/7 at 100% usage. Most of this time has been committed to benchmarking.

Some of these results have been published, and there has been some discussion of what it all means. Here I’d like to provide my own view of where I think we are, and what these results show.

Interesting finding #1: V6 is the first superconducting processor competitive with state of the art semiconducting processors.

Processors made out of superconductors have very interesting properties. The two that have historically driven interest are that they can be extremely fast, and they can operate without requiring lots of power. Interestingly they can even be run close to thermodynamical reversibility — with zero heat generation. There was a serious attempt to make superconducting processors work, at IBM from 1969 to 1983you can read a great first hand account of it here. Unfortunately the technology was not mature enough, semiconducting approaches were immensely profitable at the time, and the effort failed. Subsequently there has been much talk about doing something similar but with our new knowledge, but no-one has followed through.

It is difficult to find the amount of investment that has gone into superconducting processor R&D. As best I can count, the number is about $4B. We account for about 3% of that number; IBM about 15%; and government sponsorship of basic research, primarily in Japan, US and Europe the remainder. Depending on your perspective, this might sound like a lot, or like a very small number — for example, a single TSMC state of the art semiconductor fabrication facility costs about six times this (~$25B) to build. The total investment in semiconductor fabrication facilities and equipment since the early days of Fairchild Semi is now approaching $1T — yes, T as in Trillion. That doesn’t include any of the investment in actual processors — just the costs of building fabrication facilities.

The results that were recently published in the Ronnow et. al. paper show that V6 is competitive with what’s arguably the most highly optimized semiconductor based solution possible today, even on a problem type that in hindsight was a bad choice. A fact that has not gotten as much coverage as it probably should is that V6 beats this competitor both in wallclock time and scaling for certain problem types. That is a truly astonishing achievement. Mattias Troyer and his team achieved an incredible level of optimization with his simulated annealing code, achieving 200 spin updates per nanosecond using a GPU based approach. The ‘out of the box’ unoptimized V6 system beats this approach for some problem types, and even for problem types where it doesn’t do so well (like the ones described in the Ronnow paper) it holds its own, and even wins in some cases.

This is a remarkable historic achievement. It’s the first delivery on the promise of superconducting processors.

Interesting finding #2: V6 is the first computing system using ideas from quantum information science competitive with the best classical computing systems.

Much like in the case of superconducting processors, the field of quantum computing has promised to provide new ways of doing things that are superior to the ways things are now. And much like superconducting processors, the actual delivery on that promise has been virtually non-existent.

The results of the recent studies show that V6 is the first computing system that uses ideas from quantum information science that is competitive with the best classical algorithms known run on the fastest modern processors available.

This is also a remarkable and historic achievement. It’s the first delivery on the promise of quantum computation.

Interesting finding #3: The problem type chosen for the benchmarking was wrong.

The type of problem that the Ronnow paper looked at — random spin glasses — made a lot of sense when the project began. Unfortunately about midway through the project it was discovered that this type of problem was expected theoretically to show no difference in scaling between simulated annealing (SA) and quantum annealing (QA). This analysis showed that it was necessary to add structure to the problem instances to see a scaling difference between the two. So if an analysis of the D-Wave approach has as its objective observing a scaling difference between SA and QA, random spin glass problems are the wrong choice.

Interesting finding #4: Google seems to love their machine.

Last week Google released a blog post about their benchmarking efforts that provide an overview of how they feel about what they’ve been seeing. Here are some key points they raise in that post.

  • In an early test we dialed up random instances and pitted the machine against popular off-the-shelf solvers — Tabu Search, Akmaxsat and CPLEX. At 509 qubits, the machine is about 35,500 times (!) faster than the best of these solvers.

This is an important result. Beating a trillion dollars worth of investment with only the second generation of an entirely new computing paradigm by 35,500 times is a pretty damn awesome achievement. NOTE FOR EXPERTS: CPLEX was NOT run in these tests to global optimality. It was run in a mode where it was timed to the time it found a target solution, and not to the time it took to prove global optimality. In addition, Tabu Search is nearly always the best tool if you don’t know the structure of the QUBO problem you are solving. Beating it by this much is a Big Deal.

  • For each classical solver, there are problems for which the hardware does much better.

This is extremely cool also. Even though we are now talking about the best solvers we know how to create, our Vesuvius chip, with about 0.001% of the investment of its competitor, is holding its own.

  • A principal reason the portfolio solver is still competitive right now is actually rather mundane — the qubits in the current chip are still only sparsely connected.

This is really important to understand — making the D-Wave technology better is likely about making the problems being solved more rich by adding more couplers to the chip, which is just an engineering issue that is nearly completely decoupled from other things like the role of quantum mechanics in all of this. It is really straightforward to make this change.

  • Eyeballing this treasure trove of data, we’re now trying to identify a class of problems for which the current quantum hardware might outperform all known classical solvers.

Now this is really cool. Even for Vesuvius there might be problems for which no known classical computer can compete!

Interesting finding #5: The system has been running 24/7 with not even a second of downtime for about six months.

This is also worth pointing out, as it’s quite a complex machine with the business end at or around 10 millikelvin. This aspect of the machine isn’t as sexy as some of the other issues typically discussed, but it’s evidence that the underlying engineering of the system is really pretty awesome.

Interesting finding #6: The technology has come a long way in a short period of time.

None of the above points were true last year. The discussion is now about whether we can beat any possible computer — even though it’s really only the second generation of an entirely new computing paradigm, built on a shoestring budget.

The next few generations of chip should push us way past this threshold — this is by far the most interesting time in the 15 year history of this project.

First ever DBM trained using a quantum computer

In Terminator 2, Arnold reveals that his CPU is a neural net processor, a learning computer. Of course it is! What else would it be? Interestingly, there are real neural net processors in the world. D-Wave makes the only superconducting version, but there are other types out there also. Today we’ll use one of our superconducting neural nets to re-run the three experiments we did last time.

I believe this is the first time quantum hardware has been used to train a DBM, although there have been some theoretical investigations.

Embedding into hardware

Recall that the network we were training in the previous post had one visible layer with up to four units, and two hidden layers each with four units. For what follows we’re going to associate each of these units with a specific qubit in a Vesuvius processor. The way we’re going to do this is to use a total of 16 qubits in two unit cells to represent the 12 units in the DBM.

All D-Wave processors can be thought of as hardware neural nets, where the qubits are the neurons and the physical couplers between pairs of qubits are edges between qubits. Specifically you should think of them as a type of Deep Boltzmann Machine (DBM), where specifying the biases and weights in a DBM is exactly like specifying the biases and coupling strengths in a D-Wave processor. As in a DBM, what you get out are samples from a probability distribution, which are the (binary) states of the DBM’s units (both visible and hidden).

In the Vesuvius design, there is an 8×8 tile of eight-qubit unit cells, for a total of 512 ‘neurons’. Each neuron is connected to at most 6 other neurons in Vesuvius. To do the experiments we want to do, we only need two of the 64 unit cells. For the experts out there, we could use the rest to do some interesting tricks to use more of the chip, such as gauge transformations and simple classical parallelism, but for now we’ll just stick to the most basic implementation.

Here is a presentation containing some information about Vesuvius and its design. Take a look at slides 11-17 to get a high level overview of what’s going on.

Here is a picture of the DBM we set up in the last post.

Here we still have two neurons -- one vision and one motor -- but we have two different times (here labeled t and t+1).

Here we still have two neurons — one vision and one motor — but we have two different times (here labeled t and t+1).

Here is the embedding into hardware we’ll use. Hopefully this is clear! Each of the blue lines is a qubit. The horizontal qubits in unit cell #1 are strongly coupled to the horizontal qubits in unit cell #2 (represented by the red circles). We do this so that the variables in the first hidden layer can talk to all four variables in the second hidden layer (these are the four vertical qubits in unit cell #1) and all four visible units (these are the vertical qubits in unit cell #2).

The embedding into hardware we'll use here. We use two units cells from the top left hand corner of the chip. The red circles indicate strong ferromagnetic coupling between the horizontal qubits in the two unit cells, which represent the four variables in the first hidden layer. The leftmost four vertical qubits represent the variables in teh second hidden layer, while the rightmost four qubits represent the visible units.

The embedding into hardware we’ll use here. We use two units cells from the top left hand corner of the chip. The red circles indicate strong ferromagnetic coupling between the horizontal qubits in the two unit cells, which represent the four variables in the first hidden layer. The leftmost four vertical qubits represent the variables in the second hidden layer, while the rightmost four qubits represent the visible units.

Using the chip to replace the alternating Gibbs sampling step

Recall that the algorithm we used for training the DBM required drawing samples from two different distributions — the ‘freely running’ network, and a network with inputs clamped to values set by the data we are learning over. So now we have a hardware neural net. Can we do these two things directly?

The way the chip works is that we first program in a set of biases and weights, and then draw a bunch of samples from the probability distribution they create. So we should be able to do this by following a very simple prescription — do everything exactly the same as before, except replace the alternating Gibbs sampler with samples drawn from the hardware with its machine language parameters set to the current bias, offset and weight values.

The only tricky part of this (and it’s not really all that tricky) is to create the map between the biases, weights and offsets in the software model to the biases and weights in the hardware.

Experimental results: Running a real quantum brain

Here are the results of doing this for the three experiments we set up last time, but now comparing training the DBM using alternating Gibbs sampling in software to training the DBM by drawing samples from a Vesuvius 6 chip. The parameters of the run were 100 problems per minibatch, 100 epochs, 1000 learning steps per epoch, learning rate = 0.005 and reparametrization rate = 0 (I set it to zero just to make everything simpler for debugging — we could make it non-zero if we want).

Comparing Alternating Gibbs Sampling in software (blue) to drawing samples from Vesuvius (red). Both do great!

Comparing Alternating Gibbs Sampling in software (blue) to drawing samples from Vesuvius (red). Both do great!

Same comparison, but for Experiment #2. Here we see something very interesting -- the quantum version learns faster and gets a lot smarter!

Same comparison, but for Experiment #2. Here we see something very interesting — the quantum version learns faster and gets a lot smarter!

Same but for experiment #3. Again the quantum version learns faster and gets smarter.

Same but for experiment #3. Again the quantum version learns faster and gets smarter.

This is just so freaking cool.

A recap

So for the first time ever, a quantum computer has been used to train a DBM. We did this for three different experiments, and plotted the S_0 number as a function of epoch for 100 epochs. We compared the results of the DBM training on a Vesuvius chip to the same results using the standard alternating Gibbs sampling approach, and found that for experiments 2 and 3 the quantum version trained faster and obtained better scores.

This better performance is due to the replacement of the approximate AGS step with the correct sampling from the full probability distribution obtained from using Vesuvius.

A visit to IDC’s 50th HPC user forum

Yesterday I was part of a session at IDC’s HPC user forum. This session was interesting because it was about quantum computing. This was the first time the HPC user forum has had a session on quantum computing. I think there will be many more in the future.

Not only was there an entire session on the topic, but the keynote speaker at the event was Charlie Bennett, an IBM Fellow who is well-known to quantum information folks, as (among other things) he co-invented quantum cryptography. I’m going to do a separate post on what he was talking about as it was fascinating.

The session I was part of was led off by Isaac Chuang from MIT, who gave an overview of where he felt the field of quantum information science and technology was. There was a pretty comprehensive overview of a set of experimental results that had been obtained c. 2002 and c. 2013, showing an impressive advance in these results from being able to do about 10 gates on 1 qubit in 2002 to about 10s of gates on about 10 qubits in 2013. Unfortunately, he completely omitted any mention at all of any of our work, or the work of independent folks doing science on our machines. I will send him some copies of Nature.

I was next, and started the festivities by stating that basically I disagreed with everything Ike had said, and was going to give a very different perspective. I felt bad about being confrontational (obviously I still haven’t watched enough Hitchens, I am working on it). But he was in the room, so if he wanted to call me on something he could (he didn’t). A smart non-expert audience that hears completely conflicting stories is going to be confused and wonder what’s up. So I thought it would be in the interests of the audience to put a name on it.

Anyway, once the drama was out of the way, I gave my talk. Here are the slides.

After my talk, Hartmut Neven from Google talked about their D-Wave machine, and what they were doing with it. He described three use cases for machine learning, including finding extremely sparse classifiers, reducing the negative effects of improperly labeled items in supervised machine learning methods, and training and inference in deep learning. One very interesting thing he revealed was that the first of these was used to train blink detectors in the Google Glass product. This is the very first time that a quantum algorithm has been used to develop commercially deployed software.

This is extremely cool, and the beginning of what I see as a new use case opportunity for us. The scenario where you need an always on detector / classifier onboard a device with extreme power constraints is increasingly common. I just absolutely love the idea that the software that makes mobile devices work can be designed by quantum computers.

Next, Dave Wecker from Microsoft gave an overview of his group’s work on building software for programming, compiling and visualizing quantum circuits. The work they are doing is really top notch, and the presentation was great.

Finally, Jay Gambetta from IBM Yorkton Heights gave a talk about the IBM work on transmon qubits. I think the main point of interest for me about this work is how difficult / impossible it would be to scale any of it up. Lots of microwave lines!!!

Link to Singularity 1-1 interview

I recently had the pleasure of doing an interview with Nikola Danaylov AKA Socrates. I really like his interview series — he’s spoken to a who’s who of interesting people. You should watch them. Here is a link to his site. I especially liked his most recent interview with Kevin Warwick.

In the interview we talk about a whole bunch of topics, including the founding of D-Wave, what D-Wave is all about, the awesomeness of wrestling, superconducting processors, quantum computers, Rose’s Law, the multiverse, stick wielding primates, the Singularity, artificial intelligence, and a bunch of other stuff. Check it out!



Some new Rainier science

Here is a short break from the sparse coding mayhem. A recent paper by some interesting folks appeared today on the arxiv. They ran some experiments on the Rainier-based system at USC.

Here is some of what they found:

Our experiments have demonstrated that quantum annealing with more than one hundred qubits takes place in the D-Wave One device… the device has sufficient ground state quantum coherence to realise a quantum annealing of a transverse field Ising model.

Here is a link to the arxiv paper.

Quantum annealing with more than one hundred qubits

Quantum computing for learning: Shaping reality both figuratively and literally

I’d like to document in snippets and thought-trains a little more of the story behind how my co-workers and I are trying to apply quantum computing to the field of intelligence and learning. I honestly think that this is the most fascinating and cool job in the world. The field of Artificial Intelligence (AI) – after a period of slowdown – is now once again exploding with possibility. Big data, large-scale learning, deep networks, high performance computing, bio-inspired architectures… There have been so many advancements lately that it’s kind of hard to keep up! Similarly, the work being done on quantum information processing here at D-Wave is ushering in a new computational revolution. So being a multi-disciplinary type and somewhat masochistic, I find it exciting to explore just how far we can take the union of these two fields.

Approximately 5 years ago, while working on my PhD at University, I started to have an interest in whether or not quantum computing could be used to process information in a brain-like way. I’m trying to remember where this crazy obsession came from. I’d been interested in brains and artificial intelligence for a while, and done a bit of reading on neural nets. Probably because one of my friends was doing a PhD that involved robots and I always thought it sounded super-cool. quantum_learning_1But I think that it was thinking about Josephson junctions that really got me wondering. Josephson junctions are basically switches. But they kind of have unusual ways of switching (sometimes when you don’t want them to). And because of this, I always thought that Josephson junctions are a bit like neurons. So I started searching the literature to find ways in which researchers had used these little artificial neurons to compute. And surprisingly, I found very little. There were some papers about how networks of Josephson junctions could be used to simulate neurons, but no-one had actually built anything substantial. I wrote a bit about this in a couple of old posts (from Physics and Cake blog):

I’d read about the D-Wave architecture and I’d been following the company’s progress for some time. After reading a little about the promise of Josephson junction networks, and the pitfalls of the endeavour (mostly because making the circuits reproducible is extremely difficult), I then began wondering whether or not the D-Wave processor could be used in this way. It’s a network of qubits made from Josephson junctions after all, and they’re connected together so that they talk to each other. Yeah, kind of like neurons really. Isn’t that funny. And hey, those D-Wave types have spent 8 years getting that network of Josephson junctions to behave itself. Getting it to be programmable, addressable, robust, and scalable. Hmm, scalable…. I particularly like that last one. Brains are like, big. Lotsa connections. And also, I thought to myself (probably over tea and cake), if the neurons are qubits, doesn’t that mean you can put them in superposition and entangled states? What would that even mean? Boy, that sounds cool. Maybe they would process information differently, and maybe they could even learn faster if they could be in combinations of states at the same time and … could you build a small one and try it out?

The train of thought continued.

From quantum physics to quantum brains

That was before I joined D-Wave. Upon joining the company, I got to work applying some of my physics knowledge to helping build and test the processors themselves. However there was a little part of me that still wanted to actually find ways to use them. Not too long after I had joined the company there happened to be a competition run internally at D-Wave known as ‘Apps Day’, open to everyone in the company, where people were encouraged to try to write an app for the quantum computer. Each candidate got to give a short presentation describing their app, and there were prizes at stake.
I decided to try and write an app that would allow the quantum computer to learn how to play the board game Go. It was called QUAGGA, named after an extinct species of zebra. As with similar attempts involving the ill-fated zebra, I too might one day try to resurrect my genetically-inferior code. Of course this depends on whether or not I ever understand the rules of Go well enough to program it properly🙂 Anyway… back to Apps Day. There were several entries and I won a runner-up prize (my QUAGGA app idea was good even though I hadn’t actually finished coding it or run it on the hardware). But the experience got me excited and I wanted to find out more about how I could apply quantum processing to applications, especially those in the area of machine learning and AI.

That’s why I moved from physics into applications development.

Since then the team I joined has been looking into applying quantum technology to various areas of machine learning, in a bid to unite two fields which I have a really strong feeling are made for each other. I’ve tried to analyse where this hunch originates from. The best way to describe it is that I really want to create models of machine intelligence and creativity that are bio-inspired. quantum_learning_2 To do that I believe that you have to take inspiration from the mammalian brain, such as its highly parallel, hierarchical arrangement of substructures. And I just couldn’t help but keep thinking: D-Wave’s processors are highly parallel systems with qubits that can be in one of two states (similar to firing or not firing neurons) with connections between them that can be inhibitory or excitory. Moreover, like the brain, these systems, are INCREDIBLY energy efficient because they are designed to do parallel processing. Modern CPUs are not – hence why brain simulations and machine learning programs take so much energy and require huge computer clusters to run. I believe we need to explore many different hardware and software architectures if we want to get smarter about intelligent computing and closer to the way our own minds work. Quantum circuits are a great candidate in that hunt for cool brain-like processing in silicon.

So what on earth happened here? I’d actually found a link between my two areas of obsession interest and ended up working on some strange joint project that combined the best of both worlds. Could this be for real? I kept thinking that maybe I wanted to believe so badly I was just seeing the machine-learning messiah in a piece of quantum toast. However, even when I strive to be truly objective, I still find a lot of evidence that the results of this endeavour could be very fruitful.

Our deep and ever-increasing understanding of physics (including quantum mechanics) is allowing us to harness and shape the reality of the universe to create new types of brains. This is super-cool. However, the thing I find even cooler is that if you work hard enough at something, you may discover that several fascinating areas are related in a deeper way than you previously understood. Using this knowledge, you can shape the reality of your own life to create a new, hybrid project idea to work on; one which combines all the things you love doing.