New results coming from the AQUA@home project. I believe that this is the largest scale simulation of a quantum algorithm ever (anyone know of a bigger one?), consuming on the order of 10^{20} floating point operations of vectorized, optimized, multi-threaded goodness over a total time of about 2 years.

In addition, I believe this is the first time that a computation time in seconds has been predicted for any quantum algorithm. To do this we used a Hamiltonian for a specific circuit (a 128-qubit Rainier chip) whose parameters are all independently extracted.

Here’s the abstract:

We calculate median adiabatic times (in seconds) of a specific superconducting adiabatic quantum processor for an NP-hard Ising spin glass instance class with up to N = 128 binary variables. To do so, we ran high performance Quantum Monte Carlo simulations on a large-scale Internet-based computing platform. We compare the median adiabatic times with the median running times of two classical solvers and find that, for problems with up to 128 variables, the adiabatic times for the simulated processor architecture are about 4 and 6 orders of magnitude shorter than the two classical solvers’ times. This performance difference shows that, even in the potential absence of a scaling advantage, adiabatic quantum optimization may outperform classical solvers.

A link to the paper is here.


That’s at zero temp?

Hi Dave: As usual you see the Big Problem right away :)

Yes. I don’t know how to do an open quantum systems version of this. Basically the QMC calculates the energy gap and the matrix element between ground & first excited states at a bunch of values of s between s=0 and s=1.
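For context, the gap and that matrix element are exactly the ingredients in one textbook form of the adiabatic condition (a standard sketch, not necessarily the precise bound used in the paper):

```latex
T \;\gg\; \max_{s\in[0,1]}
  \frac{\bigl|\langle 1(s)\,|\,\mathrm{d}H/\mathrm{d}s\,|\,0(s)\rangle\bigr|}{g(s)^{2}}
```

where $g(s)$ is the energy gap between the ground and first excited states at interpolation parameter $s$.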

Just to be overly precise, the simulation’s temperature was about 0.75 mK, but the gap sizes, ground state curvatures, etc., computed from that simulation were used to compute an upper bound on the adiabatic time, which happens to approximate the time for a zero-temperature processor to reach 63.2% occupation of the unique final ground state.
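That 63.2% figure is presumably just the standard 1 − 1/e occupation threshold used to define a characteristic adiabatic time; a quick check of the arithmetic:

```python
import math

# Ground-state occupation threshold after one characteristic adiabatic time:
# 1 - 1/e, the usual convention for defining a timescale.
occupation = 1 - math.exp(-1)
print(f"{occupation:.3f}")  # 0.632, i.e. 63.2%
```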

It would actually have been much easier to simulate at a higher temperature; if I had to make a wild guess, an optimal temperature for a processor running the small-gap instances might be on the order of 2 mK to 10 mK. We just wouldn’t have gotten any useful data if we’d simulated that hot. :)

As a loose analogy, the amount of time it takes optimally-spaced parallel tempering replicas to get through the min gap without significantly occupying the first excited state goes up roughly as some monotone function of the adiabatic time. However, if it can occupy a low first excited state too, it’s much faster to get through to the global minimum and back. Unfortunately, it’s not a perfect analogy, so I don’t know if the open system runtime can be estimated from it.

1e20 FLOPS? I’m confused. I thought the record for energy efficiency is about 3e-9 watts/FLOPS, so that would mean you are using 3e11 watts. A medium-sized nuclear power plant produces about 1e9 watts.

rrtucci: ah yes, more clarity needed. What I meant is FLOating Point operationS, i.e. the total number of floating point operations over a total period of about 2 years, I believe. I got used to writing FLOP/S for floating point operations per second. I think (Neil or Kamran, can you confirm?) that AQUA usually runs at about 10 teraflop/s.

10^20 is the approximate number of operations (floating point and integer) performed by about 8000 cores over the course of 3 months. We probably underestimated this number. Note that AQUA has been running for the past couple of years (we were constantly improving the accuracy of the simulation results), but we used the results of the last set of simulations for the paper, i.e. the last 3 months. So the total number of operations performed by AQUA over 2 years is way more than 10^20.
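A back-of-envelope check of the throughput implied by those figures (~10^20 operations, ~8000 cores, ~3 months; the inputs are from the comment above, the derived rates are just arithmetic):

```python
# Throughput implied by ~1e20 operations on ~8000 cores over ~3 months.
total_ops = 1e20
cores = 8000
seconds = 90 * 24 * 3600              # roughly 3 months

per_core = total_ops / (cores * seconds)  # operations per second per core
aggregate = total_ops / seconds           # operations per second, whole platform

print(f"per core:  {per_core:.1e} ops/s")
print(f"aggregate: {aggregate:.1e} ops/s")
```

The aggregate comes out near 10^13 operations per second, which is consistent with the "about 10 teraflop/s" figure mentioned above.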

Neil: Hmm, interesting. Yes, possibly simply using the temperature from the QMC simulation might be enough, at least in some limit, to mimic an open quantum systems simulation. I guess what you’d need then is for the central system’s dynamics to change slowly compared to the timescale for equilibration with a bath, so that you could reasonably believe the central system has time to equilibrate to the bath temperature (and the bath is big enough not to change itself during interaction with the central system). Someone has probably thought this through properly; anybody know any references along these lines?

Geordie, a pedantic correction:

If you look in Wikipedia, the standard is to use “FLOPS” or “Flops/s” for floating-point operations per second and “Flops” for just floating-point operations.

I fixed the text to make it more clear!

AQUA, Rainier… Congratulations! The point is: is this the beginning, or is this the end? Is there AQC beyond 128 qubits? Does all this come down to a constant factor of 10^4? Figure 3: will temperature be an obstacle to scaling up? Figure 4: a “speedup” of 10^4, but it remains roughly the same over the entire 8-128 interval… Of course, this factor of 10^4 is worth something, but if it is constant regardless of problem size, it makes no difference compared with a classical computer, no matter the physics it’s based on. I think “potential absence of a scaling advantage” deserves bold, capital letters.

Xuru: we explicitly stayed away from any speculation about the performance of bigger chips. This paper is just about a specific chip with 128 qubits. There is very little guidance from theory / numerics for anything bigger. Note that if you can get a 10^4 speedup in real life for a specific chip over the latest COTS chips, that’s quite a big deal. In real time that could reduce 24 hours of run time to 10 seconds. The reduction in energy consumption per calculation is potentially important also. The systems we build consume about 15 kilowatts, so 10 seconds consumes 150 kilojoules for us. The competing system consumes about 300 watts, which over 24 hours costs 26 megajoules, so you might start to see orders of magnitude lower energy consumption as well as straight performance advantages.
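The energy comparison above, made explicit (all input figures — 15 kW for 10 s versus 300 W for 24 hours — are from the comment):

```python
# Energy per calculation: 15 kW quantum system running for 10 s
# versus a ~300 W classical system running for 24 hours.
quantum_J = 15_000 * 10            # 150 kJ
classical_J = 300 * 24 * 3600      # about 25.9 MJ

print(f"quantum:   {quantum_J / 1e3:.0f} kJ")
print(f"classical: {classical_J / 1e6:.1f} MJ")
print(f"ratio:     {classical_J / quantum_J:.0f}x")  # roughly two orders of magnitude
```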

Thanks Geordie. I agree that if you finally get something like a 10^4 speedup with your hardware at DWave it will be superb; I see the advantages. I understand that extrapolating data is a sort of speculation. But I think the question “does the data suggest the absence of a scaling advantage?” arises when seeing Figure 4, and it deserves some attention. Is the answer to this question: “absolutely not, not only it cannot confirm nothing, but it cannot even suggest nothing because it’s about a specific chip with 128 qubits”? Meaning that “to have some clues about this, to make some guess, other problem sizes with other chip sizes should be simulated, for instance 8 variables with a specific 8 qubits chip, and so on”? I would also like to ask whether you consider it possible to scale up from these 128 qubits of the Rainier chip, and whether cooling requirements or something else ultimately limits it.

Hi Xuru, I think “absolutely not, not only it cannot confirm nothing, but it cannot even suggest nothing because it’s about a specific chip with 128 qubits” is about right. The result here is about a specific set of parameters in the adiabatic algorithm (there are lots of variations you can try, like in simulated annealing) on a specific chip.

Also “to have some clues about this, to make some guess, other problem sizes with other chip sizes should be simulated, for instance 8 variables with a specific 8 qubits chip, and so on”? is correct, except I think the better way to approach this is to do it experimentally and not numerically — I think it’s better to use real chips if you have the option.

Scaling up from 128 is straightforward. We think that the approaches we use now can scale without any changes to about 10,000 qubits. There are issues in going beyond this number, but I think even these can be overcome.

Thanks again. And, wow, that’s fantastic!! “better way to approach this is to do it experimentally…” is all right, but please, in the meantime let us in AQUA do some of it numerically. Regards.