Any given AQC processor has a fixed hardware graph, which provides an allowed edge set and an allowed vertex set. In part II of this series I discussed how problem graphs can be embedded into a hardware graph. Now I’d like to discuss a related issue, which is what to do if your problem graph cannot be embedded directly in hardware.
The most common way this happens is that the problem has more variables than qubits. For example, I might have an application generate the following 1,000-variable problem: find the set of spins $s_j \in \{+1, -1\}$ that globally minimizes the objective function

$$E(s_1, \ldots, s_{1000}) = \sum_{j=1}^{1000} h_j s_j + \sum_{(i,j) \in \mathcal{E}} J_{ij} s_i s_j$$

where $\mathcal{E}$ is the set of edges with non-zero $J_{ij}$. I might also have hardware with far fewer qubits available than this. In the Cerberus chip this was only 16 qubits, and for Leda it was 28. Obviously the problem can't embed natively in hardware. What can we do?
All of the approaches we've tried for dealing with this problem ultimately share the same goal: trading one large problem for a large number of small problems. In other words, we need to generate a list of problems that each embed into the hardware, such that solving all of them solves the original large problem. This kind of decomposition is common in high performance computing (for example, in breaking problems apart to run on large clusters), and has an established literature of techniques and algorithms.
One way to break up a large problem into a series of smaller problems is based on graph decomposition. The idea is to identify vertices in the large graph whose removal makes the graph "fall apart" into disjoint pieces. Here is an illustration of an extreme case, where the removal of a single vertex (the blue one) breaks a 25-vertex graph into four 6-vertex graphs.
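For graphs where such single-vertex separators exist, a standard way to find them is via articulation points. A quick sketch, assuming the `networkx` library is available:

```python
import networkx as nx

# A toy "hub" graph: two triangles that share only vertex 0,
# so removing vertex 0 disconnects the graph.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 0),   # triangle A
                  (0, 3), (3, 4), (4, 0)])  # triangle B

# Articulation points are exactly the vertices whose removal
# increases the number of connected components -- natural
# candidates for clamp variables.
print(list(nx.articulation_points(G)))  # [0]
```

For separators larger than a single vertex, minimum vertex cuts (e.g. `nx.minimum_node_cut`) play the analogous role.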
Once a set of vertices is identified whose removal breaks the graph into acceptably small components, these are referred to as the clamp variables. The algorithm then becomes an optimization over the values of the clamped variables, with the optimization within each decomposed sub-graph done serially in hardware. This optimization can be either complete or heuristic, depending on the needs of the user. In the example pictured above, the single clamped variable would first be set to +1; all four sub-graphs would be solved in hardware given the effective biases induced by that setting, and the energy (the minimum value of the objective function achieved) recorded. Then the clamp is switched to -1 and the process repeated. So instead of one native run of the entire problem on hardware large enough to accommodate it, we have 8 serial runs (2 clamp settings × 4 sub-graphs) on much smaller hardware, with the same result. Note that the total search space over the clamped variables is exponential in the number of clamps, so the approach is most attractive when a small number of clamps suffices.
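Putting the clamping scheme into code: below is a minimal sketch (the names are mine, and `brute_force` stands in for the hardware call) that enumerates clamp settings, folds each setting into effective local fields on the sub-graphs, solves the sub-graphs independently, and keeps the best total.

```python
from itertools import product

def brute_force(h, J, variables):
    """Stand-in for the hardware solver: exhaustively minimize the
    Ising objective over the given variables."""
    best_e, best_s = float("inf"), None
    for bits in product([+1, -1], repeat=len(variables)):
        s = dict(zip(variables, bits))
        e = sum(h[v] * s[v] for v in variables)
        e += sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
        if e < best_e:
            best_e, best_s = e, s
    return best_e, best_s

def solve_by_clamping(h, J, clamp_vars, subgraphs, solve_subproblem):
    """Enumerate all settings of the clamp variables; for each setting,
    fold the clamp spins into effective fields on each sub-graph, solve
    the sub-graphs independently, and sum the energies."""
    best_e, best_s = float("inf"), None
    clampset = set(clamp_vars)
    for bits in product([+1, -1], repeat=len(clamp_vars)):
        s_clamp = dict(zip(clamp_vars, bits))
        # Energy contributed by the clamped variables themselves.
        e = sum(h[c] * s_clamp[c] for c in clamp_vars)
        e += sum(Jij * s_clamp[i] * s_clamp[j]
                 for (i, j), Jij in J.items()
                 if i in clampset and j in clampset)
        full = dict(s_clamp)
        for sub in subgraphs:
            subset = set(sub)
            # Fold the clamp spins into effective local fields.
            h_eff = {v: h[v] for v in sub}
            for (i, j), Jij in J.items():
                if i in subset and j in clampset:
                    h_eff[i] += Jij * s_clamp[j]
                elif j in subset and i in clampset:
                    h_eff[j] += Jij * s_clamp[i]
            J_sub = {(i, j): Jij for (i, j), Jij in J.items()
                     if i in subset and j in subset}
            e_sub, s_sub = solve_subproblem(h_eff, J_sub, sub)
            e += e_sub
            full.update(s_sub)
        if e < best_e:
            best_e, best_s = e, full
    return best_e, best_s

# A 5-variable example with one clamp (vertex 0) whose removal
# splits the rest into two independent pieces, {1,2} and {3,4}:
h = {0: 0.5, 1: -1.0, 2: 0.3, 3: 0.7, 4: -0.2}
J = {(0, 1): -1.0, (1, 2): 0.5, (0, 3): 0.8, (3, 4): -0.6}
e, s = solve_by_clamping(h, J, [0], [[1, 2], [3, 4]], brute_force)
```

Because the sub-graphs are disconnected once the clamps are fixed, the conditional optimization decomposes exactly, and the result matches a direct global minimization.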
Another approach that works well is to use the hardware as a component of a large-neighborhood local search heuristic. Here, any algorithm that uses a local search subroutine can swap out its existing local search for a search over however many variables the hardware can accept. For example, if the hardware can solve arbitrarily connected 16-variable problems, then whenever the algorithm requires a local search, the search is formulated over 16 variables and sent to the hardware. Another way of looking at this: it is a heuristic in which (1,000 - 16) variables are clamped and 16 are freed to find their optimal values subject to the static biases from the clamps, repeated over and over, with some clever way of selecting which group of 16 at each step is most likely to reduce the energy of the optimization objective.
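A minimal sketch of this large-neighborhood scheme (names are again mine; the free block here is chosen uniformly at random rather than by a clever selection rule, and `exhaustive_min` stands in for the hardware call):

```python
import random
from itertools import product

def exhaustive_min(h, J, variables):
    """Stand-in for the hardware: brute-force the small free block."""
    best_e, best_s = float("inf"), None
    for bits in product([+1, -1], repeat=len(variables)):
        s = dict(zip(variables, bits))
        e = sum(h[v] * s[v] for v in variables)
        e += sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
        if e < best_e:
            best_e, best_s = e, s
    return best_e, best_s

def lns_ising(h, J, k, steps, solve_block, s0=None, seed=0):
    """Large-neighborhood local search: at each step free k variables,
    clamp the rest at their current spins, and let the block solver
    re-optimize the free block.  Since the current block assignment is
    always one of the candidates, the energy never increases."""
    rng = random.Random(seed)
    variables = list(h)
    s = dict(s0) if s0 else {v: rng.choice([+1, -1]) for v in variables}
    for _ in range(steps):
        free = rng.sample(variables, k)
        freeset = set(free)
        # Fold the clamped neighbours into effective fields on the block.
        h_eff = {v: h[v] for v in free}
        for (i, j), Jij in J.items():
            if i in freeset and j not in freeset:
                h_eff[i] += Jij * s[j]
            elif j in freeset and i not in freeset:
                h_eff[j] += Jij * s[i]
        J_block = {(i, j): Jij for (i, j), Jij in J.items()
                   if i in freeset and j in freeset}
        _, s_free = solve_block(h_eff, J_block, free)
        s.update(s_free)
    return s
```

In the real setting, `solve_block` would be the hardware call on a 16-variable block, and the block-selection rule is where the cleverness of the outer heuristic lives.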
These two broad themes (graph decomposition and large neighborhood local search) have many variants. We have written up all of our ideas in a document which I will link to here when it’s finished.
Hopefully this brief overview (or barring that, the publication referenced above) is enough to convince you that you can use even tiny AQC processors as components of solvers, even for very large scale problems. Once that is settled, the next important question becomes this: is it worth doing? Just because you can do it doesn’t mean it is a good idea. So let’s turn to this question.
A necessary (but not sufficient) condition for this to be worthwhile is that the problems the AQC natively solves in hardware are solved faster, at a given level of acceptable accuracy in the result, than by the best available classical competition. In other words, if the small problems generated by the techniques above could be solved faster by a competing method, the AQC cannot be made useful as a computing device (it could still be very useful as a scientific instrument, but let's put that aside for now). So while beating classical approaches on the natively solved problems certainly isn't sufficient to make the large-scale approaches above good ideas, it is absolutely necessary: if the AQC can't win there, integrating it into a solver isn't going to be a good idea, as you could simply swap it out for a classical solver and do better.
So a key question to answer is at what point (if ever) superconducting AQCs are expected to outperform their classical competitors. The next post will attempt to address this question.