Rainier: III. problem decomposition

Any given AQC processor has a fixed hardware graph, which provides an allowed edge set and an allowed vertex set. In part II of this series I discussed how problem graphs can be embedded into a hardware graph. Now I’d like to discuss a related issue: what to do if your problem graph cannot be embedded directly in hardware.

The most common way this happens is that the problem has more variables than qubits. For example, I might have an application generate the following problem: find the set of spins s_j = \pm 1 that globally minimizes the objective function

{\cal E}(s_1, \ldots, s_{1000}) = \sum_{j=1}^{1000} h_j s_j + \sum_{(i,j) \in E_p} J_{ij} s_i s_j

where E_p is the set of edges with non-zero J_{ij}. I might also have hardware with many fewer qubits available than this. In the Cerberus chip this was only 16 qubits, and for Leda it was 28. Obviously the problem can’t embed natively in hardware. What can we do?
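As an aside, evaluating this objective for a candidate spin assignment is straightforward. Here is a minimal sketch in Python; the tiny instance and its coefficients are made up purely for illustration.

```python
def ising_energy(s, h, J):
    """Evaluate E(s) = sum_j h_j s_j + sum_{(i,j) in E_p} J_ij s_i s_j.

    s : dict mapping variable index -> spin in {-1, +1}
    h : dict mapping variable index -> local field h_j
    J : dict mapping edge (i, j) -> coupling J_ij; its keys are the edge set E_p
    """
    energy = sum(h_j * s[j] for j, h_j in h.items())
    energy += sum(J_ij * s[i] * s[j] for (i, j), J_ij in J.items())
    return energy

# Tiny illustrative instance (coefficients are made up):
h = {0: 0.5, 1: -1.0, 2: 0.2}
J = {(0, 1): 1.0, (1, 2): -0.7}
print(ising_energy({0: -1, 1: +1, 2: +1}, h, J))  # prints -3.0
```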

All of the approaches we’ve tried for dealing with this ultimately share a common goal: trading one large problem for a large number of small problems. In other words, we need to generate a list of problems that will each embed in the hardware, such that solving all of them solves the original large problem. This kind of decomposition is common in high performance computing (for example, when splitting problems across large clusters), and has an established literature of techniques and algorithms.

One example of how to break up a large problem into a series of smaller problems is based on graph decomposition. The idea is to identify vertices in the large graph whose removal makes the graph “fall apart” into disjoint pieces. Here is an illustration of an extreme case, where removing a single vertex (the blue one) breaks a 25-vertex graph into four 6-vertex graphs.

Once we have identified a set of vertices whose removal breaks the graph into acceptably small components, we refer to these as the clamp variables. The optimization algorithm then becomes an optimization over the values of the clamped variables, with the optimization within each decomposed sub-graph done serially in hardware. This optimization can be either complete or heuristic, depending on the needs of the user. In the example shown in the picture above, the single clamped variable would first be set to +1, all four sub-graphs would be solved in hardware given the biases induced by the clamp being +1, and the energy (the minimum value of the objective function achieved) recorded. Then the clamp is switched to -1 and the process repeated. So instead of one native run of the entire problem in hardware large enough to accommodate it, we have 8 serial runs in much smaller hardware, with the same result. Note that the total search space over the clamped variables is exponential in the number of clamps.
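To make the clamp-and-decompose procedure concrete, here is a minimal sketch in Python. It is purely illustrative: the brute-force solve_subproblem below stands in for the hardware solver, and the decomposition (the clamp variables and the resulting disjoint sub-graphs) is assumed to have been found already.

```python
from itertools import product

def solve_subproblem(variables, h, J, clamped):
    """Brute-force stand-in for the hardware solver: minimize the energy
    contribution of one sub-graph, with the clamp spins held fixed."""
    best_e, best_s = float("inf"), None
    for spins in product([-1, +1], repeat=len(variables)):
        s = dict(zip(variables, spins))
        e = sum(h.get(v, 0.0) * s[v] for v in variables)
        for (i, j), J_ij in J.items():
            if i in s and j in s:             # edge inside the sub-graph
                e += J_ij * s[i] * s[j]
            elif i in s and j in clamped:     # edge to a clamp: induced bias
                e += J_ij * s[i] * clamped[j]
            elif j in s and i in clamped:
                e += J_ij * s[j] * clamped[i]
        if e < best_e:
            best_e, best_s = e, s
    return best_e, best_s

def clamped_decomposition(clamp_vars, subgraphs, h, J):
    """Exhaustive search over clamp settings; for each setting the disjoint
    sub-graphs are solved independently (serially, as the hardware would)."""
    best_e, best_s = float("inf"), None
    for spins in product([-1, +1], repeat=len(clamp_vars)):
        clamped = dict(zip(clamp_vars, spins))
        # terms involving only the clamp variables
        e = sum(h.get(v, 0.0) * clamped[v] for v in clamp_vars)
        e += sum(J_ij * clamped[i] * clamped[j]
                 for (i, j), J_ij in J.items()
                 if i in clamped and j in clamped)
        assignment = dict(clamped)
        for sub in subgraphs:                 # one "hardware" run per sub-graph
            sub_e, sub_s = solve_subproblem(sub, h, J, clamped)
            e += sub_e
            assignment.update(sub_s)
        if e < best_e:
            best_e, best_s = e, assignment
    return best_e, best_s
```

For the example in the picture, clamp_vars would contain the single blue vertex and subgraphs the four 6-vertex components, giving the 2 × 4 = 8 serial sub-solves described above.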

Another approach that works well is to use the hardware as a component of a large neighborhood local search heuristic. In this approach, any algorithm that uses a local search subroutine can swap out its existing local search for a local search over the number of variables the hardware can accept. For example, if the hardware can solve arbitrarily connected 16-variable problems, then whenever the algorithm requires a local search, the problem is formulated as a local search over 16 variables and sent to the hardware. Another way of looking at this is as a heuristic where (1,000 - 16) variables are clamped and 16 are free to find their optima subject to the static biases of the clamps, repeated over and over with some clever way of selecting which group of 16 at each step is most likely to reduce the energy of the optimization objective.
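As a rough illustration, here is a sketch of that loop in Python. Again a brute-force step (lns_step, an illustrative name) stands in for the hardware call, and the block of free variables is chosen at random rather than by any clever criterion.

```python
import random
from itertools import product

def lns_step(s, h, J, free_vars):
    """Re-optimize the spins in `free_vars` exactly, with every other spin
    clamped at its current value.  In the scheme described above this
    sub-solve would be the call into the hardware; brute force stands in."""
    best_e, best_block = float("inf"), None
    for spins in product([-1, +1], repeat=len(free_vars)):
        trial = dict(s)
        trial.update(zip(free_vars, spins))
        e = sum(h_j * trial[j] for j, h_j in h.items())
        e += sum(J_ij * trial[i] * trial[j] for (i, j), J_ij in J.items())
        if e < best_e:
            best_e, best_block = e, dict(zip(free_vars, spins))
    s.update(best_block)
    return best_e

def large_neighborhood_search(h, J, n_vars, block_size=16, n_iters=100, seed=0):
    """Clamp all but `block_size` variables, optimize the free block, repeat.
    Blocks are chosen at random here; a smarter heuristic would pick the
    block most likely to lower the energy."""
    rng = random.Random(seed)
    s = {v: rng.choice([-1, +1]) for v in range(n_vars)}  # random start
    energy = float("inf")
    for _ in range(n_iters):
        block = rng.sample(range(n_vars), block_size)
        energy = lns_step(s, h, J, block)
    return energy, s
```

The exhaustive inner step is only there to make the sketch self-contained; for a quick classical test you would shrink block_size well below 16.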

These two broad themes (graph decomposition and large neighborhood local search) have many variants. We have written up all of our ideas in a document which I will link to here when it’s finished.

Hopefully this brief overview (or, barring that, the publication referenced above) is enough to convince you that even tiny AQC processors can be used as components of solvers for very large scale problems. Once that is settled, the next important question is: is it worth doing? Just because you can do it doesn’t mean it is a good idea. So let’s turn to this question.

A necessary (but not sufficient) condition for the AQC solver to be worthwhile is that the problems it solves natively in hardware must be solved faster (for a given level of acceptable accuracy in the result) than by the best available classical competition. In other words, if the small problems generated by the techniques above could be solved faster by a competing method, the AQC cannot be made useful as a computing device (note that it could still be very useful as a scientific instrument, but let’s put that aside for now). So while winning on the natively solved problems certainly isn’t sufficient to make the large-scale approaches above good ideas, it is absolutely necessary: if the AQC can’t beat classical approaches for the problems it is natively solving, integrating it into a solver isn’t going to be a good idea, as you could just swap it out for a classical solver and do better.

So a key question to answer is at what point (if at all) are superconducting AQCs expected to outperform their classical competitors? The next post will attempt to address this question.

9 thoughts on “Rainier: III. problem decomposition”

  1. Thanks! I have some plans for things to talk about, but if you want me to cover other topics let me know. I’m focusing more on the computer science issues right now but should gradually expand to cover other things.

  2. From my POV there is just enough detail that I can *just* about follow🙂 It gives me stuff to think about throughout the day. I don’t know much about graph theory or the CS side of things, but that means I’m now considering it a pretty serious hobby to learn more.

    Like the other day… I was wondering (as a pure thought experiment) what would happen if you considered a hexagonal layout for your qubit cell instead of a square one. There would be 3 sets of 3 qubits, which cross over at points on a hexagonal array; it maintains the criterion of no more than 4 crossing points per qubit, but it would mean that each qubit was coupled to 6 others rather than 4. (I don’t know if this makes sense so far.) I put a diagram HERE to help illustrate the point.

    You could still address any of the 9 qubits or any coupled pair, but you could also entangle 3 qubits using this architecture. There are 6 points where 2 are coupled and 7 points where 3 are coupled.

    One problem might be if the ‘extra’ qubit would ‘get in the way’, i.e. if you were just trying to address 2 of the 3 at a particular crossing point. You could try to keep the third biased well away from the energy levels of interest, but I’m not sure if the presence of the s/c loop would cause a problem, especially if you were trying to couple the top qubit layer to the bottom qubit layer through the middle one. In this paper (which I am sure you will recognise) it seems that entangled states of 4 directly coupled qubits can be manipulated individually, although to be fair they don’t share the same loop area anywhere.

    The other problem is that you can’t make this structure with conventional process technology. There just aren’t enough layers🙂 But that’s a technicality of course.

    My question was: would you be able to embed more complex problems onto such a system due to its seemingly higher connectivity? (You only need one more qubit, but the graph seems more connected.) Or maybe this architecture is just complete nonsense on some fundamental level that I missed!

    Just something I was playing around with in my head. Filling fridges gets a bit boring sometimes😉

  3. Hey Geordie. I’m a numerics guy. I’m thinking of moving into quantum simulations after I wrap up my current research, but it’d be good to know if your quantum computer is going to take care of that problem without advances in classical numerics techniques. It’d be good to know in the next… 7 or 8 months or so. I’d appreciate it =D

  4. Jordan: The machine should be pretty good at simulating itself, although the main problem here (predicting the performance of quantum adiabatic algorithms on interesting problems) is itself a very fun and hard quantum simulation problem.

  5. Suz: Sorry about that, it appears your comment got eaten by the “moderate me” filter for some reason. The quick answer to the connectivity question is that the hardness of problems you can stuff into hardware goes up dramatically with the amount of connectivity you can provide. The actual interconnect structure also matters, but not as much. Also, re. process technology and more metal layers: that’s not so much a problem. We can always add more.

    Your unit cell is intriguing… very nice work… I’m going to take a closer look at it.

  6. Suz: The problem is actually not the process technology; in principle you can lay out anything you can draw on a piece of paper in two layers, you may just end up wasting lots of space.
    The real reason square is so convenient is that the Cadence layout editor can’t rotate by 45 degrees🙂
    (for good reasons that have to do with the gridding that happens when a design is transferred to glass, etc.)

  7. Suz,

    I actually like your idea a lot. I was toying with “thinking more like bees than like bricklayers” in the early days. And *of course* you can lay out a hexagonal structure even without 45 degree lines!😉
