The 128-qubit Rainier chip: I. interconnect topology

This is the first of a series of technical posts describing aspects of the 128 qubit Rainier processor currently under development at D-Wave. These posts are meant to invite discussion by researchers so fire away if you have questions or think something I’ve posted is wrong or could be explained better.

Where I thought I’d start is by discussing how the interconnection topology, ie. which qubits are connected in hardware, works in the current design. First I’ll start by describing the problem.

An adiabatic quantum computer of the sort we are building is designed to run a class of quantum algorithms called adiabatic quantum algorithms. These algorithms can be used as components of both heuristic and complete solvers. If we focus on running only complete solvers for the moment, the objective of the class of adiabatic quantum algorithms relevant here is to return the global minimum of the function

{\cal E}(s_1, ..., s_N) = \sum_{j=1}^N h_j s_j + \sum_{i,j \in E} J_{ij} s_i s_j

where s_i=\pm 1 are binary (spin) variables, h_j and J_{ij} are real, and E is an edge set defining an allowed set of non-zero J_{ij}.

In a real processor design, the edge set E will not contain all possible edges. That is, there will be qubits i and j that are not physically connected in hardware and for all such pairs the coupling term is “hard-coded” to zero. This loss of generality reduces the range of problems that can be solved directly in hardware. Reducing the edge set can even fundamentally change the complexity of the class of problems the hardware can solve. For example, if each qubit can only be connected to a maximum of two other qubits, the problem is tractable classically.

Not only does the number of non-zero connections per qubit matter, the actual pattern of interconnections matters also. One example is that you can draw both planar and non-planar graphs with four connections per qubit, so if for some reason you can only have maximum per qubit connectivity of four it might be better to arrange the interconnect scheme to allow for non-planarity.

In the systems we build, the maximum number of connections per qubit is constrained by noise. In order to couple a qubit into a coupler, the qubit needs a certain amount of inductance. This inductance is obtained by increasing the perimeter of the qubit, which increases the noise seen by the qubit. Increasing the noise a qubit sees has several deleterious effects, all of which I will be discussing in later posts. For now let’s just say that the maximum number is 6 connections per qubit without answering the question of why or how to make it better.

Given 6 connections per qubit, what is the “ideal” layout / interconnect scheme? Answering this depends on what as a designer you are trying to optimize. Let’s say that the primary objective is to make a tile-able unit cell with a maximum of 6 connections per qubit. There are several possible ways to do this. The way we settled on is as follows:

The qubits are topologically loops of niobium. They are interrupted in a variety of places by compound Josephson junctions. Imagine drawing four loops schematically like the outlines of four parallel popsicle sticks lying north-south, and then laying down on top of this the exact same structure rotated by 90 degrees. Each “popsicle stick outline” is a single qubit. The points of intersection are where the coupling devices are placed. This unit cell looks physically like the picture on the left, which is identical to the picture on the right, which is also known as the complete bipartite graph on 4 vertices K_{44}:

So in this design there are 8 qubits per unit cell, 16 inter-cell couplers per unit cell, and 8 intra-cell couplers (4 to the right, 4 to the bottom). To make the tiling strategy explicit, here are 32-qubit

and 128-qubit interconnect schemes.

The red dots are identical devices to the blue dots, just colored differently to show that they sit between the K_{44} blocks. This last picture is the interconnect topology / edge set used in the 128-qubit Rainier processor.

So given this edge set we see that the types of problem that can be “natively” solved in this hardware design are as shown in the objective function {\cal E}(s_1, ... s_{128}) above, with edge set E restricted as shown in this last figure.

13 thoughts on “The 128-qubit Rainier chip: I. interconnect topology

  1. so when you expand on this can you go to 256 or do you have to/or better to, expand to 512?

    also on your “entanglement-related results obtained on the 28-qubit Leda processors” what time frame next we expect the results?

  2. I can see the “simple” K_{16} embedding, and a less-obvious K_{17} embedding after a bit of thinking, but what’s the max you guys have found? There’s definitely an upper bound of K_{27}, and I’m pretty certain that the max is in the range 17 to 20.

    It’s not the K_{32} I was hoping for, but big progress nonetheless. Even bigger progress if you’ve got a very fast way to optimally embed non-complete graphs into this pattern, though. That’d be cool… or maybe I’m just uncool, hehe. :)

  3. Hi Neil, my next post is going to be about embedding. For the 128-qubit chip, you can split the chip into two regions, the 6 upper right hand blocks and the 10 lower left + diagonal blocks. You can embed a K_{12} in the upper right block, and a K_{16} in the lower left+diagonal block. These two complete graphs can be connected in a limited way through the couplers the two sections share. Since the size of the fully connected graphs you can embed in this guy is small, we plan to operate based on algorithms that respect the interconnect structure of the 128-qubit chip, ie. graph embedding can be done at this level but the cost is prohibitive at this stage.

  4. JP: In principle you can continue to tile the plane with unit cells until you (a) run into fab yield limits and/or (b) run out of real estate on the processor die. You could build a 256-qubit chip by tiling two 128-qubit patterns side by side, or a 512-qubit chip by doing a 2×2 tile of the 128-qubit pattern, or a 2,048-qubit chip by doing a 4×4 tile of the 128-qubit pattern. Re. timing on the entanglement results, as soon as possible.

  5. Pingback: Rainier IV: Precision «

  6. Pingback: Rainier V: Instance Classes «

      • Hi SJGooch, could be :-) there are 16 couplers that couple the eight qubits within a unit cell to each other, and 8 couplers that couple qubits from one unit cell to its neighbours, for a total of 24 couplers per unit cell.

      • Thanks so much for taking the time to look in on this thread, Dr. Rose. You must be as busy as my hero Seymour Cray was in the ’60s!

        I thought this thread was long dead, but:

        Now that I think I am finally catching on to what you are doing, my question is this: If S:N limits the active coupler fan-out in this technology to 6, how much real advantage is likely to derive from expanding the cell tiling beyond 128 qubits? Isn’t the “qubit interconnect” already quite sparse? Are there real-world problems which map onto the topology of a larger number of qubits even more sparsely linked?

        Also, some of the guys at NCSA and I were wondering if we could get a copy of your raw 10,000 results (of which 13 were successful) from your recent triumph. It would be fun to investigate them by hand and perhaps do some statistical analysis on them. On the other hand, we can certainly understand why you might want to keep this kind of performance data company confidential.

        Thank you, again, Sir, for your valuable time and attention.

        Sherwin Gooch

  7. No problem!

    Well the sparsity of the interconnection between variables is an issue that’s hard to deal with, but there are some tricks you can use to help. Increasing the variable count does help in general.

    Here’s the problem we have: given an optimization problem with arbitrary connectivity (for example fully connected), and a solver that solves problems defined over a SPECIFIC connectivity but really, really fast, how do I use the fast but sparse solver to solve the original problem? Even harder: what if the original optimization problem isn’t quadratic or might only be specified at the “black box” level, such as the output of a simulation? An example of a method that has been developed for this sort of thing are “estimation of distribution” algorithms.

    Re. getting access to the data from the protein folding thing: we didn’t actually do this work, it was Alan Aspuru-Guzik and Alejandro Perdomo at Harvard, so it’s their data not ours! Our contribution was providing compute time. If you ask them for their data I’m sure they would give it to you. Note that the 13/10,000 thing is misleading, as the sample rate is one sample per 5 microseconds, so boosting the probability of seeing the ground state to say 99.9% with a single-shot success probability of 13/10,000 only takes about log(1-.999)/log(1-13/10000) ~ 5310 samples with total time of about 5310 * 5 microseconds = 27 milliseconds (ie not that bad).

  8. Pingback: Sparse coding on D-Wave hardware: things that don’t work | Hack The Multiverse

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s