It’s a strange thing, but adding constraints to a problem often helps solve it. I’ve seen this happen many times. I think the reason is that if there are too many options, it’s very hard to focus on a very specific objective, and without this kind of specificity it’s very hard to make progress.
Interestingly, it seems as if restricting choices also makes people happier. You should watch this TED talk; it is fascinating.
Also you should watch this to fully appreciate the cosmic horror of freedom.
Constraints! Horrible, horrible constraints!
In my last post I introduced a problem that arises when we structure the dictionary so that the QUBOs sparse coding generates have some desired structure. In particular we want to constrain the QUBOs to have the specific connectivity of some given D-Wave processor, so that we don’t have to do any contortions to map problems into hardware. In general, this idea of building algorithms subject to them having to use the hardware in its native mode is a great idea. Endorsed by space ant colonies everywhere.
Unfortunately, when we do this, one of the other steps in the sparse coding procedure goes from easy to hard. Here is the new, hard problem that we need to solve.
Finding an optimal structured dictionary given fixed weights
Given:

- An $N \times S$ real-valued matrix $Z$, where $S$ is the number of data objects, and we define the $s^{th}$ column to be the data object $\vec{z}_s$, where each $\vec{z}_s$ is a real-valued vector with $N$ components, and the matrix elements of $Z$ are $Z_{ns}$;
- An $N \times K$ real-valued matrix $\hat{D}$, where $K$ is the number of dictionary atoms we choose, and we define its $k^{th}$ column to be the dictionary atom $\vec{d}_k$, and the matrix elements of $\hat{D}$ are $D_{nk}$;
- And a $K \times S$ binary-valued matrix $\hat{W}$ with matrix elements $W_{ks}$;

Find $\hat{D}$ that minimizes

$$G(\hat{D}) = \| Z - \hat{D} \hat{W} \|^2$$

where $\| \cdot \|$ is the Frobenius norm, subject to the constraints that $\vec{d}_j \cdot \vec{d}_m = 0$ for all pairs $(j, m)$ that are not connected in the quantum chip being used.
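To make the objective concrete, here is a minimal numpy sketch of the quantity being minimized. The sizes, the `objective` and `violates_constraints` names, and the `non_edges` list (standing in for whichever atom pairs are uncoupled on the chip) are all illustrative, not from the post.

```python
import numpy as np

# hypothetical small sizes: N-dimensional data, S data objects, K atoms
N, S, K = 8, 20, 6
rng = np.random.default_rng(0)

Z = rng.normal(size=(N, S))          # data matrix; columns are the z_s
D = rng.normal(size=(N, K))          # dictionary; columns are the atoms d_k
W = rng.integers(0, 2, size=(K, S))  # binary weights W_ks

def objective(Z, D, W):
    """G(D) = ||Z - D W||^2 (Frobenius norm), the quantity to minimize."""
    return np.linalg.norm(Z - D @ W, ord="fro") ** 2

def violates_constraints(D, non_edges, tol=1e-8):
    """Atoms j, m that are NOT coupled in hardware must satisfy d_j . d_m = 0."""
    return any(abs(D[:, j] @ D[:, m]) > tol for j, m in non_edges)
```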
Here is what I’ve implemented
The structured dictionaries idea was invented by Bill Macready, who runs software & algorithms at D-Wave. When he originally implemented this idea (back in 2009), the strategy he used to solve the dictionary optimization problem had four components, which he ran serially. The four components were:
- use the Cayley transform;
- find optimal scaling of the dictionary columns;
- do a Householder reflection;
- and optimize each column in series, fixing the values of all the others during the optimization.
Let’s take a closer look at each of these procedures.
The Cayley transform
The Cayley transform is a very interesting technique that allows optimization over the elements of a matrix while preserving orthogonality constraints between columns of the matrix, which is just what we need! (It only finds local minima, which isn't great, but this is a hard problem.)
In response to my last post, when I asked people for ideas about how to solve this problem, Slartibartfastibast referred me to a paper that is posted here. Everything you ever wanted to know about the Cayley transform and how it applies to our case can be found there. This resource was really helpful. Thanks, Slart! My implementation was based on Bill's earlier work, but the core ideas are the same as in that paper, and it helped me quite a bit.
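As a sketch of the mechanism: for a matrix $X$ with orthonormal columns and a gradient $G$ of the objective, the skew-symmetric matrix $A = GX^T - XG^T$ generates a feasible curve, and the Cayley-style update below stays exactly on the set of orthonormal frames. In our problem only *some* column pairs must be orthogonal rather than all of them, so this illustrates the basic idea rather than the actual structured update; the `cayley_step` name and sizes are made up for the example.

```python
import numpy as np

def cayley_step(X, G, tau):
    """One curvilinear step preserving X^T X = I.

    X: n x p matrix with orthonormal columns; G: gradient of the
    objective at X; tau: step size. The map
    (I + tau/2 A)^{-1} (I - tau/2 A) is orthogonal because A is
    skew-symmetric, so orthonormality of the columns is preserved.
    """
    n = X.shape[0]
    A = G @ X.T - X @ G.T               # skew-symmetric by construction
    I = np.eye(n)
    return np.linalg.solve(I + 0.5 * tau * A, (I - 0.5 * tau * A) @ X)

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.normal(size=(6, 3)))   # orthonormal starting point
G = rng.normal(size=(6, 3))                    # stand-in gradient
Y = cayley_step(X, G, tau=0.1)
print(np.allclose(Y.T @ Y, np.eye(3)))         # True: still orthonormal
```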
Finding the optimal scaling of the dictionary atoms
Given an input dictionary, this operation asks for the optimal norm of each column of the dictionary — basically we multiply each column of the input dictionary by a real variable, and then optimize over all those variables. It turns out this simple operation really helps quite a bit.
The Householder transform
In the comments section of my last post, Alireza Nejati wrote:
Basically, if you take the D matrix and premultiply it by a regular square orthonormal matrix (of size N), you get a matrix that still satisfies the constraints on D (i.e. orthogonality on columns that are not connected in the quantum machine). You can check this. Let d_i and d_j be two columns of D. Let their dot product d_i^T d_j be equal to x. Then since Q is orthonormal, it preserves dot products, and so if x is 0, the dot product of Qd_i and Qd_j is going to be zero, and if their dot product is nonzero, their dot product when premultiplied by Q is going to be nonzero.
The point to all of this is that QD is a valid dictionary matrix and, further, any valid dictionary matrix can be generated in this way. So this is the parameterization you are looking for in strategy 2. So now the problem becomes minimizing ||Z-QY||^2 where Y=DW is given, and we are now simply optimizing over Q, the set of all orthonormal matrices, which have further nice parameterizations as rotations or reflections in N-dimensional space.
This is all true (thanks, Alireza!), and it was one of the operations Bill had in his original implementation. The operation Bill used finds the optimal Householder transformation. Interestingly, I find that this step is not very helpful in the optimization: when I do it, it rarely reduces the objective function by much. There are other parametrizations of these types of transforms; one was suggested by Alireza in the comments section. I'm not sure that suggestion will work well, but it is probably worth trying!
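Alireza's argument can be checked numerically: premultiplying $D$ by any orthonormal matrix, for instance a Householder reflection, leaves every pairwise column dot product unchanged, so the structure constraints survive. This sketch only demonstrates that invariance; finding the *optimal* reflection (the step Bill used) is a separate optimization not shown here, and the names and sizes are illustrative.

```python
import numpy as np

def householder(v):
    """Reflection H = I - 2 v v^T / (v^T v); H is orthonormal."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

rng = np.random.default_rng(2)
D = rng.normal(size=(6, 4))
H = householder(rng.normal(size=6))

# (HD)^T (HD) = D^T H^T H D = D^T D, so every dot product d_i . d_j
# is preserved and H @ D satisfies the same constraints as D.
print(np.allclose(D.T @ D, (H @ D).T @ (H @ D)))   # True
```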
Single column optimizations
If you fix all the columns of the dictionary except one, and allow all the elements of that single column to vary, you can optimize it by computing an orthonormal basis for the nullspace of the columns it must remain orthogonal to, and then optimizing within that subspace. You can sweep over the columns one at a time, reducing the value of the objective function at each step. These are very 'local' adjustments, and tend to provide a small amount of benefit.
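A sketch of the single-column step as I understand it: remove column $k$'s contribution from the residual, solve the unconstrained least-squares problem for that column, then project the answer onto the nullspace of the columns it must stay orthogonal to. The function names, sizes, and the `non_neighbors` argument are illustrative assumptions, not the post's actual code.

```python
import numpy as np

def nullspace_basis(A, tol=1e-10):
    """Orthonormal basis of the nullspace of A (rows are constraints)."""
    if A.size == 0:
        return np.eye(A.shape[1])
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T

def optimize_column(Z, D, W, k, non_neighbors):
    """Re-optimize column k of D, holding all other columns fixed.

    non_neighbors: atom indices m for which d_k . d_m must be 0.
    The projection of the unconstrained least-squares column onto the
    feasible subspace is the constrained optimum, because the basis
    returned by nullspace_basis is orthonormal.
    """
    R = Z - D @ W + np.outer(D[:, k], W[k, :])   # residual without column k
    wk = W[k, :]
    d_star = R @ wk / max(wk @ wk, 1e-12)        # unconstrained optimum
    B = nullspace_basis(D[:, non_neighbors].T)
    D = D.copy()
    D[:, k] = B @ (B.T @ d_star)                 # project onto feasible set
    return D

rng = np.random.default_rng(4)
N, K, S = 6, 4, 10
Z = rng.normal(size=(N, S))
D = rng.normal(size=(N, K))
W = rng.integers(0, 2, size=(K, S)).astype(float)
W[:, 0] = 1.0                                    # make sure every atom is used
D2 = optimize_column(Z, D, W, 0, non_neighbors=[2, 3])
print(abs(D2[:, 0] @ D2[:, 2]) < 1e-8)           # True: constraint holds
```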
That’s all I’ve got on this right now. Next I’m going to repeat some of my early analyses, now including the structured dictionaries approach. Exciting!