Sparse coding on D-Wave hardware: structured dictionaries

The underlying problem we saw last time, that prevented us from using the hardware to compete with tabu on the cloud, was the mismatch of the connectivity of the problems sparse coding generates (which are fully connected) and the connectivity of the hardware.

The source of this mismatch is the quadratic term in the objective function, which for the j^{th} and m^{th} variables is proportional to \vec{d}_j \cdot \vec{d}_m. The coupling terms are proportional to the dot product of the dictionary atoms.

Here’s an idea. What if we demand that \vec{d}_j \cdot \vec{d}_m has to be zero for all pairs of variables j and m that are not connected in hardware? If we can achieve this structure in the dictionary, we get a very interesting result. Instead of being fully connected, the QUBOs with this restriction can be engineered to exactly match the underlying problem the hardware solves. If we can do this, we get closer to using the full power of the hardware.

L0-norm sparse coding with structured dictionaries

Here is the idea.

Given

  1. A set of S data objects \vec{z}_s, where each \vec{z}_s is a real valued vector with N components;
  2. An N x K real valued matrix \hat{D}, where K is the number of dictionary atoms we choose, and we define its k^{th} column to be the vector \vec{d}_k;
  3. A K x S binary valued matrix \hat{W};
  4. And a real number \lambda, which is called the regularization parameter,

Find \hat{W} and \hat{D} that minimize

G(\hat{W}, \hat{D} ; \lambda) = \sum_{s=1}^S || \vec{z}_{s} - \sum_{k=1}^{K} w_{ks} \vec{d}_k ||^2 + \lambda \sum_{s=1}^S \sum_{k=1}^{K} w_{ks}

subject to the constraints that \vec{d}_j \cdot \vec{d}_m = 0 for all pairs j,m that are not connected in the quantum chip being used.

The only difference here from what we did before is the last sentence, where we add a set of constraints on the dictionary atoms.

Solving the sparse coding problem using block coordinate descent

We’re going to use the same strategy for solving this as before, with a slight change. Here is the strategy we’ll use.

  1. First, we generate a random dictionary \hat{D}, subject to meeting the orthogonality constraints we’ve imposed on the dictionary atoms.
  2. Assuming these fixed dictionaries, we solve the optimization problem for the dictionary atoms \hat{W}. These optimization problems are now Chimera-structured QUBOs that fit exactly onto the hardware by construction.
  3. Now we fix the weights to these values, and find the optimal dictionary \hat{D}, again subject to our constraints.

We then iterate steps 2 and 3 until G converges to a minimum.

Now we’re in a different regime than before — step 2 requires the solution of a large number of chimera-structured QUBOs, not fully connected QUBOs. So that makes those problems better fits to the hardware. But now we have to do some new things to allow for both steps 1 and 3, and these initial steps have some cost.

The first of these is not too hard, and introduces a key concept we’ll use for Step 3 (which is harder). In this post I’ll go over how to do Step 1.

Step 1: Setting up an initial random dictionary that obeys our constraints

Alright so the first step we need to do is to figure out under what conditions we can achieve Step 1.

There is a very interesting result in a paper called Orthogonal Representations and Connectivity of Graphs. Here is a short explanation of the result.

Imagine you have a graph on V vertices. In that graph, each vertex is connected to a bunch of others. Call p the number corresponding to the connectivity of the least connected variable in the graph. Then this paper proves that you can define a set of real vectors in dimension V - p where non-adjacent nodes in the graph can be assigned orthogonal vectors.

So what we want to do — find a random dictionary \hat{D} such that \vec{d}_j \cdot \vec{d}_m = 0 for all k, m not connected in hardware — can be done if the length of the vectors \vec{d} is greater than V - p.

For Vesuvius, the number V is 512, and the lowest connectivity node in a Chimera graph is p = 5. So as long as the dimension of the dictionary atoms is greater than 512 – 5 = 507, we can always perform Step 1.

Here is a little more color on this very interesting result. Imagine you have to come up with two vectors \vec{g} and \vec{h} that are orthogonal (the dot product \vec{g} \cdot \vec{h} is zero). What’s the minimum dimension these vectors have to live in such that this can be done? Well imagine that they both live in one dimension — they are just numbers on a line. Then clearly you can’t do it. However if you have two dimensions, you can. Here’s an example: \vec{g} = \hat{x} and \vec{h} = \hat{y}. If you have more that two dimensions, you can also, and the choices you make in this case are not unique.

More generally, if you ask the question “how many orthogonal vectors can I draw in an V-dimensional space?”, the answer is V — one vector per dimension. So that is a key piece of the above result. If we had a graph with V vertices where NONE of the vertices were connected to any others (minimum vertex connectivity p = 0), and want to assign vectors to each vertex such that all of these vectors are orthogonal to all the others, that’s equivalent to asking “given a V-dimensional space, what’s the minimum dimension of a set of vectors such that they are all orthogonal to each other?”, and the answer is V.

Now imagine we start drawing edges between some of the vertices in the graph, and we don’t require that the vectors living on these vertices be orthogonal. Conceptually you can think of this as relaxing some constraints, and making it ‘easier’ to find the desired set of vectors — so the minimum dimension of the vectors required so that this will work is reduced as the graph gets more connected. The fascinating result here is the very simple way this works. Just find the lowest connectivity node in the graph, call its connectivity p, and then ask “given a graph on V vertices, where the minimum connectivity vertex has connectivity p, what’s the minimum dimension of a set of vectors such that non-connected vertices in the graph are all assigned orthogonal vectors?”. The answer is V - p.

Null Space

Null Space is also an ASCII-based adventure game: https://students.digipen.edu/~tbrosman/null_space_download.html .

Null Space is also an ASCII-based adventure game: https://students.digipen.edu/~tbrosman/null_space_download.html .

Now just knowing we can do it isn’t enough. But thankfully it’s not hard to think of a constructive procedure to do this. Here is one:

  1. Generate a matrix \hat{D} where all entries are random numbers between +1 and -1.
  2. Renormalize each column such that each column’s norm is one.
  3. For each column in \hat{D} from the leftmost to the rightmost in order, compute the null space of that column, and then replace that column with a random column written in the null space basis.

If you do this you will get an initial random orthonormal basis as required in our new procedure.

By the way, here is some Python code for computing a null space basis for a matrix \hat{A}. It’s easy but there isn’t a native function in numpy or scipy that does it.

import numpy
from scipy.linalg import qr

def nullspace_qr(A):

A = numpy.atleast_2d(A)
Q, R = qr(A.T)
ns = Q[:, R.shape[1]:].conj()
return ns

OK so step 1 wasn’t too bad! Now we have to deal with step 3. This is a harder problem, which I’ll tackle in the next post.

One thought on “Sparse coding on D-Wave hardware: structured dictionaries

  1. Pingback: Sparse coding on D-Wave hardware: finding an optimal structured dictionary | Hack The Multiverse

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s