I spend most of my time thinking about machine intelligence. I would like to build machines that can think and act like we do. There are many hard problems to solve before we get there.
A thing I’ve been thinking about recently is how the limits of cognition matter in understanding cognition itself. Clearly you can have examples where the machinery is not sophisticated enough. For example, a C. elegans roundworm is not going to reverse engineer its own neural system any time soon, even though that system is very simple (although the descendants of OpenWorm might…).
OpenWorm is a really interesting project, and I hope it succeeds. Although the idea that the first life forms on the internet will be worms, which will OBVIOUSLY grow super intelligent and take over the universe, consuming all of its atoms to make bigger and bigger Harlem Shake videos, is a little off-putting from a human perspective. De Vermis Mysteriis…
We are so smart, s-m-r-t
Imagine the most intelligent entity possible. Would that entity be able to understand its own cognition? As you crank up a cognitive system’s ability to model its environment, presumably the system itself gets more difficult to understand.
Is the human cognitive system both smart enough and simple enough to reverse engineer itself? It’s probably in the right zone. We seem to be smart enough to understand enough of the issues to take a good run at the problem, and our cognitive system seems simple enough not to be beyond its own ability to understand. How’s that for some Hofstadter-style recursion.
Anyway enough with the Deep Thoughts. Let’s do some math! Math is fun. Not as fun as universe eating worms. But solving this problem well is important. At least to me and my unborn future vermiform army. Maybe you can help solve it.
A short review of L0-norm sparse coding with structured dictionaries
Last time we discussed sparse coding on the hardware, I introduced an idea for getting around a problem with using D-Wave-style processor architectures effectively: the mismatch between the connectivity of the problem we want to solve and the connectivity of the hardware.
Let’s begin by reviewing the idea. If you’d like a more in-depth overview, here is the original post I wrote about it. Here is the condensed version.
Given:

- A set of $S$ data objects $\vec{z}_s$, where each $\vec{z}_s$ is a real-valued vector with $N$ components;
- An $N \times K$ real-valued matrix $\hat{D}$, where $K$ is the number of dictionary atoms we choose, and we define its $k^{\mathrm{th}}$ column to be the vector $\vec{d}_k$;
- A $K \times S$ binary-valued matrix $\hat{W}$, whose matrix elements are $w_{ks}$;
- And a real number $\lambda$, which is called the regularization parameter,

Find $\hat{W}$ and $\hat{D}$ that minimize

$$G(\hat{W}, \hat{D}; \lambda) = \sum_{s=1}^{S} \left\| \vec{z}_s - \sum_{k=1}^{K} w_{ks} \vec{d}_k \right\|^2 + \lambda \sum_{s=1}^{S} \sum_{k=1}^{K} w_{ks}$$

subject to the constraints that $\vec{d}_j \cdot \vec{d}_m = 0$ for all pairs $(j, m)$ that are not connected in the quantum chip being used.
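To make the statement above concrete, here is a minimal NumPy sketch of the objective and the connectivity constraints. The `connected` matrix is a hypothetical boolean $K \times K$ array (not from the original post) encoding which atom pairs map to coupled qubits in hardware:

```python
import numpy as np

def objective(Z, D, W, lam):
    """G(W, D; lam) = sum_s ||z_s - sum_k w_ks d_k||^2 + lam * sum_ks w_ks.

    Z: (N, S) data columns, D: (N, K) dictionary, W: (K, S) binary weights.
    """
    residual = Z - D @ W                     # column s is z_s - sum_k w_ks d_k
    return np.sum(residual**2) + lam * np.sum(W)

def constraints_satisfied(D, connected, tol=1e-9):
    """Check d_j . d_m = 0 for every atom pair (j, m) not coupled in hardware."""
    gram = D.T @ D                           # gram[j, m] = d_j . d_m
    off_diag = ~np.eye(D.shape[1], dtype=bool)
    return bool(np.all(np.abs(gram[off_diag & ~connected]) < tol))
```

With orthonormal columns in `D`, every off-diagonal Gram entry vanishes, so the constraints hold for any connectivity pattern.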
To solve this problem, we use block coordinate descent, which works like this:
- First, we generate a random dictionary $\hat{D}$, subject to meeting the orthogonality constraints we’ve imposed on the dictionary atoms.
- Holding this dictionary fixed, we solve the optimization problem for the weights $w_{ks}$. These optimization problems are now Chimera-structured QUBOs that fit exactly onto the hardware by construction.
- Now we fix the weights to these values, and find the optimal dictionary $\hat{D}$, again subject to our constraints.
We then iterate steps 2 and 3 until $G$ converges to a minimum. Keep in mind that this problem is jointly non-convex, so the minimum you get will be a local minimum. Each restart of the whole algorithm from a new random starting point will generally lead to a different local minimum, so you can get a better answer by running the procedure several times.
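The weight update (step 2) can be sketched without a QUBO solver in one special case, which is useful for intuition: if all atoms are mutually orthogonal, the QUBO decouples and each binary weight has a closed-form solution, $w_{ks} = 1$ exactly when $2\,\vec{d}_k \cdot \vec{z}_s > \|\vec{d}_k\|^2 + \lambda$. This is a sketch under that assumption only; on hardware-structured dictionaries the connected pairs keep their quadratic couplings, and step 3 remains the open problem discussed below.

```python
import numpy as np

def update_weights(Z, D, lam):
    """Step 2 for the special case of mutually orthogonal atoms.

    With d_j . d_m = 0 for j != m, G separates per (k, s): the cost of
    setting w_ks = 1 is -2 d_k.z_s + ||d_k||^2 + lam, so we set w_ks = 1
    exactly when that quantity is negative.
    """
    scores = 2.0 * (D.T @ Z)                          # scores[k, s] = 2 d_k . z_s
    thresholds = np.sum(D**2, axis=0)[:, None] + lam  # ||d_k||^2 + lam, per row
    return (scores > thresholds).astype(int)
```

This also shows why sparsity falls out of the L0 penalty: raising $\lambda$ raises the threshold, switching off more weights.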
Step 3: Finding an optimal structured dictionary given fixed weights
The hard problem is Step 3 above. Here the weights $w_{ks}$ are fixed, and we want to find an optimal structured dictionary $\hat{D}$. Here is the formal statement of the problem.
Given:

- An $N \times S$ real-valued matrix $\hat{Z}$, where $S$ is the number of data objects, and we define its $s^{\mathrm{th}}$ column to be the data object $\vec{z}_s$, where each $\vec{z}_s$ is a real-valued vector with $N$ components, and the matrix elements of $\hat{Z}$ are $z_{ns}$;
- An $N \times K$ real-valued matrix $\hat{D}$, where $K$ is the number of dictionary atoms we choose, and we define its $k^{\mathrm{th}}$ column to be the vector $\vec{d}_k$, and the matrix elements of $\hat{D}$ are $d_{nk}$;
- And a $K \times S$ binary-valued matrix $\hat{W}$ with matrix elements $w_{ks}$;

Find $\hat{D}$ that minimizes

$$G^*(\hat{D}) = \mathrm{Tr}\left(\hat{A}^T \hat{A}\right)$$

where $\hat{A} = \hat{Z} - \hat{D}\hat{W}$, subject to the constraints that $\vec{d}_j \cdot \vec{d}_m = 0$ for all pairs $(j, m)$ that are not connected in the quantum chip being used.
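As a sanity check on the objective: with $\hat{A} = \hat{Z} - \hat{D}\hat{W}$ (data columns, dictionary atoms, and binary weights as above), $\mathrm{Tr}(\hat{A}^T\hat{A})$ is just the squared Frobenius norm of the reconstruction residual, i.e. the first term of $G$; the $\lambda$ term is constant once $\hat{W}$ is fixed, so it drops out of the dictionary optimization. A quick numerical check of that identity:

```python
import numpy as np

def g_star(Z, D, W):
    """G*(D) = Tr(A^T A), with A = Z - D @ W the reconstruction residual."""
    A = Z - D @ W
    return np.trace(A.T @ A)
```

Equivalently, `g_star(Z, D, W)` equals `np.sum((Z - D @ W)**2)` for any shapes that conform.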
What makes this problem hard is that the constraints on the dictionary atoms ($\vec{d}_j \cdot \vec{d}_m = 0$) are non-linear, and there are a lot of them: one for each pair of variables not connected in hardware.
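To see where the coupling between atoms comes from, expand the trace:

$$G^*(\hat{D}) = \mathrm{Tr}(\hat{Z}^T\hat{Z}) - 2\,\mathrm{Tr}(\hat{W}^T\hat{D}^T\hat{Z}) + \mathrm{Tr}(\hat{W}^T\hat{D}^T\hat{D}\hat{W})$$

The last term contains cross terms of the form $(\vec{d}_j \cdot \vec{d}_m)\sum_{s} w_{js} w_{ms}$ for $j \neq m$. These are the quadratic couplings between atoms: the structure constraints force $\vec{d}_j \cdot \vec{d}_m = 0$ for unconnected pairs, and the coefficient $\sum_s w_{js} w_{ms}$ vanishes whenever atoms $j$ and $m$ are never used together on the same data object.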
Ideas for attacking this problem
I’m not sure what the best approach is for trying to solve this problem. Here are some observations:
- We want to be operating in the regime where $\hat{W}$ is sparse. In this limit most of the $w_{ks}$ will be zero. Because the coupling term is quadratic in $\hat{W}$’s matrix elements, for all L0-norm sparse coding problems most of the coupling terms are going to be zero. This suggests a possible strategy: first solve for $\hat{D}$ assuming the quadratic term is zero, and then use that solution as the initial starting point for whatever we do next.
- There are some types of matrix operations that would not mess up the structure of the dictionary but would allow us to parametrize changes within the allowed space. If we could optimize over those parameters, we could take care of the constraints without doing any extra work to enforce them.
- There is a local search heuristic where you optimize each dictionary atom in turn, from $\vec{d}_1$ to $\vec{d}_K$, while keeping the other columns fixed, and iterate until convergence (you have to do some rearranging to ensure the orthogonality is maintained throughout, using the null space idea from the previous post). By itself this probably isn’t a great strategy, since it will likely get stuck in local optima, but it might work OK.
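On the second observation, one concrete family of such operations is left-multiplication of $\hat{D}$ by an orthogonal matrix $\hat{Q}$: since $(\hat{Q}\vec{d}_j) \cdot (\hat{Q}\vec{d}_m) = \vec{d}_j \cdot \vec{d}_m$, the whole Gram matrix, and therefore every orthogonality constraint, is preserved, so one could parametrize $\hat{Q}$ (for example as a product of Givens rotations) and optimize over those parameters freely. This is my illustration, not a recipe from the original post; a quick numerical check of the invariance:

```python
import numpy as np

def rotate_dictionary(D, Q):
    """Apply an orthogonal transform Q to every atom. Q.T @ Q = I implies
    (Q d_j) . (Q d_m) = d_j . d_m, so the Gram matrix is unchanged and any
    dictionary satisfying the structure constraints still satisfies them."""
    return Q @ D

def random_orthogonal(n, seed=0):
    """Random orthogonal matrix via QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return Q
</imports>
```

Whether enough of the constraint surface is reachable this way from a given starting dictionary is exactly the kind of question worth working out.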
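The third observation can also be sketched. Refitting one atom with the others fixed is a least-squares problem, and the constraints can be restored afterwards by projecting the refit atom onto the orthogonal complement of the atoms it must stay orthogonal to. The sketch below is a simplified version of that inner step: `must_be_orthogonal_to` is a hypothetical list of atom indices derived from the missing hardware couplers, and the fuller null-space bookkeeping from the previous post is skipped.

```python
import numpy as np

def update_atom(Z, D, W, k, must_be_orthogonal_to):
    """One inner step of the local search: refit atom k by least squares
    with all other atoms held fixed, then project the result onto the
    orthogonal complement of the atoms it must stay orthogonal to."""
    w_k = W[k]                                    # (S,) weights for atom k
    if np.dot(w_k, w_k) == 0:
        return D[:, k]                            # atom unused; leave it as is
    # Residual with atom k's contribution added back: what atom k should explain.
    R = Z - D @ W + np.outer(D[:, k], w_k)
    d_new = (R @ w_k) / np.dot(w_k, w_k)          # unconstrained least-squares fit
    B = D[:, must_be_orthogonal_to]               # forbidden directions
    if B.size:
        # The least-squares fit B @ coeffs is the projection of d_new onto
        # span(B); subtracting it leaves d_new orthogonal to every column of B.
        coeffs, *_ = np.linalg.lstsq(B, d_new, rcond=None)
        d_new = d_new - B @ coeffs
    return d_new
```

Sweeping `k` from first atom to last and repeating until the objective stops improving gives the local search described above, with all the stated caveats about local optima.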
What do you think? I might be able to get you time on a machine if you can come up with an interesting way to solve this problem effectively…