We decided to switch our training and test datasets to a custom-built one. This new dataset, which will be made publicly available, consists of seventy 2592 x 1944 pixel color photographs supplied by Suzanne. The training set has sixty images and the test set has ten. Here is training image #57, taken from the top of the Glacier Express on Blackcomb.

For the tests I’m going to describe, each image is segmented into 1,024 non-overlapping patches of size 20 x 15 pixels. For the training set, this means a total of 60 * 1,024 = 61,440 patches.
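To make the setup concrete, here is a minimal sketch of how patches like these could be extracted. The sampling scheme (a random subset of a non-overlapping grid) is my assumption, since the post doesn't say how the 1,024 patch locations per image are chosen, and `extract_patches` is a hypothetical helper:

```python
import numpy as np

def extract_patches(image, n_patches=1024, ph=15, pw=20, seed=0):
    """Sample non-overlapping patches from an (H, W, 3) image.

    Patch size is 20 x 15 pixels (width x height) as in the post.
    1,024 patches cover only part of a 2592 x 1944 image, so patch
    positions are sampled from a non-overlapping grid; this sampling
    scheme is an assumption, not from the post.
    """
    H, W, _ = image.shape
    rng = np.random.default_rng(seed)
    # All top-left corners on a non-overlapping grid.
    corners = [(r, c) for r in range(0, H - ph + 1, ph)
                      for c in range(0, W - pw + 1, pw)]
    idx = rng.choice(len(corners), size=n_patches, replace=False)
    # Flatten each 15 x 20 x 3 patch into a length-900 vector.
    return np.stack([image[r:r + ph, c:c + pw].reshape(-1)
                     for (r, c) in (corners[i] for i in idx)])

patches = extract_patches(np.zeros((1944, 2592, 3)))
print(patches.shape)  # (1024, 900)
```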

The sparse coding procedure is then applied to these patches, and a 512 atom dictionary is learned that allows reconstructions of all of the image patches.
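The objective being minimized in the L1 version can be written in the standard sparse coding form (this equation is my reconstruction from the description, not taken verbatim from the run):

```latex
\min_{D,\,A}\ \| X - D A \|_F^2 \;+\; \lambda \sum_{i,k} |a_{ki}|
\qquad \text{subject to}\quad \| d_k \|_2 \le 1 \ \text{for all } k
```

where X is the matrix of patch vectors (one column per patch, 61,440 columns for the training set), D is the 512-atom dictionary, A holds the sparse codes, and λ is the regularization weight that controls sparsity.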


**A trial run**

The first thing I did was run through the entire sparse coding procedure, using the L1-norm version with regularization parameter λ and keeping 507 SVD modes. Note that the original image patches are vectors of length 20 x 15 x 3 (RGB) = 900; keeping the first 507 of these modes gives pretty good reproductions. The value of λ used is small enough not to regularize strongly, so the reconstructions in this run aren’t going to be sparse.
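A minimal sketch of the SVD reduction step described above; `svd_reduce` is a hypothetical name, and the mean-centering is my assumption:

```python
import numpy as np

def svd_reduce(patches, n_modes=507):
    """Project length-900 patch vectors onto their top SVD modes.

    `patches` is (n_patches, 900). Keeping the first 507 modes matches
    the post; centering and the exact SVD convention are assumptions.
    """
    mean = patches.mean(axis=0)
    X = patches - mean
    # Rows of Vt are the principal directions of the patch data.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:n_modes]                 # up to (507, 900)
    coeffs = X @ basis.T                 # reduced representation
    recon = coeffs @ basis + mean        # approximate reconstruction
    return coeffs, recon, basis
```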

Note that I chose 512 dictionary atoms and this value of λ in anticipation of comparing these results directly to the L0-norm structured dictionaries run in hardware. For reasons I outlined in earlier posts, these are sensible choices: the number of dictionary atoms should equal the number of qubits, and it should be bigger than the signal dimension for overcompleteness; but for the structured dictionaries idea to work, the signal vectors have to be of length at least the number of atoms minus the connectivity of the least connected qubit (5 in the case of a Vesuvius chip), which is why 512 - 5 = 507 SVD modes are kept.

Here are some run results.

- The average sparsity is: 187.585 atoms used per reconstruction
- The reconstruction error on the training set is: 13.527
- The average reconstruction error per patch is: 0.000220
- The wallclock time is: 143380.135409 seconds running on 200 cores
- Lowest objective function value: 87.9729119482.
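Statistics of this kind can be computed from the patch matrix, dictionary, and codes along these lines. The exact definitions behind the reported numbers are assumptions, and `run_stats` is a hypothetical helper:

```python
import numpy as np

def run_stats(X, D, A, tol=1e-8):
    """Summary statistics of the kind reported above (definitions assumed).

    X: (dim, n) patch matrix, D: (dim, n_atoms) dictionary,
    A: (n_atoms, n) sparse codes.
    """
    nonzero = np.abs(A) > tol
    # Average number of atoms used per reconstruction.
    avg_sparsity = nonzero.sum(axis=0).mean()
    # Total squared reconstruction error on the set.
    total_err = np.sum((X - D @ A) ** 2)
    per_patch_err = total_err / X.shape[1]
    return avg_sparsity, total_err, per_patch_err
```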

The average number of atoms used per patch to autoencode the training set was 187.6 out of 512, so we were right about this value of λ being too small to make reconstructions sparse. But it’s probably big enough to start seeing some of the features we want.

**The dictionary learned**

Here are the dictionary atoms learned for these parameters. In this picture the atoms have no particular order.

These look pretty good. Note that there is a mix of random-looking atoms and ones with structure that look like edges. We see this type of result when the regularization puts the reconstructions in the region between no regularization at all and the sparse regime. I think what’s happening is that the random-looking atoms have the freedom to ‘memorize’ the training set, reducing the reconstruction error but not really learning anything generalizable about images. The edge-type atoms are what we are really after, and we start to see some showing up here, but we need a larger value of λ to force most or all of the atoms to be like that.

**A second trial with a larger λ**

Next I re-ran the procedure with a larger λ on 300 cores.

Here are some run results.

- The average sparsity is: 27.5803548177 atoms used per reconstruction
- The reconstruction error on the training set is: 127.292824922
- The average reconstruction error per patch is: 0.0020718233223
- The wallclock time is: 15340 seconds
- Lowest objective function value: 492.77728.

**The dictionary learned**

This looks great. This is about where we want to be. As in prior runs, somewhere in the range of 20-50 atoms per reconstruction seems to be a good target for L1 with the parameters we’re using.

So we’re good; the infrastructure seems to be working (mostly). Now I’m going to do a sweep through λ and plot reconstruction error on both training and test sets as a function of the average number of atoms used, for L1, L0 and structured dictionaries.
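A sweep like this can be organized along the following lines. Here `sparse_code` is a hypothetical stand-in for the full (parallel) dictionary-learning run, and its signature is my assumption; only the bookkeeping pattern is shown:

```python
import numpy as np

def sweep(X_train, X_test, lambdas, sparse_code):
    """Sweep over lambda values, recording sparsity and errors.

    `sparse_code(X, n_atoms, lam[, dictionary])` is assumed to return
    a (dictionary, codes) pair; passing `dictionary` encodes new data
    against a fixed dictionary.
    """
    results = []
    for lam in lambdas:
        # Learn a dictionary and codes on the training patches.
        D, A_train = sparse_code(X_train, n_atoms=512, lam=lam)
        # Encode the held-out test patches with the fixed dictionary.
        _, A_test = sparse_code(X_test, n_atoms=512, lam=lam, dictionary=D)
        avg_atoms = (np.abs(A_train) > 1e-8).sum(axis=0).mean()
        err_train = np.sum((X_train - D @ A_train) ** 2)
        err_test = np.sum((X_test - D @ A_test) ** 2)
        results.append((lam, avg_atoms, err_train, err_test))
    return results
```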

This is really interesting! This looks even better than the neural net-based sparse features that people have studied over the years. The full set of Gabor-like basis functions seems to have been learned.

Hi Alireza! Yeah it does look pretty good. Note that the specific parameter choices used for this run correspond to an implementation of Honglak Lee / Andrew Ng’s “Efficient sparse coding algorithms” (http://ai.stanford.edu/~hllee/nips06-sparsecoding.pdf), recoded from scratch in Python and designed to run in a massively parallel (cloud) environment. The big experiment being run now compares this L1-norm form of sparse coding with both unconstrained L0-norm and structured dictionaries L0-norm sparse coding. The main figure I want to plot is reconstruction error vs. average number of atoms used per reconstruction when the latter is small (the sparse limit). What I want to know is to what extent the result we saw on MNIST (L0 requires about half the atoms to get the same reconstruction error) holds for natural images, and what effect the structured dictionaries restriction has on this same figure of merit.

Any more results? I’m holding my breath here!