# Classifying the Iris data set

The Iris data set from the UC Irvine machine learning repository is

> … perhaps the best known database to be found in the pattern recognition literature. Fisher’s paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

Here are the three different types of Iris (setosa, versicolor and virginica, respectively):

[Images: Iris setosa, Iris versicolor, Iris virginica]

I ran GloboBoost using the following prescription:

1. First we try to classify using only the first two types of Iris, setosa and versicolor
2. Randomly choose 25 instances of Iris-setosa and 25 instances of Iris-versicolor to be the training set; the other 25 from each are the testing set
3. Run GloboBoost on these, recording the performance of the trained classifier on the testing set
4. Do this 1000 times and average the results
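GloboBoost itself is not published in this post. The evaluation loop from the prescription above can still be sketched in Python, using a nearest-centroid classifier as a hypothetical stand-in for the trained classifier. Only the protocol follows the steps listed: a random 25/25 split of each class, training, scoring on the 50 held-out instances, repeating, and averaging. The names `nearest_centroid_train` and `average_test_accuracy` are invented for illustration.

```python
import random
from statistics import mean


def nearest_centroid_train(pos_train, neg_train):
    """Hypothetical stand-in for GloboBoost: a nearest-centroid classifier.

    Returns a function mapping a feature vector to +1 (positive) or -1 (negative).
    """
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(pos_train), centroid(neg_train)

    def sq_dist(x, c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

    return lambda x: 1 if sq_dist(x, c_pos) <= sq_dist(x, c_neg) else -1


def average_test_accuracy(pos, neg, train_fn, n_trials=1000, n_train=25, seed=0):
    """Repeat a random train/test split n_trials times and average test accuracy."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_trials):
        # Shuffle copies so each trial draws a fresh random split of each class.
        p, n = pos[:], neg[:]
        rng.shuffle(p)
        rng.shuffle(n)
        classify = train_fn(p[:n_train], n[:n_train])
        # The remaining instances of each class form the testing set.
        test = [(x, 1) for x in p[n_train:]] + [(x, -1) for x in n[n_train:]]
        correct = sum(classify(x) == label for x, label in test)
        accuracies.append(correct / len(test))
    return mean(accuracies)
```

Suppose `iris_1` and `iris_2` are loaded as lists of four-number rows. Then `average_test_accuracy(iris_1, iris_2, nearest_centroid_train)` runs the same 1000-trial protocol with the stand-in classifier. Swapping the first two arguments swaps which class is treated as positive.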

The results:

Average classification performance of GloboBoost, Iris_1 vs. Iris_2: 100%

Doing the same with the other two pairs of classes:

Average classification performance of GloboBoost, Iris_1 vs. Iris_3: 100%

Average classification performance of GloboBoost, Iris_2 vs. Iris_3: 87.4%

Here is the MATLAB code I used to do this. There are two functions: prep_Iris and crunch_Iris. Note that I had to save them as .doc files because WordPress won’t let me upload .m files, so if you want to use them, just change the extension to .m.

You will have to do a little pre-processing of the raw Iris data. What I did: strip the text describing the Iris category, use Import Data on the resulting 150×4 data set, and save the first 50 rows as Iris_1, the second 50 as Iris_2 and the third 50 as Iris_3.

It turns out that the performance depends on which data file you load as the positive class, for a reason that I’m going to track down and fix. If you’re not getting the performance above, try reversing which data set is positive and which is negative.
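The same pre-processing can be done programmatically instead of through MATLAB’s Import Data. Below is a minimal Python sketch. It assumes a local copy of the UCI file (commonly named `iris.data`), where each line has four comma-separated measurements followed by the class name. The function name `split_iris` is hypothetical.

```python
def split_iris(lines):
    """Split raw UCI Iris lines into per-class lists of 4-float rows.

    Mirrors the manual steps: drop the class-name column, keep the four
    numeric measurements, and group rows by class in file order.
    """
    classes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. at the end of the file
        *features, label = line.split(",")
        classes.setdefault(label, []).append([float(v) for v in features])
    # Dicts keep insertion order, so classes come back in the order they
    # first appear in the file (setosa, versicolor, virginica for the UCI file).
    return list(classes.values())
```

Usage, assuming the file is in the working directory: `with open("iris.data") as f: iris_1, iris_2, iris_3 = split_iris(f)`.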

## Converting to ISING and truncating to finite precision

Converting the optimization problem to spin variables gives problems of the form

$E(z_1, \ldots, z_8) = \sum_{j=1}^{8} b_j z_j + \sum_{i<j} J_{ij} z_i z_j$, where each spin variable $z_j \in \{-1, +1\}$.
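The post does not show GloboBoost’s objective before the conversion. The sketch below assumes it is a quadratic over binary variables $w_j \in \{0, 1\}$, for example one per weak classifier that is switched on or off. Substituting $w_j = (1 + z_j)/2$ then gives spin-variable form. Each pairwise term expands as $Q_{ij} w_i w_j = \tfrac{Q_{ij}}{4}(1 + z_i + z_j + z_i z_j)$. This yields couplings $J_{ij} = Q_{ij}/4$, fields $h_j = a_j/2 + \sum_{i \ne j} Q_{ij}/4$, and a constant offset. The function names are hypothetical, and the brute-force check is only practical for small problems such as the 8 variables here.

```python
from itertools import product


def qubo_to_ising(a, Q):
    """Map E(w) = sum_j a[j] w_j + sum_{i<j} Q[i][j] w_i w_j, w_j in {0,1},
    to E(z) = sum_j h[j] z_j + sum_{i<j} J[i][j] z_i z_j + offset, z_j in {-1,+1}.
    Only the upper triangle (i < j) of Q is read.
    """
    n = len(a)
    h = [aj / 2 for aj in a]   # a_j w_j = a_j/2 + (a_j/2) z_j
    J = [[0.0] * n for _ in range(n)]
    offset = sum(a) / 2
    for i in range(n):
        for j in range(i + 1, n):
            q = Q[i][j]
            # q w_i w_j = (q/4)(1 + z_i + z_j + z_i z_j)
            J[i][j] = q / 4
            h[i] += q / 4
            h[j] += q / 4
            offset += q / 4
    return h, J, offset


def qubo_energy(a, Q, w):
    n = len(a)
    return sum(a[j] * w[j] for j in range(n)) + sum(
        Q[i][j] * w[i] * w[j] for i in range(n) for j in range(i + 1, n))


def ising_energy(h, J, z):
    n = len(h)
    return sum(h[j] * z[j] for j in range(n)) + sum(
        J[i][j] * z[i] * z[j] for i in range(n) for j in range(i + 1, n))


def max_conversion_error(a, Q):
    """Largest energy mismatch over all 2^n assignments (small n only)."""
    h, J, offset = qubo_to_ising(a, Q)
    worst = 0.0
    for w in product((0, 1), repeat=len(a)):
        z = [2 * wj - 1 for wj in w]
        diff = qubo_energy(a, Q, w) - (ising_energy(h, J, z) + offset)
        worst = max(worst, abs(diff))
    return worst
```

The offset does not affect which assignment has the lowest energy, so it can be dropped when the problem is handed to an Ising solver.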

If we truncate the $\{ h,J \}$ to 2 bits of precision (i.e., $\{ h,J \}$ can only be set to $-1$, $0$ or $+1$), GloboBoost does at least as well as with the exact values. Here is the performance of GloboBoost using ISING with 2 bits of precision (2BOP):
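The post does not say how the exact $\{h, J\}$ were mapped onto $-1$, $0$ and $+1$. The rule below is therefore a hypothetical one: divide every coefficient by the largest magnitude, then round to the nearest integer. The sketch also finds the ground state by brute force, so truncated and exact problems can be compared. With 8 spins that means only $2^8 = 256$ states to check. Both function names are made up for illustration.

```python
from itertools import product


def truncate_to_three_levels(h, J):
    """Hypothetical 2-bit truncation: scale by the largest |coefficient|,
    then round each h_j and J_ij (i < j) to -1, 0 or +1."""
    n = len(h)
    coeffs = list(h) + [J[i][j] for i in range(n) for j in range(i + 1, n)]
    scale = max((abs(c) for c in coeffs), default=0.0) or 1.0

    def q(v):
        # v / scale lies in [-1, 1]; Python's round() sends exact ties like 0.5 to 0.
        return round(v / scale)

    h_t = [q(v) for v in h]
    J_t = [[q(J[i][j]) if j > i else 0 for j in range(n)] for i in range(n)]
    return h_t, J_t


def ground_state(h, J):
    """Lowest-energy spin assignment by exhaustive search (first one on ties)."""
    n = len(h)

    def energy(z):
        return sum(h[j] * z[j] for j in range(n)) + sum(
            J[i][j] * z[i] * z[j] for i in range(n) for j in range(i + 1, n))

    return min(product((-1, 1), repeat=n), key=energy)
```

Running `ground_state` on both the exact and the truncated coefficients shows whether truncation changes the selected configuration. The classification accuracy comparison reported below is a separate, end-to-end measurement.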

Average classification performance, ISING 2BOP, Iris_1 vs. Iris_2: 100%

Average classification performance, ISING 2BOP, Iris_1 vs. Iris_3: 100%

Average classification performance, ISING 2BOP, Iris_2 vs. Iris_3: 90.5%

Interestingly, the truncated classifier actually performs better (90.5% vs. 87.4%) on the one pair of classes where the data is not 100% separable. This data set is a good one to use for a real hardware test…

This entry was posted in D-Wave Science & Technology by Geordie.

I'm the chief technology officer of D-Wave and 2010 NAGA Brazilian jiu-jitsu light heavyweight world champion.

## 5 thoughts on “Classifying the Iris data set”

1. Hi JP, no, I haven’t run these on hardware yet, but I am going to see if I can get some time to do one of these.

2. Isn’t a thousand runs on a data set of 50 elements kind of overkill? Especially when the result is 100% each time? I mean, it’s good to exercise the software, and since compute time is relatively cheap, it is probably no big deal, but I would think a dozen runs would be sufficient to demonstrate the code is working when all the results are correct.

Now when the results are not entirely correct, then you might learn something.

3. hi

http://www.technologyreview.com/computing/23198/

from the above

Meanwhile, Hans Mooij of the Delft University of Technology, with Seth Lloyd, who directs MIT’s Center for Extreme Quantum Information Theory, has created quantum states (which occur when particles or systems of particles are superpositioned) on scales far above the quantum level by constructing a superconducting loop, visible to the human eye, that carries a supercurrent whose electrons run simultaneously clockwise and counterclockwise, thereby serving as a quantum computing circuit.