Learning to program the D-Wave One: wisdom of the crowd, or strong classifiers

A strong classifier is a function — let’s call it {\cal F}(x) — that is formed by summing up the guesses of some subset of the full dictionary of weak classifiers, and performing a ‘majority vote’ on these guesses. The strong classifier we will use here looks like {\cal F}(x) = sign\left[ \sum_{t=1}^T \bar{F}_t(x)\right], where the strong classifier is formed by using a total of T weak classifiers \bar{F}_t(x) . While the notation may look a little confusing, the idea here is really simple – the set \{\bar{F}_t(x), t=1..T\}  is the subset of the entire dictionary \{F_k(x), k=1..D\} that, when voting together, produces the best strong classifier.

We also have to include a rule for breaking ties – the label we assign if the sum in the sign function is zero (in our example gender classifier, we will assign ‘female’ (+1) in this case, as it’s more likely that any generic name will be female than male because there are many more female names than male names).

Because the strong classifier concept is really the single most important idea for what follows here, it’s worth going over this idea in some detail.

What we are looking for is the subset of our weak classifiers that, when acting together, provide the best classifier. To visualize what this means, let’s think about the simple dictionary of weak classifiers we built in Algorithm 2.

In this dictionary there are 52 weak classifiers. We could form our strong classifier to consider including everything from zero of these (in this case because of our tie breaking rule this strong classifier would label everything ‘female’) to all 52 of them acting together – see the figure below for one possibility. The total number of possible combinations is 252, which is a large number.

Each of these combinations has some performance measure (we will see shortly what this is), and we want the combination with the best performance. Finding the best performing group of weak classifiers is an optimization problem. This problem is easy to write down, but hard to solve.

This is a very important concept at the heart of the quantum binary classification (QBC) approach.

Determining which weak classifiers to include in your final strong classifier is the core optimization problem we are going to solve in hardware. This is the key idea behind the QBC algorithm.

 

Figure. Each possible combination of our weak classifiers gives a possible strong classifier. In this figure there are 52 weak classifiers in our dictionary, which are shown as dots numbered from 1 to 52. One possible candidate for a strong classifier, formed by keeping weak classifiers 2, 4, 14, 33, 39 and 45, is shown by highlighting these in red and ‘promoting’ them to the list at the top of the Figure. There are 252 total possible strong classifiers, and the objective of the QBC algorithm is to find the one that performs best.

 

By the way, in case it isn’t clear, here is what I mean by majority voting. Let’s say we have a name we want to label with a gender (for example ‘Ludwig’), and our potential strong classifier consists of three weak classifiers, for example “label male if the last letter is g else label female”, “label female if the last letter is a else label male”, and “label male if the last letter is n else label female”. Applying these in order gives ‘male’, ‘male’ and ‘female’, so a majority vote of these three assigns the label ‘male’. Our convention here is to associate the ‘male’ label with -1 and the ‘female’ label with +1, so equivalently we can do the majority voting by counting up these contributions, and if the resulting sum is negative label male else label female (in the above example this would be (-1) + (-1) + (+1) = -1 which is negative therefore label ‘male’).

3 thoughts on “Learning to program the D-Wave One: wisdom of the crowd, or strong classifiers

  1. This looks like a good way of choosing a number of SNPs that are weak classifiers of disease, into a smaller panel with greater predictive value. Does the quantum computer work through the multiple iterations faster than a classical computer or does is it better at optimizing the final result?

  2. the ° →which I understand, the maths is a epidemiological paradigm →∑ some experience to a conclusion or ƒ(x) which may be changed by other values? of weak choices? In some way I understand parallel worlds are being simulated, and shared in some weird way in this computational model, yet ∩ are a greater part of the model? I do and I dont understand. But in my limited way I think I get your point, tell me How I am wrong.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s