This post gives a very high-level overview of the basic idea behind applying natural quantum computing to artificial intelligence. It is not a post about programming or algorithms per se, but it will give some good background on how quantum computers can be used for learning tasks. I apologize in advance for the length of this post, but please don’t let it put you off!

*Introduction*

When people learn about quantum computers, they are generally told that these systems will be useful for code-breaking and factoring large numbers. But this is not the only way that you can use quantum effects to gain an advantage. The processors built by D-Wave use a process called quantum annealing (a variant of natural quantum computing). This approach is very useful for solving a different set of problems, such as those found in machine learning and artificial intelligence tasks. To begin, I will have to explain a little about one way in which machines can learn.

*Artificial brains*

In order to begin our journey into quantum computing and AI, you will need to be familiar with the concept of an ‘artificial neural network’ (ANN). The investigation of ANNs gained momentum in the 1960s, when basic computer equipment began to make it possible to model connected input-output systems. The most basic ANN model is known as the ‘perceptron’, and its output depends on the inputs it receives, a bit like a logic gate. Here is a diagram of a perceptron:

You can think of the perceptron as a little decision-making process, like a manager in a company. The manager takes several opinions (inputs), which are added together, and makes a yes/no decision based on these opinions. The weights denote how important each input is: a high weight gives a strong input that has a lot of influence on the output, whereas a low weight influences it only slightly. Note that a large number of small positively weighted inputs may be overwhelmed by a single strong negative input.
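The decision process above can be sketched in a few lines of Python. This is a hypothetical illustration; the input and weight values are made up:

```python
# A minimal perceptron: a weighted sum of inputs followed by a yes/no threshold.
def perceptron(inputs, weights, threshold=0.0):
    """Return 1 ('yes') or 0 ('no') from a weighted sum of opinions."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# Two small positive opinions are overwhelmed by one strong negative one:
decision = perceptron([1, 1, 1], [0.3, 0.2, -1.0])
print(decision)  # 0 - the manager says 'no'
```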

By stringing together many such units, large networks can be built up, and their behavior with respect to many inputs investigated. It turns out that some very complex behavior is possible even with very simple networks of these building blocks, as anyone who has worked in a company with lots of management meetings may well relate to!

*How can a network of simple ‘input-output’ blocks learn?*

Imagine taking a network of perceptrons and comparing its decisions to those of our manager. I should remark that our manager is very experienced and always makes the correct decision. The same information that is sent to the manager is also sent to our network. The manager already knows how much he values each input (opinion), and he makes his decisions accordingly. But in the case of our network, we don’t know what the manager is thinking. So instead, we set our weights (how much we value each input) completely at random at first, and see what we get. The picture below shows our network undergoing 3 ‘training sessions’:

The first column of the table shows the manager’s decisions and the network’s decisions after a first set of inputs is sent in. At first the results do not agree very well (only 25% correct compared to the manager), so we adjust the weights slightly and try again (second set of decisions). Note that the decisions that need to be made can be different each time, resulting in different ‘correct’ answers! The second column shows that after adjusting the weights, there is an improvement: the network is now making the same decisions as the manager 50% of the time. But we can still do better! We adjust the weights a bit more, and on the third attempt the network matches the manager’s decisions 100% of the time.

The network has learned to behave in the same way as the manager by a process of gradual adjustment of weights and cross-checking against the manager’s ‘correct’ decision. This technique is known as supervised learning, and it is one way in which we can train or teach artificial intelligences facts about the world. So is it really that simple? In essence, yes. But there is a big problem lurking under the hood here that I glossed over.
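The training loop above can be sketched as follows. This is a minimal illustration with hypothetical values: the ‘manager’ is stood in for by a perceptron with fixed, hidden weights, and the standard perceptron learning rule plays the role of ‘adjusting the weights slightly’:

```python
from itertools import product

def decide(inputs, weights):
    """Yes/no decision from a weighted sum of opinions."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > 0 else 0

manager_weights = [0.5, -0.9, 0.3]   # what the manager secretly values
weights = [0.0, 0.0, 0.0]            # our network knows none of this at first

patterns = list(product([0, 1], repeat=3))   # every possible set of opinions
for session in range(500):                   # repeated 'training sessions'
    for inputs in patterns:
        # +1 if we said 'no' when the manager said 'yes', -1 for the reverse
        error = decide(inputs, manager_weights) - decide(inputs, weights)
        for i, x in enumerate(inputs):       # small nudge toward agreement
            weights[i] += 0.1 * error * x

# After training, the network matches the manager on every input.
agreement = sum(decide(p, weights) == decide(p, manager_weights) for p in patterns)
print(f"{agreement}/{len(patterns)} decisions match")  # 8/8 decisions match
```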

*Making networks bigger*

In this example, the perceptron was exceedingly simple. There is no way that a network like this would be able to replicate the variety and subtlety of factors which go into making a complicated decision. As we make our perceptron circuit bigger and bigger, it is able to deal with more complex situations. But there is a problem. The number of weights to adjust also gets bigger and bigger. And the effect of adjusting one weight may depend on all the others in very subtle ways.

In the example discussed above, I stated ‘you adjust the weights and try again’. This is not too tricky with only 3 weights. But what if there were 10 weights? What if there were 100? How on earth would you know which ones to change, and in what order? You could try to be systematic about it and start by changing them one by one, each by a very small amount, and looking at the result. But exploring every single configuration of every single weight in this way is not a good idea. With 10 weights that can each be set to one of 10 values, the number of combinations you would have to try would be 10^{10} = 10 billion. With 100 weights and 100 values for each, the number of combinations would be 100^{100} = 10^{200}, far more than the number of atoms in the observable universe. Finding the correct configuration of all these weights is a very hard problem. So how do people solve problems like this? I mean, people use large neural networks all the time to help machines learn, so there must be a way…
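The arithmetic above is easy to check, since Python integers never overflow:

```python
# Exhaustive search over weight settings grows exponentially.
combos_small = 10 ** 10      # 10 weights, 10 values each: 10 billion
combos_large = 100 ** 100    # 100 weights, 100 values each
print(combos_small)              # 10000000000
print(len(str(combos_large)))    # 201 digits, i.e. 100**100 == 10**200,
                                 # versus roughly 10**80 atoms in the universe
```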

And indeed there is. Finding the best possible combination of all the weights may not be possible, but one can try to find a good compromise. There are many, many tricks that one can try, but in effect they are all just various ways of doing exactly the same thing: adjusting those weights to get to a good combination that best matches the manager’s behavior. This is known as an optimization problem. We have reduced the entirety of learning to an optimization problem! Why should we care? Well, it so happens that certain types of quantum computer can be very good indeed at solving optimization problems…

*Finding a good combination*

I’ll start by introducing a very common method of trying to find the best combination of all the weights, and then explain how introducing quantum effects can make this method better.

One way we could look for good combinations of weights is to start with the weights set to random values, then pick one weight (again at random) and see whether adjusting it helped or hindered our progress towards correct decisions. If it helped, keep it at the new setting, then pick another weight and try adjusting that. Again, keep the change only if things improve. This ensures that you are always making your system better. The problem with this approach is that you can end up thinking you have a good combination, whereas if you had started from a vastly different initial choice, you might have found a much better solution. You’d never know. Researchers in the field call this kind of greedy method gradient descent, and there are numerous variants on this simple scheme, but they all suffer from this problem to some extent.
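Here is a sketch of that greedy procedure. The score function is a hypothetical stand-in (closeness to a made-up ‘best’ set of weights); in practice it would be the fraction of decisions that match the manager’s:

```python
import random

def greedy_search(score, n_weights, steps=2000, step_size=0.1, seed=1):
    """Tweak one randomly chosen weight at a time; keep only improvements."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(n_weights)]
    best = score(weights)
    for _ in range(steps):
        i = rng.randrange(n_weights)               # pick one weight at random
        old = weights[i]
        weights[i] += rng.choice([-step_size, step_size])
        new = score(weights)
        if new > best:
            best = new                             # improvement: keep it
        else:
            weights[i] = old                       # otherwise undo the change
    return weights, best

# Toy score: higher is better, with a best value of 0 at the target weights.
target = [0.5, -0.5, 0.25]
score = lambda w: -sum((wi - ti) ** 2 for wi, ti in zip(w, target))
weights, best = greedy_search(score, 3)
```

On this smooth toy score the greedy search homes in on the target, but on a bumpier score it would get stuck in whichever local optimum it reached first, which is exactly the problem described above.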

A method known as simulated annealing tries to get around this problem by sometimes allowing the weights to change even if this makes your outcome seem to get worse for a while. (This is similar to making a ‘sacrifice move’ in chess or another board game, whereby accepting a loss, such as the opponent taking one of your pieces, puts you in a good position to execute a clever strategic move later in the game.) The idea is that if you let these sacrificial moves happen a lot during the initial training phases, and reduce the number of times you allow them later in the training, you generally end up with a much better set of weights. In simulated annealing, the sacrificial moves (adjusting the weights to make things worse) are chosen pretty much at random, in the hope that some of them will help.
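Simulated annealing needs only a small change to the greedy sketch: a ‘temperature’ that starts high and cools down, controlling how often sacrificial moves are accepted. The score function is again a hypothetical stand-in:

```python
import math
import random

def simulated_annealing(score, n_weights, steps=5000, step_size=0.1, seed=2):
    """Like greedy search, but sometimes accept 'sacrificial' worse moves."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(n_weights)]
    current = score(weights)
    for step in range(steps):
        temperature = 0.001 ** (step / steps)   # cools from 1.0 down to 0.001
        i = rng.randrange(n_weights)
        old = weights[i]
        weights[i] += rng.choice([-step_size, step_size])
        new = score(weights)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if new >= current or rng.random() < math.exp((new - current) / temperature):
            current = new
        else:
            weights[i] = old
    return weights, current

target = [0.5, -0.5, 0.25]
score = lambda w: -sum((wi - ti) ** 2 for wi, ti in zip(w, target))
weights, best = simulated_annealing(score, 3)
```

Early on, almost any move is accepted and the search roams widely; by the end, the acceptance rule has become essentially greedy and the search settles into the best region it has found.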

Quantum annealing is similar to this approach, but uses quantum effects to help the system adjust its weights in a smarter way, exploiting the quantum mechanical phenomena of superposition and entanglement. Being able to put your weights into a quantum mechanical ‘superposition’ of states (each weight taking several values at the same time) allows the system to see in advance where the best combinations might lie. (Again using the chess analogy, this is similar to looking many moves ahead, with the player imagining the pieces in all their different combinations before making the best move.) If quantum annealing is working properly, the system knows which sacrificial moves to make along the way, and has a much better chance of finding the best combination of weights at the end.

*A quantum education system for programs*

You’ve now found your optimal set of weights by utilizing a quantum computation. Awesome. But wait… There’s an extra bonus here.

Unlike many other quantum algorithms, using a quantum computer for learning means that you don’t just get a number out at the end. You actually get a *trained program* (a network of weights). Which means our quantum computation is a manager-generator! So you don’t need the quantum computer once it has trained the manager, just as you don’t need an entire business school to make corporate decisions. The quantum computer can then be set to work training other things whilst the manager-programs themselves go out into the world to do great things! Of course, your manager could always go back to the quantum-training school to improve further. The interesting thing is that the quantum computer doesn’t have to train just one type of program (a manager). It can teach almost anything, given enough data about the real world. It could train medical-diagnosis programs, image-recognition programs, or even programs that summarize the key concepts behind a book or a scientific paper. The bigger the quantum system, the more weights it can adjust, and the more high-level the concepts that can be learned.

Of course, my business analogy wasn’t purely accidental. Maybe one day soon quantum-trained programs will be predicting the stock market, and making investment decisions based on the results. Business may never be the same again! Whatever subject one chooses to focus on at this quantum-school, I personally think that the ability to train programs to go out into the world and solve problems themselves, and to generate machine intelligences that learn, is much more exciting than factoring large numbers.

More!!!🙂

So does that mean all the problems that use simulated annealing as a solver can also be solved using quantum annealing?

Yes, in principle!

Great write up! A really clear explanation for the layman. In your example of 100 weights with 100 values, how long will it take today’s quantum computer to calculate the best variation?

The short answer is that it depends! Just like with conventional heuristics (genetic algorithms, simulated annealing, etc.) you can run quantum annealing for as long (or as short) as you like. The solution quality typically gets better if you run it longer. Superconducting processors have natural timescales on the order of nanoseconds, which sets something like the fastest you can run a computation.

Thank you! This is an amazing thing. Just thinking about the possibilities for our species utilizing this technology has me dazed.


Excellent post! I love the pictures😀

I have been eagerly awaiting any prototype of a quantum computer good for AI. Where can I contact you for further follow-up? Do you realise that this means not just smart agents but, as part of a hybrid conventional/quantum computer system, a real AI system?

Excellent, and thank you very much for the information.

Greetings from the last place in the world.

Excellent. This helped further my understanding. You should write a book and let us know about it.

I wonder if it is possible to train an ANN to help with the supply/demand problem. Maybe a computer deciding what to produce based on available resources and public demand would be beneficial, since the computer can be programmed to make the most efficient decisions, eliminating human error as a side effect. I think this is the best way to have minimal deficiency/surplus of available goods.

