AlphaGo

Written by Brian Cloutier
February 22, 2016

For a game the western world doesn't know much about, Go is implausibly harder than Chess. Forget playing it; it's hard to even tell who's winning. Humans have trouble scoring a half-completed game between pros.

On average, there are 250 different moves you can make at any given time (compared to Chess' 35). And, to reiterate, even if you could look 9 moves ahead, a number which seems small but leads to as many possible Go positions as there are stars in the observable universe, it won't help, because you still have no idea which of those 10^22 positions are better for you!
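
(If you want to check that claim, the arithmetic is easy to run yourself; the star count is the usual rough estimate, not something from the paper.)

    # Back-of-the-envelope: branching factor raised to the search depth.
    go = 250 ** 9       # ~3.8e21 positions reachable in 9 moves
    chess = 35 ** 9     # ~7.9e13, eight orders of magnitude fewer

    print(f"Go:    {go:.1e}")
    print(f"Chess: {chess:.1e}")
    # Rough estimates put the number of stars in the observable
    # universe around 1e21 to 1e24, the same ballpark as Go's 9-move tree.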

Humans get around this by only seeing good moves. I'm not being poetic when I call it intuition; if you ask a pro why they made a given move, their answer will quite often be "it was good shape" or "the other sequences didn't feel right". (For a great illustration of this, there's a book called The Go Consultants.)

This is what makes Deep Convolutional Neural Networks (I'll just call them NNs) such a good match for Go. By inventing them we've somehow distilled a very pure form of pattern recognition, which is what Go seems to be about.

My previous favorite paper, from 2014, took a bunch of professional games and trained a NN to predict the next move. The idea is that when the network is given a new position, it'll play the move a professional would have played. This network, without looking ahead, without anything but "this move feels better", plays about as well as an amateur with years of study.
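
To make that concrete, here's a minimal sketch of the setup in PyTorch. Everything here is illustrative: the real networks were much deeper and fed richer input features, but the shape of the idea is the same: show the network a position and penalize it for disagreeing with the pro.

    import torch
    import torch.nn as nn

    # Toy move-prediction ("policy") network: board tensor in, one score
    # per board point out. Real versions were much deeper; this is a sketch.
    class PolicyNet(nn.Module):
        def __init__(self, planes=4, filters=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(planes, filters, 3, padding=1), nn.ReLU(),
                nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU(),
                nn.Conv2d(filters, 1, 1),  # one logit per intersection
            )

        def forward(self, boards):                # (batch, planes, 19, 19)
            return self.body(boards).flatten(1)   # (batch, 361)

    net = PolicyNet()
    opt = torch.optim.SGD(net.parameters(), lr=0.01)

    # One supervised step on stand-in data: push the network's output
    # toward the move the professional actually played.
    boards = torch.randn(8, 4, 19, 19)        # encoded positions (fake)
    pro_moves = torch.randint(0, 361, (8,))   # recorded pro moves (fake)
    loss = nn.functional.cross_entropy(net(boards), pro_moves)
    loss.backward()
    opt.step()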

Strong amateurs destroy it, though, because of two flaws:

First, it fails hilariously and makes rookie mistakes when you give it a situation which never comes up in pro games. Go involves a lot of implicit threats which are rarely carried out. Networks learn to make those threats but, lacking training data, are incapable of following up on them.

Facebook mitigated that first problem by combining the network with a technique called Monte Carlo Tree Search (MCTS). The intuition here is that we've returned to the computer as a cold calculating machine, except that instead of looking at every position it probabilistically looks at the positions which the NN thinks matter. It still can't score positions, so it uses a massive hack which is, depending on your worldview, either ugly or elegant, but definitely effective: finish the game with random moves, because a finished game is trivial to score.
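
Here's a stripped-down sketch of that loop, not DeepMind's or Facebook's actual implementation. The `state` object and its methods (`legal_moves`, `play`, `is_over`, `winner`, `to_play`) are hypothetical stand-ins for a Go engine, and `net_prior` stands in for the move-prediction network.

    import math
    import random

    class Node:
        def __init__(self, state, prior=1.0):
            self.state, self.prior = state, prior
            self.children = {}            # move -> Node
            self.visits, self.wins = 0, 0

        def ucb(self, parent_visits, c=1.4):
            # Balance "this move has worked so far" (q) against
            # "the NN likes it and we haven't tried it much".
            q = self.wins / self.visits if self.visits else 0.0
            return q + c * self.prior * math.sqrt(parent_visits) / (1 + self.visits)

    def rollout(state):
        # The massive hack: finish the game with random moves.
        # A finished game is trivial to score.
        while not state.is_over():
            state = state.play(random.choice(state.legal_moves()))
        return state.winner()

    def search(root, net_prior, n_sims=1000):
        for _ in range(n_sims):
            node, path = root, [root]
            while node.children:          # walk down the most promising line
                pv = node.visits
                _, node = max(node.children.items(),
                              key=lambda mn: mn[1].ucb(pv))
                path.append(node)
            if not node.state.is_over():  # expand, guided by the NN's priors
                for m in node.state.legal_moves():
                    node.children[m] = Node(node.state.play(m),
                                            prior=net_prior(node.state, m))
            winner = rollout(node.state)
            for n in path:                # back the result up the tree
                n.visits += 1
                if winner != n.state.to_play:
                    n.wins += 1           # good for the player who moved here
        return max(root.children, key=lambda m: root.children[m].visits)

The priors are what keep the search narrow: moves the network likes get explored first, and moves it hates barely get visited at all.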

The second problem is deeper: If it's just trained to imitate pros, how can it ever become better than pros?

This is the problem AlphaGo solves. DeepMind trained a NN to predict the move a professional would play, one which actually did worse than the state-of-the-art networks, but then retrained that network to play the move which will win the game.

If you're a professional, these are the same move. But for this crazy NN+MCTS hybrid they can be pretty different (like not wasting moves on threats you can't carry out). They trained it by having it play against previous versions of itself, effectively using the human data to bootstrap a better player.
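
The retraining is plain policy-gradient reinforcement learning (REINFORCE). Here's a sketch of one update, reusing the toy PolicyNet from above; in a real run, `boards` and `moves_played` would come from a game against a frozen earlier copy of the network.

    import torch

    def reinforce_step(net, opt, boards, moves_played, outcome):
        # outcome: +1 if the current network won this self-play game, -1 if not.
        # Nudge the network toward the moves it played in games it won,
        # and away from the moves it played in games it lost.
        logp = torch.log_softmax(net(boards), dim=1)
        chosen = logp[torch.arange(len(moves_played)), moves_played]
        loss = -(outcome * chosen).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()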

They also did what will go down in history as hilariously obvious in retrospect but which nobody saw coming: they trained not one but two different networks. The first network is as described above; the second network scores positions. They use the first network to figure out where to play, cutting down the width of the search tree, and they use the second network to figure out whether a position is good, cutting down the depth of the search tree. They actually combine that second network with the MCTS hack to get the best of both techniques (speed and accuracy).
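
Concretely, at each leaf of the search they blend the value network's score with the result of a fast playout, weighted evenly. Something like this, with `value_net` a stand-in for their second network and `rollout` the same hack sketched above:

    def evaluate_leaf(state, value_net, lam=0.5):
        # Blend the value network's judgment of the position with the
        # outcome of a quick playout; the paper weighted them evenly.
        v = value_net(state)   # learned score: how good does this look?
        z = rollout(state)     # the hack: play it out and see who wins
        return (1 - lam) * v + lam * z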

There are a couple of other cool things in the paper, like the fact that they managed to write a distributed version which runs across 1,202 CPUs. It's not surprising that they have so many CPUs; it's surprising that they were able to write a distributed version at all. That alone would have been a revolutionary paper, since nobody else is quite sure how to do it yet. They just casually mention writing a distributed version which beats the original 80% of the time.

So the real answer to what makes AlphaGo so strong is: "the people at DeepMind are geniuses and invented this deep reinforcement learning stuff in the first place". Why does it matter? Because we weren't able to cheat this time. Computers beat humans at chess not by being smarter but by thinking faster.

Go has been pitched to you as a sort of intellectual Everest: that it's hard is reason enough to attempt to solve it, and glory will go to the team that solves it first. That's a little true, but Go also feels like it takes some very human skills to master. If you had asked people 10 years ago whether computers could have "intuition", they'd have said definitely not. Both MCTS and NNs have limitations, and there are aspects of the game which we're pretty sure humans are still better at, but it seems like intuition is no longer where we excel.

If AlphaGo ends up beating Lee Sedol, it will be validation that we've finally unlocked one small piece of human intelligence. We've gotten a lot closer to understanding Go. By formalizing pattern recognition, we've gotten a little closer to understanding ourselves.
