AlphaGo: Marvels of Artificial Intelligence for Go

Image: A stage during the fifth match of Lee Sedol vs AlphaGo. AlphaGo (white) lost a group of 8 stones as it did not know the tesuji of Sekito shibori (reproduced on the go board the author owns).

Very recently, shocking news hit the headlines: one of the best go players in the world lost 1-4 against a go-playing program. Here is the related history and my thoughts on it.

History of artificial intelligence (AI)

Artificial intelligence, or AI, has been in people's imagination for ages. One of the first and most famous AIs in fiction is HAL 9000 in Arthur C. Clarke's Space Odyssey series, starting with 2001: A Space Odyssey.

Interestingly, HAL 9000 can and does play chess while conversing casually with people. In other words, Clarke apparently regarded conversation as easier for an AI than chess. If an AI grew and developed as a human baby does, that would indeed be the case, as a human baby first acquires the ability to converse and only much later learns to play chess.

However, in reality, that is not how an AI works. For example, AIs are immensely superior to the best humans at arithmetic calculations. Therefore, it does not make sense to compare an AI to a human baby.

The world's fastest supercomputer as of March 2016, Tianhe-2, is capable of some 30,000 trillion simple calculations per second (34 PFLOPS, or about 3.4×10^16 FLOPS). Even a household computer can make a trillion calculations per second (1 TFLOPS, or 10^12 FLOPS). On the other hand, I suppose the fastest human could manage 10 per second at most. Hence, a household computer is 100 billion times faster, and the fastest supercomputer over 1,000 trillion times faster, than the fastest human at arithmetic. In addition, computers do not make mistakes, whereas humans do.
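As a quick sanity check, here is the trivial arithmetic behind those ratios (the inputs are this article's own ballpark figures, not measurements):

```python
# Rough arithmetic behind the speed comparison above; all inputs are the
# article's own ballpark figures, not measurements.
tianhe2_flops = 3.4e16    # Tianhe-2, ~34 PFLOPS
household_flops = 1e12    # a household computer, ~1 TFLOPS
human_per_sec = 10        # a generous estimate for the fastest human

print(f"household vs human: {household_flops / human_per_sec:.0e}")  # 1e+11, i.e. 100 billion
print(f"Tianhe-2 vs human:  {tianhe2_flops / human_per_sec:.0e}")    # ~3e+15, thousands of trillions
```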

For programmers nowadays, it is basic knowledge that AIs excel in fields where the rules can be described clearly and completely. Thus, so-called abstract strategy games are the ones AIs should perform best at. Conversely, it is difficult to develop AIs that perform well at image recognition or conversation in a natural language, where the rules are hard to describe clearly. The most extreme example is comedy; if one could clearly describe the rules of how to make someone burst into laughter, anyone could become a comedian!

Even so, although AIs should be good at abstract strategy games, building one is seldom straightforward. After all, human developers must program how it should (at least initially) behave. Developing a chess-playing machine of the kind HAL 9000 boasted of has been a long-standing challenge.

From chess-playing machines to shogi-playing ones

The most famous real AI of the earliest days is probably the chess-playing machine Deep Blue by IBM. It played against the chess world champion Garry Kasparov and beat him in a full match in 1997, having lost their first match in 1996. It was an achievement that hit the headlines all over the world at the time.

I should note that it is debatable whether to call Deep Blue an AI, given that it was software designed specifically to play chess and nothing else. The same argument still applies to modern programs. However, in this article I call them AIs for convenience.

Japanese chess, or shogi, is far more complicated than (Western) chess, because each player is allowed to put captured pieces back into play as their own, which massively increases the number of possible positions. Indeed, it took years after Deep Blue before the best shogi-playing AIs reached the level of professional players.

The slow pace of development can be partly attributed to the fact that no major organisation with massive resources like IBM was involved. Nevertheless, shogi-playing AIs made steady progress, and in October 2015 the Information Processing Society of Japan declared the end of its "Computer-based Shogi Project".

Games solved

A few years earlier, in 2007, the Chinook team announced they had succeeded in solving draughts. In other words, the outcome under perfect play by both sides is now known (it is a draw). The same Chinook team had beaten the best human player more than a decade before, in 1994 (or 1995, depending on interpretation). Draughts is far simpler than, say, chess, and accordingly it was possible to solve it completely with the power of modern parallel supercomputers.

In theory, the same is possible for chess, shogi, or even go. I remember a scene in a cartoon: in a future match between the two best shogi-playing machines, as soon as the first player made the first move (1-6 fu, a very unlikely first move), the other conceded. Theoretically, that could happen.

Go-playing AIs before AlphaGo

It has long been said that go is in a different league from chess and shogi, because the number of degrees of freedom on its 19×19 board is many orders of magnitude larger than in chess or shogi.

For example, in chess the player moving first has 20 possible first moves and the opponent has 20 replies, so there are a mere 400 combinations for the first pair of moves. In shogi, it is 900. In go, it is 129,960. Moreover, the average number of moves per game is about 40 in chess, about 110 in shogi (between professionals), and about 220 in go. As such, you can tell go is far more complicated.
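These first-move counts are easy to verify (a trivial Python check of the numbers quoted above):

```python
# First-pair-of-moves combinatorics quoted above: 20 first moves per side in
# chess, 30 in shogi; in go the first stone can occupy any of the 361 points
# and the reply any of the remaining 360.
print(20 * 20)    # chess:   400
print(30 * 30)    # shogi:   900
print(361 * 360)  # go:  129,960
```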

Mathematically, the total numbers of possible cases, expressed as common logarithms, are 31 for draughts, 58 for Othello (or reversi), 70 for renju (or gomoku), 123 for chess, 226 for shogi, and 360 for go (Ref. 1). Since these are common logarithms, a difference of 1 means a factor of 10. That is, go is more than 10^130 times more mathematically complex than shogi, and about 10^330 times more than draughts. Consequently, for a hypothetical ultra-fast machine that solved draughts in a day, it would be impossible to solve chess even if it kept calculating until the end of the universe, let alone go.
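To put those logarithmic gaps into perspective, here is a small Python check of the ratios above (the state-space logarithms are the ones cited from Ref. 1; the ~13.8-billion-year age of the universe is the only figure I add):

```python
import math

# State-space sizes from Ref. 1, as common (base-10) logarithms.
log10_cases = {"draughts": 31, "othello": 58, "renju": 70,
               "chess": 123, "shogi": 226, "go": 360}

# Go vs shogi: a gap of 10**(360 - 226) = 10**134, i.e. "more than 10**130 times".
print(log10_cases["go"] - log10_cases["shogi"])          # 134

# A machine that solved draughts in one day would need ~10**(123 - 31) days
# for chess, while the universe is only ~10**13 days old (~13.8 billion years).
print(log10_cases["chess"] - log10_cases["draughts"])    # 92
print(round(math.log10(13.8e9 * 365.25)))                # 13
```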

Between 2012 and 2014, the best go-playing machines (or AIs) sometimes won and sometimes lost against top professional human go players with a 4-stone handicap, which is massive. For example, a top amateur would lose 100 games out of 100 against a top professional without a handicap, yet the result would be reversed given a 4-stone (or maybe even 3-stone) handicap. As such, the fact that go-playing programs could hold their own in 4-stone matches means they had come close to the top-amateur level, yet were still miles away from the professional level. At the time, the consensus seemed to be that it would take some years (probably about ten) before go-playing AIs caught up with the top professionals.

(Note) In handicap matches in go, the stronger player tends to try something unreasonable, because they would have no chance of winning if they played normally. It could be a suicidal life-or-death attack against a major group of the opponent's stones, or small-scale but incessant attacks all over the board to extract bit-by-bit concessions. If the opponent is good enough, such attacks will usually be defeated; however, the stronger player may still win 1 match out of 10 or so. For that reason, even in 4-stone handicap matches, a top amateur might not achieve 100 wins out of 100.

AlphaGo vs. Lee Sedol: five matches

Then AlphaGo, developed by Google DeepMind and backed by Google's massive resources, appeared out of nowhere. Just a year or so after its birth, it played five matches in March 2016 against one of the world's best go players, Lee Sedol, and won 4-1. It was shocking!

Top Japanese go players gave live commentary on each match. The commentary by Shinji Takao (only the 6th player in recent history to hold the Meijin and Honinbo titles simultaneously) during the second match was symbolic. Up to the middle stage of the game he explained that AlphaGo was playing badly and was definitely losing, yet he (and others) later realised AlphaGo was actually winning, and indeed it won by a significant margin. Takao lamented, "What was I saying…?!"

Looking back now, the way AlphaGo played was often incomprehensible even to top professionals, and they therefore judged its moves to be out of the question. However, as each game went on, the stones AlphaGo had placed in the earlier stages, judged poor or wasteful by those professionals, turned out to be useful, and eventually it won. In other words, AlphaGo's moves felt alien to the sense of go shared by many, possibly all, professionals, yet proved effective in the end.

Interestingly, similar patterns have been observed repeatedly in AI-vs-human matches in shogi. In shogi there are now several "computer joseki" (standard, established sequences of moves): a shogi program played an unusual move that no professional player had ever played or perhaps even thought of, the move was later accepted as effective among human professional players, and it became a joseki.

In AlphaGo's case, its moves seemed even more incomprehensible to professionals than the equivalent cases in shogi. In past AI-vs-professional shogi matches, whenever an AI played a new move that later became a joseki, the observing professionals understood both its intention and its effectiveness within a few moves at most, and before long it would be established as a new joseki (after more comprehensive examination by human professionals). By comparison, the meaning of some of AlphaGo's moves seemed incomprehensible to the observing professionals even at the end of each game, and they appeared baffled. The fact that even the top professional Takao did not understand says something. Go offers far more choices at each move than shogi, and perhaps that contributes to this.

How AlphaGo operates

The key algorithm AlphaGo employs is deep learning with deep neural networks (DNNs). AlphaGo first learned from a huge number of game records of high-level go players, aiming to play like them. Then it learned further by playing against other go-playing programs (reportedly winning almost all of those matches), and finally it played tens of millions of games against itself and improved on its own.
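To make "improving purely by playing against itself" concrete, here is a toy sketch under heavy assumptions: a tabular learner for one-heap Nim (a game tiny enough to sketch), improving only from the win/loss outcomes of self-play. AlphaGo's real pipeline uses deep neural networks and far more sophisticated updates; everything below, names and constants included, is made up for illustration.

```python
import random
from collections import defaultdict

# Toy self-play learner for one-heap Nim: take 1-3 stones per turn; whoever
# takes the last stone wins. The agent starts with no strategy at all.
HEAP, MOVES, EPSILON = 10, (1, 2, 3), 0.2
wins = defaultdict(int)     # wins[(stones_left, move)]
visits = defaultdict(int)   # visits[(stones_left, move)]

def choose(heap, explore=True):
    legal = [m for m in MOVES if m <= heap]
    if explore and random.random() < EPSILON:
        return random.choice(legal)            # occasional exploration
    # otherwise play the best empirical win rate (Laplace-smoothed)
    return max(legal, key=lambda m: (wins[(heap, m)] + 1) / (visits[(heap, m)] + 2))

for _ in range(20000):                         # self-play: both sides share the tables
    heap, player, history = HEAP, 0, {0: [], 1: []}
    while heap > 0:
        move = choose(heap)
        history[player].append((heap, move))
        heap -= move
        winner, player = player, 1 - player    # the player emptying the heap wins
    for p in (0, 1):
        for state_move in history[p]:
            visits[state_move] += 1
            wins[state_move] += (p == winner)

# With luck this approaches the known optimum for the game: leave the opponent
# a multiple of 4 (e.g. take 2 from a heap of 10).
print({h: choose(h, explore=False) for h in range(1, HEAP + 1)})
```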

This training pattern is in a way similar to how human top professionals progress. It is of course essential for them to learn from the game records of other players and from actual games against other players. However, to become the world's best, they must play somewhat differently from everyone else, which means they ultimately have to learn by themselves. AlphaGo has accomplished that. That is a remarkable milestone.

Interestingly, AlphaGo reportedly does not understand the rules of go as such, even though it has learnt go extensively! Nor does it know any tesuji (an effective and neat way to solve a local problem, which appears relatively frequently). For example, in the fifth match against Lee Sedol, a group of AlphaGo's white stones was captured because it did not know the famous tesuji of Sekito shibori, which Lee applied (the photo at the beginning of this article reproduces that state of the game; despite the substantial local loss, AlphaGo won the game nonetheless, which is staggering, really). Based on its learning of past game records and so on, AlphaGo has simply learnt that, to win, it had better avoid, probabilistically, moves like forbidden moves, and that is how it plays.

The first thing human beginners of go learn is of course the rules. Then they learn the life and death of groups of stones, its applications, joseki, and so on, from books and/or teachers (other players). The path AlphaGo took is very different. In the process of learning from a huge number of past games by professional players, it must have encountered many joseki as characteristic patterns, and it may therefore play, probabilistically, along or close to an existing joseki. However, AlphaGo does not know it is a joseki, and certainly does not follow it because it knows it is one.

AlphaGo's learning works roughly as follows. Suppose that in a certain position there are three realistic moves: a one-space jump, a two-space jump, and a diagonal move. It then simulates, say, 1,000 games from that position to the end for each of the three moves, and regards the move that led to the most wins as the most probably best move. For example, if out of 1,000 games it won 700 after the one-space jump, 400 after the two-space jump, and 300 after the diagonal move, it might set the probability that the one-space jump is the best move at 70% (see the note below). In contrast, what would human players do? They would, with much deliberation, read ahead perhaps another 10 moves from the position, and then judge which move was superior. You could argue that AlphaGo's way is rather crude and brute, relying on sheer computational power.

Note that AlphaGo probably does not perform this kind of search from a single position alone, and most likely assigns probabilities differently from the example above; it is merely a simplification.
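Here is a minimal sketch of that simplified playout-counting idea, with a fake playout standing in for an actual simulated game of go (the move names and win rates mirror the example above; AlphaGo's real search, a neural-network-guided Monte-Carlo tree search, is far more sophisticated):

```python
import random

def evaluate_moves(candidate_moves, playout, n=1000):
    """Score each candidate by the fraction of n playouts it wins."""
    scores = {m: sum(playout(m) for _ in range(n)) / n for m in candidate_moves}
    return max(scores, key=scores.get), scores

# Fake playouts: pretend the three moves from the example above have true win
# rates of 0.7, 0.4 and 0.3; a real playout would simulate a full game of go.
true_rates = {"one-space jump": 0.7, "two-space jump": 0.4, "diagonal": 0.3}
best, scores = evaluate_moves(list(true_rates),
                              playout=lambda m: random.random() < true_rates[m])
print(best, scores)   # "one-space jump", with a score near 0.70
```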

In other words, it seems AlphaGo does not play (at least when there are many plausible candidate moves) based on a detailed, logical reading of the forthcoming moves or future positions, but instinctively selects the next move that looks most likely to lead to a win. Therefore, even the developers themselves cannot really explain the meaning, intention, or motivation behind each of AlphaGo's moves. Yet it has beaten a world top-level go player only a year after its birth. It is alarming indeed.

I have heard some people argue "What's the big deal?", pointing out that AlphaGo does not actually understand go. However, when it plays the right moves at the end of the day, does it matter whether it understands or not? And ultimately, what is this thing humans call understanding? (It is a philosophical question.) One might argue that the understanding of AIs points along a different vector from humans' understanding, yet is as effective as, or potentially even more effective than, ours.

Weak points of AlphaGo?

This probabilistic selection of moves may manifest as a weak point of AlphaGo during a game. For example, in the middle game of the fourth match, Lee Sedol, who was alleged to be losing at that point, played a surprising wedge (warikomi) move (White 78), turned the tables, and won. It was a marvellous and unlikely move, and it amazed and deeply impressed the observing professional players. I think that, precisely because it was such an unlikely move, similar moves rarely appeared in the game records AlphaGo used for its learning; it was then probably excluded probabilistically due to its low probability, and AlphaGo did not consider it further. I must add that it was not a move only a human could come up with, considering pretty much no other professional had thought of it before it was played. Rather, it is fair to say that AlphaGo's limitation here was similar to that of ordinary professionals. No one but Lee Sedol thought of it, and so we should rather praise him for his genius and his almost superhuman move!

Reportedly, according to Demis Hassabis, the head of DeepMind, AlphaGo did find White 78, assigning it a probability of 1 in 10,000, but did not consider it further because the probability was so low.
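As a purely illustrative sketch (not AlphaGo's actual mechanism), here is why a 1-in-10,000 move can effectively vanish: if the playout budget is allocated roughly in proportion to a prior probability, such a move receives almost no simulations and is never read out. All the numbers and move names below are made up.

```python
# Hypothetical prior probabilities for candidate moves at one position; the
# "wedge" entry stands in for Lee Sedol's White 78 with its reported 1/10,000.
priors = {"ordinary move A": 0.60, "ordinary move B": 0.35, "wedge (White 78)": 0.0001}

BUDGET = 10_000        # total playouts available at this position
MIN_PLAYOUTS = 100     # below this, a move is effectively unexamined

for move, p in priors.items():
    allocated = int(BUDGET * p)
    verdict = "searched" if allocated >= MIN_PLAYOUTS else "pruned in practice"
    print(f"{move:18s} prior={p:<7} playouts={allocated:5d}  {verdict}")
```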

I should note that this aspect of AIs, if generalised, may cause problems elsewhere. Consider a self-driving AI that does not understand or know the traffic rules but simply drives in the probabilistically safest and most legal way. If some rare conditions coincide, chances are it may behave in a way no human, including its developers, can understand, such as violating traffic rules or doing something unexpected by the nearby (human) road users. Even if that happens very rarely, if the consequences are serious, the problem may well be serious.

Implications for the era of AIs

Finally, let me consider our future in the light of the remarkable achievement of AlphaGo and its developers.

Application of the core algorithm of AlphaGo

One of the remarkable achievements of DeepMind, the developer of AlphaGo, is its machine-learning algorithm, the heart of AlphaGo, which is unprecedentedly general.

Reportedly, one of the AIs DeepMind has developed using this machine-learning approach has learned to play many different classic computer games, such as Space Invaders, almost perfectly. The AI knew none of the rules before its first play of each game and therefore played extremely poorly at the beginning. Yet, as it played on, it learned the optimal playing style and before long progressed into a near-perfect player. By "learned" I mean it found out, by trial and error and without any external help, how to get a higher score. Consequently, the AI never needed to know the rules of each game; to get a high score, it inevitably ends up following them. In the case of Space Invaders, for example, it has to avoid the invaders' attacks while shooting them.
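The following toy sketch shows the flavour of learning purely from a score: tabular Q-learning on a six-cell corridor where the only reward is at the far end. DeepMind's Atari agent used a deep Q-network learning from raw screen pixels, vastly beyond this; the game and constants here are made up for illustration.

```python
import random
from collections import defaultdict

# Tabular Q-learning on a corridor of 6 cells: start at cell 0; the only
# reward is reaching cell 5. The agent is never told the "rules" of the game.
N, ACTIONS = 6, (-1, +1)
Q = defaultdict(float)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def greedy(s):
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for _ in range(500):
    s = 0
    while s != N - 1:
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(s)
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0            # the score is all it sees
        # standard Q-learning update towards r + gamma * best future value
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy policy should step right everywhere: [1, 1, 1, 1, 1]
print([greedy(s) for s in range(N - 1)])
```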

What is remarkable here is that the same AI, with a single algorithm, was used in all of these experiments. It had no game-specific algorithm, yet the common algorithm handled different games near-perfectly. All existing software, including Deep Blue, the iconic and historic chess-playing machine, has been developed more or less individually for its assigned objective. Such programs commonly share some routines in their code, but it has always been the human programmers' job to mix and combine them, and inevitably to write some parts unique to the specific objective. The general game-playing AI developed by DeepMind went a step beyond this towards versatility, and has become more of an AI (artificial intelligence) than any other.

AlphaGo is slightly different, being equipped with self-learning via Monte-Carlo search in addition to a similar machine-learning algorithm. However, the same characteristic holds for AlphaGo: it did not and does not know the rules, but learnt the best playing style during its learning process.

In comparison, all the best shogi programs hold and use an extensive database of existing joseki (standard, established sequences of moves) for the early stage of the game; those programs are designed specifically for shogi. The number of possible move combinations in the early game is so huge, and the meaning of each move at that stage so vague, that it is hard for a computer to handle, whether in shogi, chess, or indeed go. Therefore, following the existing joseki must be the best and easiest route to quickly developing a strong shogi program.
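A sketch of that joseki-database approach might look like the following lookup-then-search fallback (the moves and notation below are placeholders, not a real joseki database):

```python
# Hypothetical opening book: map a move history to the stored "book" reply.
JOSEKI_BOOK = {
    (): "P-7f",                     # a standard first move (placeholder line)
    ("P-7f",): "P-3d",
    ("P-7f", "P-3d"): "P-2f",
}

def next_move(history, search):
    book = JOSEKI_BOOK.get(tuple(history))
    if book is not None:
        return book                 # still in book: play the established line
    return search(history)          # out of book: fall back to full search

print(next_move(["P-7f"], search=lambda h: "<deep search>"))          # P-3d
print(next_move(["P-7f", "P-8d"], search=lambda h: "<deep search>"))  # out of book
```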

In contrast, AlphaGo did not take that approach; rather ambitiously, it took the general approach, which could be applied to objectives other than go. Remember, it does not know the rules, let alone joseki. Despite that, and despite go being far more complex than shogi, AlphaGo has overwhelmed one of the best human players, something no other go-playing AI has come close to. That is truly marvellous, and at the same time an indication of its immense potential. It is almost frightening.

Some may wonder how AlphaGo could learn go without knowing the rules. My guess is as follows (I am fairly sure it is correct). In short, DeepMind has an independent facilitator/referee program that knows the rules of go perfectly. Then it does not matter that AlphaGo does not know them: during its learning by self-play, it simply reports every move it makes to the facilitator, which responds with verdicts like "your move is forbidden, hence you have lost" or "the game has now finished, and the territory is XXX vs YYY, hence you have won/lost". That is enough for the learning to work. In actual matches against another opponent, be it a human or another computer program, AlphaGo needs no facilitator and can play by itself, even though it still does not know the rules.
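A minimal sketch of that guessed setup, with a trivially fake game standing in for go (all class and method names here are hypothetical; the point is only that the referee alone holds the rules, while the agent hears nothing but verdicts):

```python
import random

class Referee:
    """The only component that knows the rules. Fake rule for this sketch:
    repeating a previously played number is forbidden; the game ends after
    ten legal moves."""
    def __init__(self):
        self.seen, self.turns = set(), 0

    def judge(self, move):
        if move in self.seen:
            return "forbidden move -- you lose"
        self.seen.add(move)
        self.turns += 1
        if self.turns >= 10:
            return "game over -- result decided by counting"
        return "continue"

class Agent:
    """Knows no rules at all; proposes moves and hears only the verdicts,
    which is exactly the learning signal described above."""
    def propose(self):
        return random.randint(1, 15)

referee, agent = Referee(), Agent()
verdict = "continue"
while verdict == "continue":
    verdict = referee.judge(agent.propose())
print(verdict)
```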

The singularity is approaching

The founder of Google DeepMind, the developer of AlphaGo, is said to be a genius in the field of artificial intelligence. I suppose this achievement was the result of the combined effect of that genius's talent and Google's immense funding. In other words, AlphaGo did need a genius behind it.

The same is true of other go-playing programs. In recent years the Computer Go UEC Cup, the world's most prestigious competition between go-playing programs, imposed no limitation on the participants' hardware. Remarkably, the hardware of the 2013 winner, Crazy Stone, was 30 times less powerful than the most powerful machine among the competitors. This implies the skill of Crazy Stone's developer was worth more than a 30-fold difference in computational power. We can conclude that the developers' skill is crucial in developing AIs. Like anything else, that is hardly surprising.

The key to developing go-playing AIs can be described in a phrase: to search for the optimum at each stage of the game. The computational power of the machine is fully exploited to search for the optimum among a huge number of candidates. Similarly, the role of the developers can be described in a phrase: to optimise the algorithm so that it performs the search as accurately and as fast as possible.

Now, let us boldly assert that what developers do is simply to optimise, and also boldly assume that optimisation is what computers are good at. Then we can foresee that an era will eventually come in which the optimisation of algorithms, currently the state of the art practised by cutting-edge developers, can itself be done by computers. That is, the finest development skill, currently reserved for skilled humans, will be overtaken by AIs. Once that happens, AIs will develop themselves, and before long they will be far beyond any human's reach.

The epoch at which AIs' ability to develop themselves surpasses that of humans is called the "technological singularity". No one knows when it will come, or whether it ever will. However, considering what is in principle possible for computers, and the rapid pace of development shown by marvels like AlphaGo, I have a feeling the technological singularity will come within the 21st century, maybe even by its middle.

The world governed by artificial intelligence has long been a popular motif in science-fiction films, novels, and the like, such as the film The Matrix and manga like Phoenix by Osamu Tezuka. I am afraid humans now need to start taking such a world seriously and to consider it realistically as one we may be living in.

References
