Finding weak points in AI game players
Paper: Adversarial Policies Beat Superhuman Go AIs
Summary by Adrian Wilkins-Caruana
In a head-to-head game such as tennis, what’s the best way to structure a tournament to determine who among a group of players should be the winner? As one example, tournaments like Wimbledon use single-elimination. One thing that’s always bothered me about such tournaments is that some combinations aren’t played. For instance, in the diagram below, Kevin beat Adrian, and Lisa then beat Kevin. But, can we really say for certain that Lisa would have beaten Adrian if they had been paired together in the tournament? You would think so, right? Well, today’s paper looks at a real-world exception to this in the game of Go. In the paper, the authors train a Go-playing AI that beats superhuman Go AIs, and yet, the trained AI can be easily beaten by amateur Go players. 🤯
Wang et al. trained an adversarial model to attack KataGo, a superhuman Go-playing AI. When KataGo is configured with superhuman settings, it loses >97% of the time against the trained adversary, which has learned to trick KataGo into making serious blunders that cause it to lose. This strategy works well against KataGo, but when matched against human players on the KGS Go server, it only plays at the level of a novice Go player. This asymmetry is called non-transitivity, and is depicted in the diagram below.
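To make the idea of non-transitivity concrete, here's a minimal sketch in Python. The win rates below are illustrative stand-ins loosely inspired by the paper's setup (they are not the paper's measurements): the adversary beats KataGo, KataGo beats amateurs, yet amateurs beat the adversary, so "beats" fails to chain transitively.

```python
# Hypothetical win-rate table illustrating non-transitivity.
# Numbers are invented for illustration, not taken from the paper.
win_rate = {
    ("adversary", "katago"): 0.97,   # adversary routinely beats KataGo
    ("katago", "amateur"): 0.99,     # KataGo routinely beats amateur humans
    ("amateur", "adversary"): 0.90,  # ...yet amateurs beat the adversary
}

def beats(a, b):
    """A 'beats' B if A wins more than half of their games."""
    return win_rate[(a, b)] > 0.5

# Transitivity would require: adversary > katago and katago > amateur
# implies adversary > amateur. Instead, the relation forms a cycle.
assert beats("adversary", "katago")
assert beats("katago", "amateur")
assert beats("amateur", "adversary")  # the cycle closes
```

This is the same structure as rock-paper-scissors: pairwise outcomes alone can't rank the three players on a single ladder.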
(Editor’s note: We here at Learn & Burn try to define all non-obvious acronyms. The term “KGS” in “KGS Go Server” isn’t an acronym. I (Tyler the editor) see it as an “anacronym”: a term that was once an acronym, but is now its own thing. The “K” used to stand for “Kiseido,” a publisher of Go books that once sponsored the KGS Go Server. That sponsorship ended, and now everyone refers to the Go server as the KGS Go Server — although the official line is that KGS has become a recursive acronym, with the “K” now standing for “KGS.” We apologize profusely for the intricacy of this backstory. Now back to your regularly-scheduled paper summary.)
I’m going to call the unnamed Go AI introduced in this paper “KataStop” since it’s the archrival and stopper of KataGo. While typical Go AIs are trained using their own algorithm to guess their opponent’s future moves, KataStop had full access to KataGo’s algorithm, which gave it a training advantage. This method is called a gray-box attack since it gives the attacker (KataStop) partial access to thoughts that would normally be internal to the victim (KataGo).
You might consider this cheating. But the authors weren’t trying to fairly beat KataGo. They were trying to show that KataGo has a weakness, and to explain a new way to find such weaknesses. When an adversary like KataStop has access to the opponent’s policy, the authors call this victim-play. The diagram below shows the difference between KataGo’s internal search tree, called self-play (left), and KataStop’s search tree (right). Each algorithm decides what to do on its own turn by building a tree of possible future states of the game. The self-play approach (denoted by Monte Carlo Tree Search, or MCTS) uses a single gameplay strategy (V for victim) to model what moves could be made at each state, while the victim-play approach (denoted by A-MCTS-S, where the A stands for “adversarial” and the final S stands for “sample,” since the move is sampled from KataGo) uses an adversarial policy (A) for its own turns, and the victim’s policy for the victim’s turns.
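The key difference can be sketched in a few lines of Python. This is a toy illustration of the A-MCTS-S idea, not the paper’s implementation: the toy game, the policies, and the scoring function are all invented here. What it shows is the one structural change — at the victim’s nodes, the adversary doesn’t assume worst-case (optimal) play; it plugs in the move the victim’s own policy would actually choose, which is exactly what lets it find lines that only work because the victim blunders.

```python
# Toy sketch of adversarial tree search in the spirit of A-MCTS-S.
# The "game" is just a running total; all details below are invented.

def legal_moves(state):
    return [1, 2, 3]

def victim_policy(state):
    """Stand-in for gray-box access to the victim's network.
    Deterministically greedy here (a real attack samples the network)."""
    return max(legal_moves(state))

def score(state):
    """Toy terminal evaluation from the adversary's perspective:
    the adversary wants the running total to stay small."""
    return -state

def a_mcts_s(state, depth, adversary_to_move):
    if depth == 0:
        return score(state)
    if adversary_to_move:
        # Adversary node: search over its own candidate moves.
        return max(a_mcts_s(state + m, depth - 1, False)
                   for m in legal_moves(state))
    # Victim node: instead of assuming optimal play, follow the single
    # move the victim's policy would make. This is the A-MCTS-S twist.
    m = victim_policy(state)
    return a_mcts_s(state + m, depth - 1, True)

# Pick the adversary's opening move by searching two plies ahead.
best = max(legal_moves(0), key=lambda m: a_mcts_s(m, 2, False))
```

In a standard minimax or self-play MCTS, the victim node would also take a `max` over all replies; swapping that for the victim’s sampled move is what turns a fair search into a targeted exploit.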
In their evaluation, the authors discovered two kinds of vulnerabilities in KataGo. The first is exploited by the pass adversary, which learned to trick a no-search variant of KataGo (one that doesn’t search future states for the best move) into mistakenly passing, causing it to lose 99% of the time. But the pass adversary is completely thwarted both by a pass-bolstered KataGo without search and by a vanilla KataGo with search. So the authors retrained KataStop against a KataGo with search, yielding the more impressive cyclic adversary. Unlike the pass adversary, the cyclic adversary doesn’t trick the victim into passing. Instead, it wins by first coaxing KataGo into creating a large group of stones in a circular pattern, and then exploiting a weakness in KataGo’s network that allows the adversary to capture the group. This shifts the score decisively and irrecoverably in KataStop’s favor. The cyclic variant of KataStop beats KataGo 95.7% of the time when KataGo evaluates 4k future board states, and 72% of the time when it searches 10 million states.
In a game as complex as Go, it’s not really possible for a human or an AI to be certain which move is best or how strong a position is. This means that, just like humans, Go AIs can always have blind spots. In the paper, the authors demonstrate that when humans are shown how the cyclic adversary beats KataGo, they can replicate the strategy well enough to beat KataGo themselves. An interesting corollary is that we can’t assume transitivity among game strategies, leaving open the exciting possibility that there are undiscovered strategies that could beat the best AIs in the world.