
This project is inspired by a Reddit post in which the OP analyzed and plotted the total distance traveled by GM Magnus Carlsen’s chess pieces over his career. It was a very interesting post, done in Python on Google Colab with the python-chess and pandas libraries, and visualized with Seaborn. I would like to replicate the idea as practice in R and explore something different.

Read Data

The data can be downloaded from

Once downloaded, thanks to the R package bigchess, we can easily read the games and moves from the PGN file into a data frame.

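library(bigchess)   # read.pgn(), browse_opening()
library(tidyverse)  # dplyr, tidyr, stringr, ggplot2

df = read.pgn("Carlsen.pgn")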

## 2021-02-21 20:32:20, successfully imported 3430 games

## 2021-02-21 20:32:20, N moves computed

## 2021-02-21 20:32:20, extract moves done

## 2021-02-21 20:32:22, stat moves computed

# glimpse(df)

Data Manipulation

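Let’s keep only the games played by GM Magnus Carlsen and record how many moves he made in each game with the bishop, king, knight, queen, and rook. Pawn moves are taken as whatever remains of the game’s move count after subtracting those pieces and castling.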

## white
white = df %>% 
  filter(str_detect(White, "Carlsen,M")) %>% 
  mutate(name = White, 
         player = "White", 
         isWhite = TRUE, 
         # result from Carlsen's point of view
         MC_result = case_when(
           Result == '1-0' ~ 'W',
           Result == '0-1' ~ 'L',
           Result == '1/2-1/2' ~ 'D'
         ),
         B_moves = W_B_moves, 
         K_moves = W_K_moves,
         N_moves = W_N_moves,
         O_moves = W_O_moves,  # castling
         Q_moves = W_Q_moves,
         R_moves = W_R_moves, 
         # pawn moves: what is left after the other pieces and castling
         P_moves = NMoves - W_B_moves - W_K_moves - W_N_moves - W_Q_moves - W_R_moves - W_O_moves) %>% 
  mutate(MC_result = factor(MC_result, levels=c("W", "D", "L"))) %>% 
  select(name, player, MC_result, NMoves, B_moves, K_moves, N_moves, Q_moves, R_moves, P_moves, O_moves)

## black
black = df %>% 
  filter(str_detect(Black, "Carlsen,M")) %>% 
  mutate(name = Black, 
         player = "Black", 
         isWhite = FALSE, 
         # result from Carlsen's point of view
         MC_result = case_when(
           Result == '1-0' ~ 'L',
           Result == '0-1' ~ 'W',
           Result == '1/2-1/2' ~ 'D'
         ),
         B_moves = B_B_moves, 
         K_moves = B_K_moves,
         N_moves = B_N_moves,
         O_moves = B_O_moves,  # castling
         Q_moves = B_Q_moves,
         R_moves = B_R_moves, 
         # pawn moves: what is left after the other pieces and castling
         P_moves = NMoves - B_B_moves - B_K_moves - B_N_moves - B_Q_moves - B_R_moves - B_O_moves) %>% 
  mutate(MC_result = factor(MC_result, levels=c("W", "D", "L"))) %>% 
  select(name, player, MC_result, NMoves, B_moves, K_moves, N_moves, Q_moves, R_moves, P_moves, O_moves)
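The white and black blocks differ only in the column prefix and in how Result is translated into Carlsen's own outcome. As a minimal sketch of that translation, the snippet below applies it to a made-up data frame; the `toy` data, the opponent names, and the `toy_results` name are hypothetical and not taken from the PGN file.

```r
library(dplyr)

# Hypothetical toy data: three games with Carlsen on either side
toy <- tibble(
  White  = c("Carlsen,M", "Player,A", "Carlsen,M"),
  Black  = c("Player,A", "Carlsen,M", "Player,B"),
  Result = c("1-0", "1-0", "1/2-1/2")
)

toy_results <- toy %>%
  mutate(
    # TRUE when Carlsen has the white pieces
    isWhite = grepl("Carlsen,M", White, fixed = TRUE),
    # translate the PGN result into Carlsen's point of view;
    # any other result string falls through to NA
    MC_result = case_when(
      Result == "1/2-1/2"        ~ "D",
      isWhite  & Result == "1-0" ~ "W",
      isWhite  & Result == "0-1" ~ "L",
      !isWhite & Result == "0-1" ~ "W",
      !isWhite & Result == "1-0" ~ "L"
    )
  )

print(toy_results)
```

In the second toy game Carlsen is Black and White wins, so the result counts as a loss for him, which is the same flip the black block applies.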

# both = df %>% 
#  filter(str_detect(Black, "Carlsen") & str_detect(White, "Carlsen"))
# it happens that there is one game in which both players are named Carlsen

# by games:
games = bind_rows(white, black)

# by pieces
carlsen = bind_rows(white, black) %>% 
  # reshape to one row per game and move type
  gather(key="type", value="moves", -c(name, player, MC_result, NMoves)) %>% 
  mutate(move_pct = moves/NMoves, 
         piece = case_when(
           type == 'B_moves' ~ 'Bishop',
           type == 'K_moves' ~ 'King',
           type == 'N_moves' ~ 'Knight',
           type == 'Q_moves' ~ 'Queen',
           type == 'R_moves' ~ 'Rook',
           type == 'P_moves' ~ 'Pawn'
         ),  # O_moves (castling) gets no piece label and stays NA
         piece = factor(piece, levels = c('King', 'Queen', 'Rook', 'Bishop', 'Knight', 'Pawn')))
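To make the wide-to-long reshape concrete, here is a small sketch on a single made-up game. It uses `tidyr::pivot_longer()`, the newer counterpart of `gather()`, purely as an illustration; the `toy_game` values are invented and `toy_long` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

# Hypothetical single game: 40 moves split across the move types
toy_game <- tibble(
  NMoves  = 40,
  B_moves = 6, K_moves = 4, N_moves = 8,
  Q_moves = 5, R_moves = 7, P_moves = 9, O_moves = 1
)

toy_long <- toy_game %>%
  # one row per move type; NMoves is kept as an id column
  pivot_longer(cols = -NMoves, names_to = "type", values_to = "moves") %>%
  # share of the game's moves made with each type
  mutate(move_pct = moves / NMoves)

print(toy_long)
```

Because the toy counts add up to NMoves, the seven shares sum to 1. In the real data pawn moves are defined as the remainder, so the same holds there by construction.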

Get summary

Let’s summarize the distribution of moves by piece type per game.

carlsen %>% 
  filter(type != "O_moves") %>% 
  ggplot(aes(x=piece, y=move_pct, color = player)) + 
  geom_boxplot() + 
  xlab("Chess Piece") + 
  ylab("Move Percentage within Game") +
  ggtitle("Move Percentage by Chess Pieces of GM Magnus Carlsen") + 
  scale_y_continuous(labels = scales::percent)


Winning percentage by Opening Hand

The R package bigchess has some interesting functions, such as browse_opening, which explores the winning percentage by opening moves:

bo = browse_opening(subset(df, grepl("Carlsen", White)))

Winning distribution

winpct = games %>% 
  count(player, MC_result) %>% 
  group_by(player) %>% 
  mutate(freq = n/sum(n))
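The count-then-normalize step can be checked by hand on a tiny example. The sketch below runs the same pipeline on six made-up games; `toy_games` and `toy_winpct` are hypothetical names and the outcomes are invented.

```r
library(dplyr)

# Hypothetical outcomes: four games as White, two as Black
toy_games <- tibble(
  player    = c("White", "White", "White", "White", "Black", "Black"),
  MC_result = c("W", "W", "D", "L", "W", "D")
)

toy_winpct <- toy_games %>%
  count(player, MC_result) %>%   # games per colour and result
  group_by(player) %>%
  mutate(freq = n / sum(n)) %>%  # share within each colour
  ungroup()

print(toy_winpct)
```

Grouping by player before dividing is what makes the shares sum to 1 within each colour rather than across all games, so the stacked bars for White and Black each reach 100%.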

winpct %>% 
  ggplot(aes(fill=MC_result, y=freq, x=player))+
  geom_bar(position="stack", stat="identity") + 
  xlab("Player") + 
  ylab("Winning Percentage") +
  scale_fill_manual(values = c("green", "grey", "red"))+
  ggtitle("Winning Percentage by Black/White of GM Magnus Carlsen") + 
  scale_y_continuous(labels = scales::percent)


21 February 2021


