-
-
Notifications
You must be signed in to change notification settings - Fork 428
/
Copy pathchess.py
352 lines (274 loc) · 14.8 KB
/
chess.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
# noqa: D212, D415
"""
# Chess
```{figure} classic_chess.gif
:width: 140px
:name: chess
```
This environment is part of the <a href='..'>classic environments</a>. Please read that page first for general information.
| Import | `from pettingzoo.classic import chess_v6` |
|--------------------|------------------------------------|
| Actions | Discrete |
| Parallel API | Yes |
| Manual Control | No |
| Agents | `agents= ['player_0', 'player_1']` |
| Agents | 2 |
| Action Shape | Discrete(4672) |
| Action Values | Discrete(4672) |
| Observation Shape | (8,8,111) |
| Observation Values | [0,1] |
Chess is one of the oldest studied games in AI. Our implementation of the observation and action spaces for chess are what the AlphaZero method uses, with two small changes.
### Observation Space
The observation is a dictionary which contains an `'observation'` element which is the usual RL observation described below, and an `'action_mask'` which holds the legal moves, described in the Legal Actions Mask section.
Like AlphaZero, the main observation space is an 8x8 image representing the board. It has 111 channels representing:
* Channels 0 - 3: Castling rights:
* Channel 0: All ones if white can castle queenside
* Channel 1: All ones if white can castle kingside
* Channel 2: All ones if black can castle queenside
* Channel 3: All ones if black can castle kingside
* Channel 4: Is black or white
* Channel 5: A move clock counting up to the 50 move rule. Represented by a single channel where the *n* th element in the flattened channel is set if there has been *n* moves
* Channel 6: All ones to help neural networks find board edges in padded convolutions
* Channel 7 - 18: One channel for each piece type and player color combination. For example, there is a specific channel that represents black knights. An index of this channel is set to 1 if a black knight is in the corresponding spot on the game board, otherwise, it is set to 0.
Similar to LeelaChessZero, en passant possibilities are represented by displaying the vulnerable pawn on the 8th row instead of the 5th.
* Channel 19: represents whether a position has been seen before (whether a position is a 2-fold repetition)
* Channel 20 - 111 represents the previous 7 boards, with each board represented by 13 channels. The latest board occupies the first 13 channels, followed by the second latest board, and so on. These 13 channels correspond to channels 7 - 20.
Similar to AlphaZero, our observation space follows a stacking approach, where it accumulates the previous 8 board observations.
Unlike AlphaZero, where the board orientation may vary, in our system, the `env.board_history` always maintains the orientation towards the white agent, with the white agent's king consistently positioned on the 1st row. In simpler terms, both players are observing the same board layout.
Nevertheless, we have incorporated a convenient feature, the env.observe('player_1') function, specifically for the black agent's orientation. This facilitates the training of agents capable of playing proficiently as both black and white.
#### Legal Actions Mask
The legal moves available to the current agent are found in the `action_mask` element of the dictionary observation. The `action_mask` is a binary vector where each index of the vector represents whether the action is legal or not. The `action_mask` will be all zeros for any agent except the one
whose turn it is. Taking an illegal move ends the game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents.
### Action Space
From the AlphaZero chess paper:
> [In AlphaChessZero, the] action space is a 8x8x73 dimensional array.
Each of the 8×8 positions identifies the square from which to “pick up” a piece. The first 56 planes encode possible ‘queen moves’ for any piece: a number of squares [1..7] in which the piece will be
moved, along one of eight relative compass directions {N, NE, E, SE, S, SW, W, NW}. The
next 8 planes encode possible knight moves for that piece. The final 9 planes encode possible
underpromotions for pawn moves or captures in two possible diagonals, to knight, bishop or
rook respectively. Other pawn moves or captures from the seventh rank are promoted to a
queen.
We instead flatten this into 8×8×73 = 4672 discrete action space.
You can get back the original (x,y,c) coordinates from the integer action `a` with the following expression: `(a // (8*73), (a // 73) % 8, a % (8*73) % 73)`
Example:
>>> x = 6
>>> y = 0
>>> c = 12
>>> a = x*(8*73) + y*73 + c
>>> print(a // (8*73), a % (8*73) // 73, a % (8*73) % 73)
6 0 12
Note: the coordinates (6, 0, 12) correspond to column 6, row 0, plane 12. In chess notation, this would signify square G1:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| A | B | C | D | E | F | G | H |
### Rewards
| Winner | Loser | Draw |
| :----: | :---: | :---: |
| +1 | -1 | 0 |
### Version History
* v6: Fixed wrong player starting first, check for insufficient material/50-turn rule/three fold repetition (1.23.2)
* v5: Changed python-chess version to version 1.7 (1.13.1)
* v4: Changed observation space to proper AlphaZero style frame stacking (1.11.0)
* v3: Fixed bug in arbitrary calls to observe() (1.8.0)
* v2: Legal action mask in observation replaced illegal move list in infos (1.5.0)
* v1: Bumped version of all environments due to adoption of new agent iteration scheme where all agents are iterated over after they are done (1.4.0)
* v0: Initial versions release (1.0.0)
"""
from __future__ import annotations
from os import path
import chess
import gymnasium
import numpy as np
import pygame
from gymnasium import spaces
from gymnasium.utils import EzPickle
from pettingzoo import AECEnv
from pettingzoo.classic.chess import chess_utils
from pettingzoo.utils import wrappers
from pettingzoo.utils.agent_selector import AgentSelector
def env(**kwargs):
env = raw_env(**kwargs)
env = wrappers.TerminateIllegalWrapper(env, illegal_reward=-1)
env = wrappers.AssertOutOfBoundsWrapper(env)
env = wrappers.OrderEnforcingWrapper(env)
return env
class raw_env(AECEnv, EzPickle):
metadata = {
"render_modes": ["human", "ansi", "rgb_array"],
"name": "chess_v6",
"is_parallelizable": False,
"render_fps": 2,
}
def __init__(self, render_mode: str | None = None, screen_height: int | None = 800):
EzPickle.__init__(self, render_mode, screen_height)
super().__init__()
self.board = chess.Board()
self.agents = [f"player_{i}" for i in range(2)]
self.possible_agents = self.agents[:]
self._agent_selector = AgentSelector(self.agents)
self.action_spaces = {name: spaces.Discrete(8 * 8 * 73) for name in self.agents}
self.observation_spaces = {
name: spaces.Dict(
{
"observation": spaces.Box(
low=0, high=1, shape=(8, 8, 111), dtype=bool
),
"action_mask": spaces.Box(
low=0, high=1, shape=(4672,), dtype=np.int8
),
}
)
for name in self.agents
}
self.rewards = None
self.infos = {name: {} for name in self.agents}
self.truncations = {name: False for name in self.agents}
self.terminations = {name: False for name in self.agents}
self.agent_selection = None
self.board_history = np.zeros((8, 8, 104), dtype=bool)
assert render_mode is None or render_mode in self.metadata["render_modes"]
self.render_mode = render_mode
self.screen_height = self.screen_width = screen_height
self.screen = None
if self.render_mode in ["human", "rgb_array"]:
self.BOARD_SIZE = (self.screen_width, self.screen_height)
self.clock = pygame.time.Clock()
self.cell_size = (self.BOARD_SIZE[0] / 8, self.BOARD_SIZE[1] / 8)
bg_name = path.join(path.dirname(__file__), "img/chessboard.png")
self.bg_image = pygame.transform.scale(
pygame.image.load(bg_name), self.BOARD_SIZE
)
def load_piece(file_name):
img_path = path.join(path.dirname(__file__), f"img/{file_name}.png")
return pygame.transform.scale(
pygame.image.load(img_path), self.cell_size
)
self.piece_images = {
"pawn": [load_piece("pawn_black"), load_piece("pawn_white")],
"knight": [load_piece("knight_black"), load_piece("knight_white")],
"bishop": [load_piece("bishop_black"), load_piece("bishop_white")],
"rook": [load_piece("rook_black"), load_piece("rook_white")],
"queen": [load_piece("queen_black"), load_piece("queen_white")],
"king": [load_piece("king_black"), load_piece("king_white")],
}
def observation_space(self, agent):
return self.observation_spaces[agent]
def action_space(self, agent):
return self.action_spaces[agent]
def observe(self, agent):
current_index = self.possible_agents.index(agent)
observation = chess_utils.get_observation(self.board, current_index)
observation = np.dstack((observation[:, :, :7], self.board_history))
# We need to swap the white 6 channels with black 6 channels
if current_index == 1:
# 1. Mirror the board
observation = np.flip(observation, axis=0)
# 2. Swap the white 6 channels with the black 6 channels
for i in range(1, 9):
tmp = observation[..., 13 * i - 6 : 13 * i].copy()
observation[..., 13 * i - 6 : 13 * i] = observation[
..., 13 * i : 13 * i + 6
]
observation[..., 13 * i : 13 * i + 6] = tmp
legal_moves = (
chess_utils.legal_moves(self.board) if agent == self.agent_selection else []
)
action_mask = np.zeros(4672, "int8")
for i in legal_moves:
action_mask[i] = 1
return {"observation": observation, "action_mask": action_mask}
def reset(self, seed=None, options=None):
self.agents = self.possible_agents[:]
self.board = chess.Board()
self._agent_selector = AgentSelector(self.agents)
self.agent_selection = self._agent_selector.reset()
self.rewards = {name: 0 for name in self.agents}
self._cumulative_rewards = {name: 0 for name in self.agents}
self.terminations = {name: False for name in self.agents}
self.truncations = {name: False for name in self.agents}
self.infos = {name: {} for name in self.agents}
self.board_history = np.zeros((8, 8, 104), dtype=bool)
if self.render_mode == "human":
self.render()
def set_game_result(self, result_val):
for i, name in enumerate(self.agents):
self.terminations[name] = True
result_coef = 1 if i == 0 else -1
self.rewards[name] = result_val * result_coef
self.infos[name] = {"legal_moves": []}
def step(self, action):
if (
self.terminations[self.agent_selection]
or self.truncations[self.agent_selection]
):
return self._was_dead_step(action)
current_agent = self.agent_selection
current_index = self.agents.index(current_agent)
# Cast action into int
action = int(action)
chosen_move = chess_utils.action_to_move(self.board, action, current_index)
assert chosen_move in self.board.legal_moves
self.board.push(chosen_move)
next_legal_moves = chess_utils.legal_moves(self.board)
is_stale_or_checkmate = not any(next_legal_moves)
# claim draw is set to be true to align with normal tournament rules
is_insufficient_material = self.board.is_insufficient_material()
can_claim_draw = self.board.can_claim_draw()
game_over = can_claim_draw or is_stale_or_checkmate or is_insufficient_material
if game_over:
result = self.board.result(claim_draw=True)
result_val = chess_utils.result_to_int(result)
self.set_game_result(result_val)
self._accumulate_rewards()
# Update board after applying action
# We always take the perspective of the white agent
next_board = chess_utils.get_observation(self.board, player=0)
self.board_history = np.dstack(
(next_board[:, :, 7:], self.board_history[:, :, :-13])
)
self.agent_selection = (
self._agent_selector.next()
) # Give turn to the next agent
if self.render_mode == "human":
self.render()
def render(self):
if self.render_mode is None:
gymnasium.logger.warn(
"You are calling render method without specifying any render mode."
)
elif self.render_mode == "ansi":
return str(self.board)
elif self.render_mode in {"human", "rgb_array"}:
return self._render_gui()
else:
raise ValueError(
f"{self.render_mode} is not a valid render mode. Available modes are: {self.metadata['render_modes']}"
)
def _render_gui(self):
if self.screen is None:
pygame.init()
if self.render_mode == "human":
pygame.display.set_caption("Chess")
self.screen = pygame.display.set_mode(self.BOARD_SIZE)
elif self.render_mode == "rgb_array":
self.screen = pygame.Surface(self.BOARD_SIZE)
self.screen.blit(self.bg_image, (0, 0))
for square, piece in self.board.piece_map().items():
pos_x = square % 8 * self.cell_size[0]
pos_y = (
self.BOARD_SIZE[1] - (square // 8 + 1) * self.cell_size[1]
) # offset because pygame display is flipped
piece_name = chess.piece_name(piece.piece_type)
piece_img = self.piece_images[piece_name][piece.color]
self.screen.blit(piece_img, (pos_x, pos_y))
if self.render_mode == "human":
pygame.display.update()
self.clock.tick(self.metadata["render_fps"])
elif self.render_mode == "rgb_array":
return np.transpose(
np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2)
)
def close(self):
if self.screen is not None:
pygame.quit()
self.screen = None