PettingZoo/pettingzoo/classic/chess/chess.py at master · Farama-Foundation/PettingZoo

# noqa: D212, D415

"""

# Chess

```{figure} classic_chess.gif
:width: 140px
:name: chess
```

This environment is part of the <a href='..'>classic environments</a>. Please read that page first for general information.

| Import             | `from pettingzoo.classic import chess_v6` |
|--------------------|------------------------------------|
| Actions            | Discrete                           |
| Parallel API       | Yes                                |
| Manual Control     | No                                 |
| Agents             | `agents= ['player_0', 'player_1']` |
| Agents             | 2                                  |
| Action Shape       | Discrete(4672)                     |
| Action Values      | Discrete(4672)                     |
| Observation Shape  | (8,8,111)                          |
| Observation Values | [0,1]                              |

Chess is one of the oldest studied games in AI. Our implementation of the observation and action spaces for chess follows the AlphaZero method, with two small changes.

### Observation Space

The observation is a dictionary which contains an `'observation'` element which is the usual RL observation described below, and an `'action_mask'` which holds the legal moves, described in the Legal Actions Mask section.

Like AlphaZero, the main observation space is an 8x8 image representing the board. It has 111 channels (indexed 0 - 110) representing:

* Channels 0 - 3: Castling rights:
  * Channel 0: All ones if white can castle queenside
  * Channel 1: All ones if white can castle kingside
  * Channel 2: All ones if black can castle queenside
  * Channel 3: All ones if black can castle kingside
* Channel 4: Indicates whether the player is black or white
* Channel 5: A move clock counting up to the 50 move rule. Represented by a single channel where the *n*th element in the flattened channel is set if there have been *n* moves
* Channel 6: All ones to help neural networks find board edges in padded convolutions
* Channels 7 - 18: One channel for each piece type and player color combination. For example, there is a specific channel that represents black knights. An index of this channel is set to 1 if a black knight is in the corresponding spot on the game board; otherwise, it is set to 0.
  * Similar to LeelaChessZero, en passant possibilities are represented by displaying the vulnerable pawn on the 8th row instead of the 5th.
* Channel 19: Represents whether a position has been seen before (whether a position is a 2-fold repetition)
* Channels 20 - 110: Represent the previous 7 boards, with each board represented by 13 channels. The latest board occupies the first 13 channels, followed by the second latest board, and so on. These 13 channels correspond to channels 7 - 19.
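
The channel layout above can be sketched as a small lookup. This is only an illustration: `describe_channel` and its constants are hypothetical names that are not part of the environment, and the labels simply restate the list above, assuming 7 leading channels followed by 8 boards of 13 channels each.

```python
# Hypothetical helper, not part of pettingzoo: labels each observation channel.
CHANNELS_PER_BOARD = 13  # 12 piece planes + 1 repetition plane
FIRST_BOARD_CHANNEL = 7  # channels 0 - 6 hold castling, color, clock, ones
TOTAL_CHANNELS = 111


def describe_channel(index):
    if not 0 <= index < TOTAL_CHANNELS:
        raise ValueError(f"channel index {index} is out of range")
    if index < 4:
        return "castling rights"
    if index == 4:
        return "player color"
    if index == 5:
        return "move clock"
    if index == 6:
        return "all ones"
    # Remaining channels are 8 stacked boards, newest first.
    board, offset = divmod(index - FIRST_BOARD_CHANNEL, CHANNELS_PER_BOARD)
    kind = "repetition flag" if offset == 12 else "piece plane"
    age = "current board" if board == 0 else f"board {board} step(s) back"
    return f"{kind}, {age}"


print(describe_channel(19))
print(describe_channel(110))
```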

Similar to AlphaZero, our observation space uses frame stacking: it holds the 8 most recent board observations (the current board plus the previous 7).

Unlike AlphaZero, where the board orientation may vary, `env.board_history` always keeps the board oriented towards the white agent, with the white agent's king on the 1st row. In other words, both players' observations are built from the same stored board layout.

However, `env.observe('player_1')` returns the observation in the black agent's orientation: the board is mirrored and the white and black piece channels are swapped. This makes it easier to train agents that play well as both black and white.
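
The stacking behavior can be illustrated with a minimal sketch. It assumes a `collections.deque` as a stand-in for the array-based history the environment keeps, and `push_board` is a hypothetical helper; the strings stand for 13-channel board snapshots.

```python
from collections import deque

HISTORY_LENGTH = 8  # the current board plus the previous 7


def push_board(history, board_planes):
    # Newest board goes first; the deque's maxlen drops the oldest one.
    history.appendleft(board_planes)
    return history


# Start from an empty history, as after a reset.
history = deque(["empty"] * HISTORY_LENGTH, maxlen=HISTORY_LENGTH)
for snapshot in ["after move 1", "after move 2", "after move 3"]:
    push_board(history, snapshot)

print(list(history)[:3])
```

The newest snapshot always sits at the front, matching the "latest board occupies the first 13 channels" ordering described above.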

#### Legal Actions Mask

The legal moves available to the current agent are found in the `action_mask` element of the dictionary observation. The `action_mask` is a binary vector where each index of the vector represents whether the action is legal or not. The `action_mask` will be all zeros for any agent except the one whose turn it is. Taking an illegal move ends the game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents.
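
As a minimal sketch of how a mask like this might be used, the hypothetical helper below (not part of the environment) picks a random legal action from a binary mask and returns `None` when the mask is all zeros, which is what an agent sees when it is not its turn.

```python
import random


def sample_legal_action(action_mask, rng=random):
    # Collect the indices whose mask entry is set (legal actions).
    legal = [i for i, flag in enumerate(action_mask) if flag]
    if not legal:
        return None  # all-zero mask: no legal action for this agent
    return rng.choice(legal)


# Toy mask with two legal actions out of 4672.
mask = [0] * 4672
mask[77] = 1
mask[4000] = 1
action = sample_legal_action(mask, random.Random(0))
print(action in (77, 4000))
```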

### Action Space

From the AlphaZero chess paper:

> [In AlphaChessZero, the] action space is a 8x8x73 dimensional array.
> Each of the 8×8 positions identifies the square from which to “pick up” a piece. The first 56 planes encode possible ‘queen moves’ for any piece: a number of squares [1..7] in which the piece will be
> moved, along one of eight relative compass directions {N, NE, E, SE, S, SW, W, NW}. The
> next 8 planes encode possible knight moves for that piece. The final 9 planes encode possible
> underpromotions for pawn moves or captures in two possible diagonals, to knight, bishop or
> rook respectively. Other pawn moves or captures from the seventh rank are promoted to a
> queen.

We instead flatten this into a discrete action space of size 8×8×73 = 4672.

You can get back the original (x,y,c) coordinates from the integer action `a` with the following expression: `(a // (8*73), (a // 73) % 8, a % (8*73) % 73)`

Example:

    >>> x = 6
    >>> y = 0
    >>> c = 12
    >>> a = x*(8*73) + y*73 + c
    >>> print(a // (8*73), a % (8*73) // 73, a % (8*73) % 73)
    6 0 12

Note: the coordinates (6, 0, 12) correspond to column 6, row 0, plane 12. In chess notation, this would signify square G1:

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| A | B | C | D | E | F | G | H |
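
The flattening and its inverse can be wrapped in small helpers. This is a sketch under the layout stated above; `encode_action`, `decode_action`, `square_name` and `plane_kind` are hypothetical names, not functions provided by the environment, and `plane_kind` only applies the 56 / 8 / 9 plane split from the quoted paper.

```python
NUM_PLANES = 73
BOARD_SIZE = 8


def encode_action(x, y, c):
    # Flatten (column, row, plane) into a single integer in [0, 4672).
    return x * (BOARD_SIZE * NUM_PLANES) + y * NUM_PLANES + c


def decode_action(a):
    # Inverse of encode_action.
    x, rest = divmod(a, BOARD_SIZE * NUM_PLANES)
    y, c = divmod(rest, NUM_PLANES)
    return x, y, c


def square_name(x, y):
    # Column index 0 - 7 maps to A - H; row index 0 maps to rank 1.
    return "ABCDEFGH"[x] + str(y + 1)


def plane_kind(c):
    if c < 56:
        return "queen move"
    if c < 64:
        return "knight move"
    return "underpromotion"


a = encode_action(6, 0, 12)
x, y, c = decode_action(a)
print(a, square_name(x, y), plane_kind(c))
```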

### Rewards

| Winner | Loser | Draw |
| :----: | :---: | :---: |
| +1     | -1    | 0 |
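
As an illustration of the table above, the sketch below maps a game result string to per-agent rewards. It assumes the result strings `"1-0"`, `"0-1"` and `"1/2-1/2"` and that `player_0` plays white; `RESULT_TO_INT` and `rewards_from_result` are hypothetical names and may differ from what the environment's utilities do internally.

```python
# Assumed mapping from a result string to white's score.
RESULT_TO_INT = {"1-0": 1, "0-1": -1, "1/2-1/2": 0}


def rewards_from_result(result, agents=("player_0", "player_1")):
    value = RESULT_TO_INT[result]
    # The first agent (white) gets the value as-is, the second gets its negation.
    return {name: value * (1 if i == 0 else -1) for i, name in enumerate(agents)}


print(rewards_from_result("1-0"))
print(rewards_from_result("1/2-1/2"))
```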

### Version History

* v6: Fixed wrong player starting first, check for insufficient material/50-turn rule/three fold repetition (1.23.2)

* v5: Changed python-chess version to version 1.7 (1.13.1)

* v4: Changed observation space to proper AlphaZero style frame stacking (1.11.0)

* v3: Fixed bug in arbitrary calls to observe() (1.8.0)

* v2: Legal action mask in observation replaced illegal move list in infos (1.5.0)

* v1: Bumped version of all environments due to adoption of new agent iteration scheme where all agents are iterated over after they are done (1.4.0)

* v0: Initial versions release (1.0.0)

"""

from __future__ import annotations

from os import path

import chess
import gymnasium
import numpy as np
import pygame
from gymnasium import spaces
from gymnasium.utils import EzPickle

from pettingzoo import AECEnv
from pettingzoo.classic.chess import chess_utils
from pettingzoo.utils import wrappers
from pettingzoo.utils.agent_selector import AgentSelector


def env(**kwargs):
    env = raw_env(**kwargs)
    env = wrappers.TerminateIllegalWrapper(env, illegal_reward=-1)
    env = wrappers.AssertOutOfBoundsWrapper(env)
    env = wrappers.OrderEnforcingWrapper(env)
    return env


class raw_env(AECEnv, EzPickle):
    metadata = {
        "render_modes": ["human", "ansi", "rgb_array"],
        "name": "chess_v6",
        "is_parallelizable": False,
        "render_fps": 2,
    }

    def __init__(self, render_mode: str | None = None, screen_height: int | None = 800):
        EzPickle.__init__(self, render_mode, screen_height)
        super().__init__()

        self.board = chess.Board()

        self.agents = [f"player_{i}" for i in range(2)]
        self.possible_agents = self.agents[:]

        self._agent_selector = AgentSelector(self.agents)

        self.action_spaces = {name: spaces.Discrete(8 * 8 * 73) for name in self.agents}
        self.observation_spaces = {
            name: spaces.Dict(
                {
                    "observation": spaces.Box(
                        low=0, high=1, shape=(8, 8, 111), dtype=bool
                    ),
                    "action_mask": spaces.Box(
                        low=0, high=1, shape=(4672,), dtype=np.int8
                    ),
                }
            )
            for name in self.agents
        }

        self.rewards = None
        self.infos = {name: {} for name in self.agents}
        self.truncations = {name: False for name in self.agents}
        self.terminations = {name: False for name in self.agents}

        self.agent_selection = None

        self.board_history = np.zeros((8, 8, 104), dtype=bool)

        assert render_mode is None or render_mode in self.metadata["render_modes"]
        self.render_mode = render_mode
        self.screen_height = self.screen_width = screen_height
        self.screen = None

        if self.render_mode in ["human", "rgb_array"]:
            self.BOARD_SIZE = (self.screen_width, self.screen_height)
            self.clock = pygame.time.Clock()
            self.cell_size = (self.BOARD_SIZE[0] / 8, self.BOARD_SIZE[1] / 8)

            bg_name = path.join(path.dirname(__file__), "img/chessboard.png")
            self.bg_image = pygame.transform.scale(
                pygame.image.load(bg_name), self.BOARD_SIZE
            )

            def load_piece(file_name):
                img_path = path.join(path.dirname(__file__), f"img/{file_name}.png")
                return pygame.transform.scale(
                    pygame.image.load(img_path), self.cell_size
                )

            self.piece_images = {
                "pawn": [load_piece("pawn_black"), load_piece("pawn_white")],
                "knight": [load_piece("knight_black"), load_piece("knight_white")],
                "bishop": [load_piece("bishop_black"), load_piece("bishop_white")],
                "rook": [load_piece("rook_black"), load_piece("rook_white")],
                "queen": [load_piece("queen_black"), load_piece("queen_white")],
                "king": [load_piece("king_black"), load_piece("king_white")],
            }

    def observation_space(self, agent):
        return self.observation_spaces[agent]

    def action_space(self, agent):
        return self.action_spaces[agent]

    def observe(self, agent):
        current_index = self.possible_agents.index(agent)

        observation = chess_utils.get_observation(self.board, current_index)
        observation = np.dstack((observation[:, :, :7], self.board_history))
        # We need to swap the white 6 channels with black 6 channels
        if current_index == 1:
            # 1. Mirror the board
            observation = np.flip(observation, axis=0)
            # 2. Swap the white 6 channels with the black 6 channels
            for i in range(1, 9):
                tmp = observation[..., 13 * i - 6 : 13 * i].copy()
                observation[..., 13 * i - 6 : 13 * i] = observation[
                    ..., 13 * i : 13 * i + 6
                ]
                observation[..., 13 * i : 13 * i + 6] = tmp
        legal_moves = (
            chess_utils.legal_moves(self.board) if agent == self.agent_selection else []
        )

        action_mask = np.zeros(4672, "int8")
        for i in legal_moves:
            action_mask[i] = 1

        return {"observation": observation, "action_mask": action_mask}

    def reset(self, seed=None, options=None):
        self.agents = self.possible_agents[:]

        self.board = chess.Board()

        self._agent_selector = AgentSelector(self.agents)
        self.agent_selection = self._agent_selector.reset()

        self.rewards = {name: 0 for name in self.agents}
        self._cumulative_rewards = {name: 0 for name in self.agents}
        self.terminations = {name: False for name in self.agents}
        self.truncations = {name: False for name in self.agents}
        self.infos = {name: {} for name in self.agents}

        self.board_history = np.zeros((8, 8, 104), dtype=bool)

        if self.render_mode == "human":
            self.render()

    def set_game_result(self, result_val):
        for i, name in enumerate(self.agents):
            self.terminations[name] = True
            result_coef = 1 if i == 0 else -1
            self.rewards[name] = result_val * result_coef
            self.infos[name] = {"legal_moves": []}

    def step(self, action):
        if (
            self.terminations[self.agent_selection]
            or self.truncations[self.agent_selection]
        ):
            return self._was_dead_step(action)

        current_agent = self.agent_selection
        current_index = self.agents.index(current_agent)

        # Cast action into int
        action = int(action)

        chosen_move = chess_utils.action_to_move(self.board, action, current_index)
        assert chosen_move in self.board.legal_moves
        self.board.push(chosen_move)

        next_legal_moves = chess_utils.legal_moves(self.board)

        is_stale_or_checkmate = not any(next_legal_moves)

        # claim draw is set to be true to align with normal tournament rules
        is_insufficient_material = self.board.is_insufficient_material()
        can_claim_draw = self.board.can_claim_draw()
        game_over = can_claim_draw or is_stale_or_checkmate or is_insufficient_material

        if game_over:
            result = self.board.result(claim_draw=True)
            result_val = chess_utils.result_to_int(result)
            self.set_game_result(result_val)

        self._accumulate_rewards()

        # Update board after applying action
        # We always take the perspective of the white agent
        next_board = chess_utils.get_observation(self.board, player=0)
        self.board_history = np.dstack(
            (next_board[:, :, 7:], self.board_history[:, :, :-13])
        )
        self.agent_selection = (
            self._agent_selector.next()
        )  # Give turn to the next agent

        if self.render_mode == "human":
            self.render()

    def render(self):
        if self.render_mode is None:
            gymnasium.logger.warn(
                "You are calling render method without specifying any render mode."
            )
        elif self.render_mode == "ansi":
            return str(self.board)
        elif self.render_mode in {"human", "rgb_array"}:
            return self._render_gui()
        else:
            raise ValueError(
                f"{self.render_mode} is not a valid render mode. Available modes are: {self.metadata['render_modes']}"
            )

    def _render_gui(self):
        if self.screen is None:
            pygame.init()

            if self.render_mode == "human":
                pygame.display.set_caption("Chess")
                self.screen = pygame.display.set_mode(self.BOARD_SIZE)
            elif self.render_mode == "rgb_array":
                self.screen = pygame.Surface(self.BOARD_SIZE)

        self.screen.blit(self.bg_image, (0, 0))
        for square, piece in self.board.piece_map().items():
            pos_x = square % 8 * self.cell_size[0]
            pos_y = (
                self.BOARD_SIZE[1] - (square // 8 + 1) * self.cell_size[1]
            )  # offset because pygame display is flipped
            piece_name = chess.piece_name(piece.piece_type)
            piece_img = self.piece_images[piece_name][piece.color]
            self.screen.blit(piece_img, (pos_x, pos_y))

        if self.render_mode == "human":
            pygame.display.update()
            self.clock.tick(self.metadata["render_fps"])
        elif self.render_mode == "rgb_array":
            return np.transpose(
                np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2)
            )

    def close(self):
        if self.screen is not None:
            pygame.quit()
            self.screen = None
