
DDPG on Discrete Action Space #285

Open
Psyf opened this issue Jan 14, 2019 · 3 comments


Psyf commented Jan 14, 2019

Hi.

I wanted to train an actor-critic agent on the CartPole environment (new Deep RL student here :) ).
I know that sounds like overkill; I just wanted to try.

This is my actor:

from keras.layers import Dense, Input, Reshape
from keras.models import Model

# Actor network: three hidden layers, one linear output per action.
observation = Input(shape=(1,) + env.observation_space.shape)
x = Dense(16, activation='relu')(observation)
x = Dense(16, activation='relu')(x)
x = Dense(16, activation='relu')(x)
x = Dense(num_actions, activation='linear')(x)
# Drop the leading window dimension so the output shape is (num_actions,).
actor_output = Reshape((num_actions,))(x)
actor = Model(inputs=observation, outputs=actor_output)

However, this is what happens when I try to run the file:

Training for 100000 steps ...
Traceback (most recent call last):
  File "A2C_cartpole.py", line 47, in <module>
    agent.fit(env, nb_steps=100000, verbose=2, callbacks=[tb])
  File "C:\Users\Psyf\Anaconda3\envs\NeuralStyleTransfer\lib\site-packages\rl\core.py", line 177, in fit
    observation, r, done, info = env.step(action)
  File "C:\Users\Psyf\Anaconda3\envs\NeuralStyleTransfer\lib\site-packages\gym\wrappers\time_limit.py", line 31, in step
    observation, reward, done, info = self.env.step(action)
  File "C:\Users\Psyf\Anaconda3\envs\NeuralStyleTransfer\lib\site-packages\gym\envs\classic_control\cartpole.py", line 92, in step
    assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
AssertionError: array([-0.00089871, -0.00465946], dtype=float32) (<class 'numpy.ndarray'>) invalid

From what I understand, the error occurs because the actor outputs an array of floats, while CartPole expects a discrete action (the int 0 or 1) when env.step(action) is called.
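For what it's worth, the failing assertion is easy to reproduce by hand (a minimal check, assuming the CartPole-v0 env):

import gym
import numpy as np

env = gym.make('CartPole-v0')
print(env.action_space)  # Discrete(2): step() only accepts the ints 0 and 1
print(env.action_space.contains(1))  # True
print(env.action_space.contains(np.array([-0.0009, -0.0047], dtype=np.float32)))  # False, so step() asserts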

Does anybody know a workaround? Currently, I'm thinking of modifying ddpg.py, changing select_action() to map the output to either 0 or 1 for CartPole.
Please advise.

Thanks and Best Regards.


Psyf commented Jan 14, 2019

Update:

I've tried to implement my idea and inserted this line just before return action in ddpg.py#select_action(): action = np.argmax(action)

The error I'm now facing is triggered after the warm-up steps (nb_steps_warmup_actor/nb_steps_warmup_critic, 500 in my case).

Traceback (most recent call last):
  File "A2C_cartpole.py", line 47, in <module>
    agent.fit(env, nb_steps=100000, verbose=2, callbacks=[tb])
  File "C:\Users\Psyf\Anaconda3\envs\NeuralStyleTransfer\lib\site-packages\rl\core.py", line 194, in fit
    metrics = self.backward(reward, terminal=done)
  File "C:\Users\Psyf\Anaconda3\envs\NeuralStyleTransfer\lib\site-packages\rl\agents\ddpg.py", line 273, in backward
    assert action_batch.shape == (self.batch_size, self.nb_actions)
AssertionError

action_batch.shape is (32,) while the right-hand side expects (32, 2): since select_action() now returns the argmax, the replay memory stores a single scalar per step instead of the full 2-element action vector.
Seems to be a rabbit hole.
Seems to be similar to #62 as well.
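One way to keep the memory shape intact without patching ddpg.py might be keras-rl's Processor hook: process_action() converts the action only at the env boundary, after forward() has already stored the raw actor output. A minimal sketch (the class name DiscretizeActionProcessor is made up here):

import numpy as np
from rl.core import Processor

class DiscretizeActionProcessor(Processor):
    # Called on the action right before env.step(); the replay memory
    # still sees the full (nb_actions,) vector, so the shape assert passes.
    def process_action(self, action):
        return int(np.argmax(action))

# Hypothetical usage: pass it to the agent.
# agent = DDPGAgent(..., processor=DiscretizeActionProcessor())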

@WHui829129

Hi.
I wanted to train a DQN on the MNIST dataset for image classification. How do I build my environment?
Thank you!


zh3389 commented Jul 1, 2020

Add action = np.argmax(action) at line 177 of ?\site-packages\rl\core.py in the source; the agent then trains normally on discrete actions. To train an agent with continuous actions, comment out action = np.argmax(action) again.

for _ in range(action_repetition):
    callbacks.on_action_begin(action)
    # added line: collapse the action vector to a discrete index
    action = np.argmax(action)
    observation, r, done, info = env.step(action)
    observation = deepcopy(observation)

The same error shows up at test time, at around line 300; add action = np.argmax(action) at the location the error points to there as well.
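An alternative that avoids editing site-packages could be a gym ActionWrapper, so both fit() and test() go through the same conversion (a sketch; the wrapper name ArgmaxActionWrapper is made up):

import numpy as np
import gym

class ArgmaxActionWrapper(gym.ActionWrapper):
    # Convert the agent's continuous action vector into the discrete
    # index the wrapped env expects, before env.step() sees it.
    def action(self, action):
        return int(np.argmax(action))

env = ArgmaxActionWrapper(gym.make('CartPole-v0'))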
