
DDPG on Discrete Action Space #285

Open
Psyf opened this issue Jan 14, 2019 · 3 comments


Psyf commented Jan 14, 2019

Hi.

I wanted to train an actor-critic agent on the CartPole environment (new Deep RL student here :) ).
I know that sounds like overkill; I just wanted to try.

This is my actor:

from keras.layers import Dense, Input, Reshape
from keras.models import Model

# Actor network: three hidden layers, one linear output per action.
observation = Input(shape=(1,) + env.observation_space.shape)
x = Dense(16, activation='relu')(observation)
x = Dense(16, activation='relu')(x)
x = Dense(16, activation='relu')(x)
x = Dense(num_actions, activation='linear')(x)
# Drop the leading window dimension so the output shape is (num_actions,).
actor_output = Reshape((num_actions,))(x)
actor = Model(inputs=observation, outputs=actor_output)

However, this is what happens when I try to run the file:

Training for 100000 steps ...
Traceback (most recent call last):
  File "A2C_cartpole.py", line 47, in <module>
    agent.fit(env, nb_steps=100000, verbose=2, callbacks=[tb])
  File "C:\Users\Psyf\Anaconda3\envs\NeuralStyleTransfer\lib\site-packages\rl\core.py", line 177, in fit
    observation, r, done, info = env.step(action)
  File "C:\Users\Psyf\Anaconda3\envs\NeuralStyleTransfer\lib\site-packages\gym\wrappers\time_limit.py", line 31, in step
    observation, reward, done, info = self.env.step(action)
  File "C:\Users\Psyf\Anaconda3\envs\NeuralStyleTransfer\lib\site-packages\gym\envs\classic_control\cartpole.py", line 92, in step
    assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
AssertionError: array([-0.00089871, -0.00465946], dtype=float32) (<class 'numpy.ndarray'>) invalid

From what I understand, the error occurs because the actor outputs an array of floats, while CartPole expects a discrete action (the int 0 or 1) when env.step(action) is called.
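For what it's worth, the failing assertion is easy to reproduce by hand (a minimal check, assuming the CartPole-v0 env):

import gym
import numpy as np

env = gym.make('CartPole-v0')
print(env.action_space)  # Discrete(2): step() only accepts the ints 0 and 1
print(env.action_space.contains(1))  # True
print(env.action_space.contains(np.array([-0.0009, -0.0047], dtype=np.float32)))  # False, so step() asserts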

Does anybody know a workaround? Currently, I'm thinking of modifying ddpg.py, changing select_action() to map the output to either 0 or 1 for CartPole.
Please advise.

Thanks and Best Regards.


Psyf commented Jan 14, 2019

Update:

I've tried to implement my idea and inserted this line just before return action in ddpg.py#select_action(): action = np.argmax(action)

The error I'm now facing is triggered after the warm-up steps (nb_steps_warmup_actor/nb_steps_warmup_critic, 500 in my case).

Traceback (most recent call last):
  File "A2C_cartpole.py", line 47, in <module>
    agent.fit(env, nb_steps=100000, verbose=2, callbacks=[tb])
  File "C:\Users\Psyf\Anaconda3\envs\NeuralStyleTransfer\lib\site-packages\rl\core.py", line 194, in fit
    metrics = self.backward(reward, terminal=done)
  File "C:\Users\Psyf\Anaconda3\envs\NeuralStyleTransfer\lib\site-packages\rl\agents\ddpg.py", line 273, in backward
    assert action_batch.shape == (self.batch_size, self.nb_actions)
AssertionError

action_batch.shape is (32,) while the right-hand side expects (32, 2): since select_action() now returns the argmax, the replay memory stores a single scalar per step instead of the full 2-element action vector.
Seems to be a rabbit hole.
Seems to be similar to #62 as well.
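One way to keep the memory shape intact without patching ddpg.py might be keras-rl's Processor hook: process_action() converts the action only at the env boundary, after forward() has already stored the raw actor output. A minimal sketch (the class name DiscretizeActionProcessor is made up here):

import numpy as np
from rl.core import Processor

class DiscretizeActionProcessor(Processor):
    # Called on the action right before env.step(); the replay memory
    # still sees the full (nb_actions,) vector, so the shape assert passes.
    def process_action(self, action):
        return int(np.argmax(action))

# Hypothetical usage: pass it to the agent.
# agent = DDPGAgent(..., processor=DiscretizeActionProcessor())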

@WHui829129

Hi.
I wanted to train a DQN on the MNIST dataset for image classification. How do I build my environment?
Thank you!


zh3389 commented Jul 1, 2020

Add action = np.argmax(action) at line 177 of ?\site-packages\rl\core.py in the source; the agent then trains normally on discrete actions. To train an agent with continuous actions, comment out action = np.argmax(action) again.

for _ in range(action_repetition):
    callbacks.on_action_begin(action)
    # added line: collapse the action vector to a discrete index
    action = np.argmax(action)
    observation, r, done, info = env.step(action)
    observation = deepcopy(observation)

The same error shows up at test time, at around line 300; add action = np.argmax(action) at the location the error points to there as well.
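An alternative that avoids editing site-packages could be a gym ActionWrapper, so both fit() and test() go through the same conversion (a sketch; the wrapper name ArgmaxActionWrapper is made up):

import numpy as np
import gym

class ArgmaxActionWrapper(gym.ActionWrapper):
    # Convert the agent's continuous action vector into the discrete
    # index the wrapped env expects, before env.step() sees it.
    def action(self, action):
        return int(np.argmax(action))

env = ArgmaxActionWrapper(gym.make('CartPole-v0'))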
