AtariWrapper does not use recommended defaults #635

Open

RyanNavillus

Description

The current AtariWrapper by default has terminate_on_life_loss set to True. This goes against the recommendations of Revisiting the Arcade Learning Environment (https://arxiv.org/pdf/1709.06009.pdf). I believe this should be set to False by default. They also recommend using sticky actions instead of noop resets, but I think that problem is outside the scope of this wrapper.
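For context, a minimal sketch of overriding this behaviour today (assuming the flag is named `terminal_on_life_loss`, as in the SB3 source; check your installed version):

```python
import gym
from stable_baselines3.common.atari_wrappers import AtariWrapper

# Follow the "Revisiting the ALE" recommendation: only terminate on game over.
env = gym.make("BreakoutNoFrameskip-v4")
env = AtariWrapper(env, terminal_on_life_loss=False)
```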

Activity

Miffyli (Collaborator) commented on Oct 27, 2021

That is set to True by default to follow the original baselines implementation, which had terminate_on_life_loss enabled here. While I agree it would have been better not to use it (to reflect the "real" end of the game as intended by the game developers, and to pick the simpler of the two options), changing it at this point would cause major hidden changes in results. @araffin ?

RyanNavillus (Author) commented on Oct 27, 2021

I just realized that I should have put this issue in the actual stable baselines 3 repo, but I guess it's relevant here as well. I definitely understand the trade-off between using newer recommendations and preserving fair comparisons to previous work.

jkterry1 (Contributor) commented on Oct 27, 2021

My concern, and I'm sure @JesseFarebro (the maintainer of the ALE) would agree, is that the settings in the Gym Atari environments were never properly chosen to begin with, and that people doing future work with them should use what have been the recommended practices for years. This actually caused an issue for us while working with Atari games for an ICML paper, which is why Ryan created the issue.

Miffyli (Collaborator) commented on Oct 27, 2021

Right, I totally agree with the point :). We could consider changing the default setting in the zoo and SB3, and leave a big warning there to indicate the change. It would be bad if a popular library hindered progress just by sticking to "old stuff" for the weak reason of "that's what has been done before".

araffin (Member) commented on Oct 28, 2021

Hello,

This goes against the recommendations of Revisiting the Arcade Learning Environment (https://arxiv.org/pdf/1709.06009.pdf).

Yes, I'm aware of that.
We kept it to be able to compare results against SB2.
Looking at the actual paragraph (see below) and from my personal experience, this does not affect the results much (your experience is probably different, otherwise there would not be an issue).

But we should at least update the doc and add a warning there with the recommended settings (this could be a flag in make_atari_env, deactivated at first with a warning and then activated in a future version of SB3); a rough sketch follows the quoted paragraph below.

Episode termination. In the initial ALE benchmark results (Bellemare et al., 2013),
episodes terminate when the game is over. However, in some games the player has a number
of “lives” which are lost one at a time. Terminating only when the game is over often makes
it difficult for agents to learn the significance of losing a life. Mnih et al. (2015) terminated
training episodes when the agent lost a life, rather than when the game is over (evaluation
episodes still lasted for the entire game). While this approach has the potential to teach an
agent to avoid “death,” Bellemare et al. (2016b) noted that it can in fact be detrimental
to an agent’s performance. Currently, both approaches are still common in the literature.
We often see episodes terminating when the game is over (e.g., Hausknecht et al., 2014;
Liang et al., 2016; Lipovetzky et al., 2015; Martin et al., 2017), as well as when the agent
loses a life (e.g., Nair et al., 2015; Schaul et al. 2016; van Hasselt et al., 2016). Considering
the ideal of minimizing the use of game-specific information and the questionable utility of
termination using the “lives” signal, we recommend that only the game over signal be used
for termination.
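A rough sketch of what such a transition flag could look like (the `use_recommended_defaults` argument and the warning are hypothetical, not part of the current SB3 API; `make_atari_env` and its `wrapper_kwargs` argument do exist):

```python
import warnings

from stable_baselines3.common.env_util import make_atari_env


def make_atari_env_transitional(env_id, use_recommended_defaults=False, **kwargs):
    """Hypothetical helper: keep the old default for now, but warn about the upcoming change."""
    if use_recommended_defaults:
        # Settings recommended by "Revisiting the ALE": no termination on life loss.
        wrapper_kwargs = {"terminal_on_life_loss": False}
    else:
        warnings.warn(
            "terminal_on_life_loss is currently True by default; "
            "the recommended value (False) may become the default in a future SB3 version.",
            FutureWarning,
        )
        wrapper_kwargs = {"terminal_on_life_loss": True}
    return make_atari_env(env_id, wrapper_kwargs=wrapper_kwargs, **kwargs)


vec_env = make_atari_env_transitional("BreakoutNoFrameskip-v4", n_envs=4, use_recommended_defaults=True)
```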

araffin (Member) commented on May 1, 2022

The current AtariWrapper by default has terminate_on_life_loss set to True. This goes against the recommendations of Revisiting the Arcade Learning Environment (https://arxiv.org/pdf/1709.06009.pdf). I believe this should be set to False by default. They also recommend using sticky actions instead of noop resets, but I think that problem is outside the scope of this wrapper.

I did quick experiments on Breakout with PPO, with and without terminal on life loss activated, and the current default seems to have a good impact on performance:
https://wandb.ai/openrlbenchmark/openrlbenchmark/reports/Atari-defaults-PPO---VmlldzoxOTA4NjUz

Results are not significant yet (only 3 runs), but it should be investigated further.

qgallouedec (Collaborator) commented on Jan 1, 2023

What about sticky actions?

araffin (Member) commented on Jan 2, 2023

What about sticky actions?

you mean its influence on performance?

I don't know. I think the main issue is that the changes were made without a benchmark (in the paper, the comparison is only partial), and it looks like one is still missing.
@RyanNavillus @JesseFarebro @pseudo-rnd-thoughts am I wrong?

qgallouedec (Collaborator) commented on Jan 3, 2023

I meant, is it implemented? After digging into the code, I realize that sticky actions are enabled by default directly by ALE, but only for v0 and v5 (not for v4, which is used by SB3), see the gym documentation and the ALE source code. It seems to me that sticky actions are used by most works, so shouldn't we provide a StickyActionWrapper for v4? See the sketch below.
EDIT: Or just add a default repeat_action_probability=0.25 to env_kwargs in the common.env_util.make_atari_env function.

Somehow related: DLR-RM/rl-baselines3-zoo#133 (comment)
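For illustration, a minimal sticky-action wrapper along the lines of Machado et al. (2018) could look like this (a hypothetical sketch, not part of SB3; the `repeat_action_probability` name follows ALE):

```python
import gym
import numpy as np


class StickyActionWrapper(gym.Wrapper):
    """With probability `repeat_action_probability`, repeat the previous action
    instead of the one chosen by the agent (sticky actions, Machado et al., 2018)."""

    def __init__(self, env, repeat_action_probability: float = 0.25):
        super().__init__(env)
        self.repeat_action_probability = repeat_action_probability
        self._last_action = 0  # NOOP

    def reset(self, **kwargs):
        self._last_action = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        if np.random.random() < self.repeat_action_probability:
            action = self._last_action
        self._last_action = action
        return self.env.step(action)
```

The alternative mentioned in the EDIT would instead pass `env_kwargs={"repeat_action_probability": 0.25}` through make_atari_env, assuming the installed ALE/gym version forwards that kwarg for the v4 ids.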

pseudo-rnd-thoughts (Contributor) commented on Jan 3, 2023

I believe the primary difference between v0 and v4 is whether sticky actions are enabled. This is a confusing change that ALE made.

https://gymnasium.farama.org/environments/atari/#version-history-and-naming-schemes

RyanNavillus (Author) commented on Jan 8, 2023

@araffin I think the goal of sticky actions was not to improve performance w.r.t. reward, but to produce a robust policy instead of a deterministic action sequence. They show that training with sticky actions causes DQN to have roughly the same performance whether you evaluate with sticky actions on or off, while methods designed to exploit determinism in the environment perform much worse when evaluated with sticky actions on. They argue that sticky actions are better than other methods for various reasons in section 5.3.

TL;DR, sticky actions are the recommended way to prevent agents from abusing determinism, not a way to improve rewards.
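As an illustration of that robustness check, one could evaluate the same trained policy with and without sticky actions (a sketch; the checkpoint path is hypothetical and it assumes repeat_action_probability can be overridden via env_kwargs for the v4 ids):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack

model = PPO.load("ppo_breakout")  # hypothetical checkpoint trained elsewhere

for p in (0.0, 0.25):
    env = make_atari_env(
        "BreakoutNoFrameskip-v4",
        n_envs=1,
        env_kwargs={"repeat_action_probability": p},
    )
    env = VecFrameStack(env, n_stack=4)  # match the frame stacking used at training time
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
    print(f"repeat_action_probability={p}: {mean_reward:.1f} +/- {std_reward:.1f}")
```

A robust policy should score similarly in both settings, while an agent exploiting determinism should degrade noticeably at p=0.25.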

RyanNavillus (Author) commented on Jan 8, 2023

There is a much weaker argument in the paper that we should not terminate on life loss because that's environment-specific knowledge, so algorithms evaluated with that setting will overfit more to Atari. They also argue that terminate_on_life_loss has a debatable effect on performance, but it seems like your experiments show that it can help.

16 remaining items

Metadata

Labels: enhancement (New feature or request)