
PPOv2Trainer reward_model throws AttributeError: '<My Custom Class>' object has no attribute 'base_model_prefix' #1977

Open
RylanSchaeffer opened this issue Aug 26, 2024 · 1 comment
Labels: 📚 documentation (Improvements or additions to documentation) · 🧒 good second issue (Good for contributors with basic project familiarity) · 🙋 help from community wanted (Open invitation for community members to contribute) · 🏋 PPO (Related to PPO)

Comments

RylanSchaeffer (Contributor) commented Aug 26, 2024

System Info

  • transformers version: 4.44.0
  • Platform: Linux-5.4.0-162-generic-x86_64-with-glibc2.31
  • Python version: 3.11.9
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.32.1
  • Accelerate config: - compute_environment: LOCAL_MACHINE
    - distributed_type: FSDP
    - mixed_precision: bf16
    - use_cpu: False
    - debug: True
    - num_processes: 2
    - machine_rank: 0
    - num_machines: 1
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - enable_cpu_affinity: False
    - fsdp_config: {'fsdp_activation_checkpointing': True, 'fsdp_auto_wrap_policy': 'TRANSFORMER_BASED_WRAP', 'fsdp_backward_prefetch': 'BACKWARD_PRE', 'fsdp_cpu_ram_efficient_loading': True, 'fsdp_forward_prefetch': True, 'fsdp_offload_params': True, 'fsdp_sharding_strategy': 'FULL_SHARD', 'fsdp_state_dict_type': 'SHARDED_STATE_DICT', 'fsdp_sync_module_states': True, 'fsdp_use_orig_params': True}
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
    - dynamo_config: {'dynamo_backend': 'EAGER'}
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA A100-SXM4-80GB

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

Note that in PPOv2Trainer, the type annotation for reward_model is nn.Module: https://github.com/huggingface/trl/blob/main/trl/trainer/ppov2_trainer.py#L77

However, when I pass in an nn.Module object (a class StrInputRewardModelEnsemble that I created myself, which inherits from nn.Module), I receive the error:

AttributeError: 'StrInputRewardModelEnsemble' object has no attribute 'base_model_prefix'

The error occurs here: https://github.com/huggingface/trl/blob/main/trl/trainer/ppov2_trainer.py#L58

Expected behavior

I think PPOv2Trainer needs either:

  1. better documentation, and/or
  2. better type annotations

to specify exactly what is expected of the reward_model.
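For context, here is a minimal sketch of the duck-typed interface that trl's get_reward (in trl/trainer/utils.py) appears to assume of reward_model: an attribute named by base_model_prefix holding the LM backbone, plus a score head applied to its hidden states. MinimalRewardModel, its attribute names, and its sizes are illustrative stand-ins for this issue, not the real trl or transformers API:

```python
import torch
import torch.nn as nn

class MinimalRewardModel(nn.Module):
    # `base_model_prefix` names the attribute that holds the LM backbone,
    # mirroring the convention of transformers PreTrainedModel subclasses.
    base_model_prefix = "backbone"

    def __init__(self, vocab_size: int = 100, hidden_size: int = 16):
        super().__init__()
        # Stand-in for a transformer backbone that returns hidden states.
        self.backbone = nn.Embedding(vocab_size, hidden_size)
        # Scalar reward head applied to the backbone's hidden states.
        self.score = nn.Linear(hidden_size, 1, bias=False)

rm = MinimalRewardModel()
# This is the attribute access that raises AttributeError for a plain
# nn.Module that does not define `base_model_prefix`:
lm_backbone = getattr(rm, rm.base_model_prefix)
hidden = lm_backbone(torch.randint(0, 100, (2, 5)))  # (batch, seq, hidden)
rewards = rm.score(hidden)                           # (batch, seq, 1)
```

A plain nn.Module like StrInputRewardModelEnsemble satisfies the type annotation but not this implicit contract, which is why the error only surfaces at runtime.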

@RylanSchaeffer RylanSchaeffer added the 🐛 bug Something isn't working label Aug 26, 2024
@qgallouedec qgallouedec added 📚 documentation Improvements or additions to documentation 🧒 good second issue Good for contributors with basic project familiarity 🏋 PPO Related to PPO and removed 🐛 bug Something isn't working labels Oct 20, 2024
@qgallouedec qgallouedec added the 🙋 help from community wanted Open invitation for community members to contribute label Dec 14, 2024
@haimianxing

I also encountered this error when I tried to use RLOOTrainer with AutoModelForCausalLMWithValueHead.

[rank3]: Traceback (most recent call last):
[rank3]: File "/mnt/data2/zcz/infer/utils/./accelerate_torch.py", line 262, in <module>
[rank3]: trainer.train()
[rank3]: File "/mnt/data2/zcz/.miniconda3_14/envs/_torch_env/lib/python3.9/site-packages/trl/trainer/rloo_trainer.py", line 352, in train
[rank3]: _, score, _ = get_reward(
[rank3]: File "/mnt/data2/zcz/.miniconda3_14/envs/_torch_env/lib/python3.9/site-packages/trl/trainer/utils.py", line 1128, in get_reward
[rank3]: lm_backbone = getattr(model, model.base_model_prefix)
[rank3]: File "/mnt/data2/zcz/.miniconda3_14/envs/_torch_env/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 517, in __getattr__
[rank3]: raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank3]: AttributeError: 'DeepSpeedEngine' object has no attribute 'base_model_prefix'
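This looks like a wrapping problem rather than a missing attribute on the reward model itself: DeepSpeedEngine stores the user model as its .module attribute and (per the traceback) does not forward class attributes like base_model_prefix. The stand-in classes below are hypothetical, torch-only stand-ins that reproduce the lookup failure without needing deepspeed installed:

```python
import torch.nn as nn

class Backbone(nn.Module):
    """Stand-in for a transformer backbone."""
    pass

class RewardModel(nn.Module):
    """Stand-in for a transformers-style reward model."""
    base_model_prefix = "model"

    def __init__(self):
        super().__init__()
        self.model = Backbone()

class EngineLike(nn.Module):
    """Stand-in for deepspeed's DeepSpeedEngine, which stores the user
    model as `self.module` and does not expose its class attributes."""
    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

engine = EngineLike(RewardModel())
# Attribute lookup on the wrapper fails, as in the traceback above...
assert not hasattr(engine, "base_model_prefix")
# ...but the unwrapped model still carries it:
assert hasattr(engine.module, "base_model_prefix")
```

If this is the mechanism, get_reward would need to receive (or unwrap to) the underlying module before reading base_model_prefix.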
