
Integrate OREO into TRL and HF #2525

Closed
@August-murr

Description

Method description

Offline Reinforcement Learning for LLM Multi-Step Reasoning introduces OREO, an offline reinforcement-learning algorithm that improves how large language models (LLMs) handle multi-step reasoning. It assigns credit across intermediate reasoning steps more effectively and reduces the reliance on pairwise preference data.
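To give a rough feel for the credit-assignment idea, here is a minimal NumPy sketch of a soft-Bellman-style consistency residual, the kind of per-step objective the paper builds on. This is an illustration only, not the authors' implementation: the function name, the `beta` value, and the exact loss shape are assumptions.

```python
import numpy as np

def soft_bellman_residual(values, rewards, logp_policy, logp_ref, beta=0.1):
    """Per-step consistency residuals for one trajectory (illustrative sketch).

    values:      V(s_0) .. V(s_T), including the terminal state (length T+1)
    rewards:     r_0 .. r_{T-1} (often sparse: nonzero only at the final step)
    logp_policy: per-step log-probs of the taken actions under the policy
    logp_ref:    same log-probs under a frozen reference model
    """
    v_t, v_next = values[:-1], values[1:]
    kl = logp_policy - logp_ref          # per-step log-ratio to the reference
    # Residual is zero when V is consistent with the KL-regularized return;
    # driving it to zero spreads credit from the final reward back to each step.
    return rewards + v_next - v_t - beta * kl

# Toy 3-step trajectory with a sparse reward at the end (terminal V = 0).
values = np.array([0.2, 0.3, 0.5, 0.0])
rewards = np.array([0.0, 0.0, 1.0])
logp_policy = np.array([-1.0, -0.5, -0.2])
logp_ref = np.array([-1.1, -0.6, -0.4])

res = soft_bellman_residual(values, rewards, logp_policy, logp_ref)
value_loss = np.mean(res ** 2)  # MSE pushes V toward per-step consistency
```

Because the residual is defined per step rather than per trajectory, each intermediate state gets its own learning signal — this is what lets the method work without pairwise comparisons between whole responses.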

I’d like to integrate OREO into Hugging Face and TRL so it composes with tools like PEFT and quantization. The goal is to make the method, along with its test-time compute technique, easier for people to adopt.

@jwhj has already shared the code here: OREO repo.
The current implementation is a great start, but bringing it into Hugging Face’s ecosystem would make it more user-friendly and widely usable.

Would love to hear your thoughts on whether this would be a helpful addition!

Open source status

  • The method implementation is available
  • The model weights are available
  • The training datasets are available

Provide useful links for the implementation

@jwhj

