
Integrate OREO into TRL and HF #2525

Open · August-murr opened this issue on Dec 28, 2024 · 0 comments
Assignees: August-murr
Labels: ✨ enhancement (New feature or request)

August-murr (Collaborator) commented:
Method description

Offline Reinforcement Learning for LLM Multi-Step Reasoning introduces OREO, an offline reinforcement learning algorithm for improving how large language models (LLMs) handle multi-step reasoning. It improves credit assignment across reasoning steps and reduces the need for pairwise preference data.

I’d like to integrate OREO into Hugging Face and TRL so that it works with tools like PEFT and quantization. The goal is to make the method, along with its test-time compute approach, easier for people to use.

@jwhj has already shared the code here: OREO repo.
The current implementation is a great start, but bringing it into Hugging Face’s ecosystem would make it more user-friendly and more widely usable.
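To make the proposal concrete, here is a rough, purely hypothetical sketch of what a TRL-style user API could look like, following the config/trainer pattern TRL already uses elsewhere. `OREOConfig` and `OREOTrainer` do not exist; their names, arguments, and the toy dataset columns are assumptions for illustration only, while the PEFT and quantization pieces use the real `peft` and `transformers` APIs.

```python
# Purely hypothetical sketch of an OREO integration in TRL. OREOConfig and
# OREOTrainer are not real TRL classes; their names and arguments are assumed
# here only to show the kind of user-facing API this issue proposes.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit quantized base model so the trainer composes with PEFT adapters.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Tiny offline dataset of reasoning trajectories with outcome rewards
# (the column names are illustrative, not a real schema).
train_dataset = Dataset.from_dict(
    {
        "prompt": ["What is 12 * 7 + 5?"],
        "completion": ["12 * 7 = 84, and 84 + 5 = 89. The answer is 89."],
        "reward": [1.0],  # final-answer correctness; OREO derives step-level credit
    }
)

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# Hypothetical trainer mirroring TRL's existing config/trainer pattern
# (e.g. DPOConfig/DPOTrainer). OREO would additionally need a value function
# (a value head or a separate value model) for step-level credit assignment.
# config = OREOConfig(output_dir="oreo-model")
# trainer = OREOTrainer(
#     model=model,
#     args=config,
#     train_dataset=train_dataset,
#     processing_class=tokenizer,
#     peft_config=peft_config,
# )
# trainer.train()
```

Whatever the final API ends up looking like, the main point is that an in-tree trainer would get PEFT, quantization, and the rest of the TRL tooling essentially for free.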

Would love to hear your thoughts on whether this would be a helpful addition!

Open source status

  • [x] The method implementation is available
  • [x] The model weights are available
  • [x] The training datasets are available

Provide useful links for the implementation

@jwhj

August-murr added the ✨ enhancement (New feature or request) label on Dec 28, 2024
August-murr self-assigned this on Dec 28, 2024