Method description
Offline Reinforcement Learning for LLM Multi-Step Reasoning introduces OREO (Offline Reasoning Optimization), an offline reinforcement learning algorithm for improving multi-step reasoning in large language models (LLMs). OREO jointly learns a policy model and a value function by optimizing the soft Bellman equation, which gives finer-grained credit assignment across reasoning steps and removes the need to collect pairwise preference data.
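The exact losses in the paper differ in detail (e.g. step-level vs. token-level aggregation and extra regularizers), but the core consistency term looks roughly like the sketch below. All shapes, the reward convention, and the detach pattern here are my assumptions, not the paper's verbatim objective:

```python
import torch

def oreo_losses(logp, ref_logp, values, rewards, beta=0.1):
    """Sketch of a soft-Bellman consistency objective in the spirit of OREO.

    logp:     (T,) log pi(a_t|s_t) under the policy being trained
    ref_logp: (T,) log pi_ref(a_t|s_t) under a frozen reference model
    values:   (T+1,) V(s_t) from a scalar value head (values[-1]: terminal state)
    rewards:  (T,) per-step rewards, often zero except at the final step
    """
    # KL-regularized optimality implies
    #   beta * log(pi/pi_ref) = r(s_t, a_t) + V(s_{t+1}) - V(s_t),
    # so both networks are trained to drive this residual toward zero.
    td = rewards + values[1:] - values[:-1]
    kl = beta * (logp - ref_logp)
    value_loss = (td - kl.detach()).pow(2).mean()   # update V, policy frozen
    policy_loss = (td.detach() - kl).pow(2).mean()  # update pi, V frozen
    return policy_loss, value_loss
```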
I’d like to integrate OREO into Hugging Face TRL so it composes with the rest of the ecosystem, e.g. PEFT adapters and quantized base models. The goal is to make the algorithm easier to use, along with its test-time compute method, where the learned value function guides step-level beam search.
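As a purely hypothetical sketch of what the integration could look like, mirroring the shape of TRL's existing trainers such as DPOTrainer — OREOConfig, OREOTrainer, and the dataset columns below are assumptions, not an existing API:

```python
from datasets import Dataset
from peft import LoraConfig
from trl import OREOConfig, OREOTrainer  # hypothetical: do not exist in TRL today

# Unpaired data: a single completion per prompt plus a correctness signal,
# instead of the chosen/rejected pairs DPO requires.
train_dataset = Dataset.from_dict({
    "prompt": ["Solve: 12 * 7 = ?"],
    "completion": ["12 * 7 = 84. The answer is 84."],
    "reward": [1.0],
})

config = OREOConfig(          # hypothetical
    beta=0.1,                 # KL-regularization strength
    output_dir="oreo-demo",
)
trainer = OREOTrainer(        # hypothetical
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=config,
    train_dataset=train_dataset,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # composes with PEFT
)
trainer.train()
```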
@jwhj has already shared the code here: OREO repo.
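On the test-time side, the trained value function can score partial reasoning traces and steer a step-level beam search. A rough sketch of the idea, where propose_steps and score_value are hypothetical stand-ins for sampling candidate next steps and evaluating the value head:

```python
def value_guided_beam_search(prompt, propose_steps, score_value,
                             beam_width=4, expand=4, max_steps=8):
    """Keep the beam_width partial traces the value function likes best."""
    beams = [prompt]
    for _ in range(max_steps):
        candidates = []
        for trace in beams:
            # Sample several candidate next reasoning steps per beam.
            for step in propose_steps(trace, n=expand):
                candidates.append(trace + step)
        # Rank partial traces by the learned value V(s) and prune.
        beams = sorted(candidates, key=score_value, reverse=True)[:beam_width]
    return beams[0]
```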
The reference implementation is a great start, but bringing it into TRL would make it easier to discover, maintain, and combine with the rest of the Hugging Face ecosystem.
Would love to hear your thoughts on whether this would be a helpful addition!
Open source status
Provide useful links for the implementation
@jwhj