EvalPlus v0.1.7
- EvalPlus leaderboard: https://evalplus.github.io/leaderboard.html
- Evaluated CodeLlama, CodeT5+ and WizardCoder
- Fixed contracts (HumanEval+): 116, 126, 006
- Removed extreme inputs (HumanEval+): 32
- Introduced HUMANEVAL_OVERRIDE_PATH, which allows overriding the original dataset with a customized dataset
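
A minimal sketch of how HUMANEVAL_OVERRIDE_PATH might be used. It assumes the setting is read from the process environment and that the customized dataset is a JSON Lines file. Neither detail is stated in these notes, so check the EvalPlus source before relying on them. The helper `write_override_dataset` and the task fields shown are hypothetical and only illustrate the shape of the workflow.

```python
import json
import os
import tempfile


def write_override_dataset(tasks, path):
    """Write a list of task dicts as JSON Lines (one task per line)."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")
    return path


# Hypothetical task record: the field names are illustrative and may not
# match the schema EvalPlus actually expects.
custom_tasks = [
    {"task_id": "HumanEval/0", "prompt": "def add(a, b):\n", "entry_point": "add"},
]

override_path = os.path.join(tempfile.mkdtemp(), "custom_humaneval.jsonl")
write_override_dataset(custom_tasks, override_path)

# Assumption: EvalPlus picks this up from the environment when it loads
# the dataset, so it must be set before evalplus is imported or run.
os.environ["HUMANEVAL_OVERRIDE_PATH"] = override_path
```

Setting the variable in the shell before launching the evaluation should be equivalent under the same assumption.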
PyPI: https://pypi.org/project/evalplus/0.1.7/
Docker Hub: https://hub.docker.com/layers/ganler/evalplus/v0.1.7/images/sha256-69fe87df89b8c1545ff7e3b20232ac6c4841b43c20f22f4a276ba03f1b0d79ae