forked from sotopia-lab/sotopia
Sotopia Benchmark CLI API (sotopia-lab#69)
* add benchmark social agents
* add benchmark agents
* Add sotopia_benchmark cli api
* fix pre-commit
* add evaluator model argument
* finish benchmarking
* benchmark done
* chore: Fix formatting issue in redis_stats.ipynb and cli.py
* switch back to LLM_Name
* merge main
* add together ai
* fix naming error
* roll back to llama2
* chore: Update langchain-together dependency to version 0.1.2
* use chatopenai for together models
* add logging
* fix pre-commit
* add more logging options
* probably fix the event loop closed error: following NVIDIA/NeMo-Guardrails#336
* modify cli; fix model position bug
* chore: Update benchmark tag to "benchmark_{model}_final"
* Refactor _iterate_all_env_agent_combo_not_in_db function
* chore: Update python version to 3.11.2
* change to dict comparison
* ignore jsonl
* ✨ finish benchmarking script
* chore: Refactor server.py and redis_stats.ipynb
* add type ignore
* push for the eval
* Refactor run_async_benchmark_in_batch function
* Refactor run_async_benchmark_in_batch function
* add doc
* precommit fix
* pre-commit
* refactor
* update w feedback
* pre commit
* chore: Update authors in pyproject.toml and fetch benchmark_agents.json from Hugging Face API
* hotfix
* chore: Remove unnecessary type hint in benchmark/cli.py

Co-authored-by: Hao <prokilchu@gmail.com>
Showing 16 changed files with 866 additions and 476 deletions.
@@ -137,7 +137,7 @@ data/*
deprecated/*

*.csv

*.jsonl
#backup
backup/*
@@ -0,0 +1,11 @@
# Benchmark your model as a social agent in Sotopia

```
sotopia_benchmark --model=<your_model_name>
```
or

```
python sotopia/benchmark/cli.py --model=<your_model_name>
```
Currently, this script runs over 100 simulations on the Sotopia Hard tasks, and the partner model is fixed to `meta-llama/Llama-3-70b-chat-hf`.
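
For readers who want a feel for how such an entry point could turn `--model` into a run configuration, here is a minimal sketch. It is not the code in `sotopia/benchmark/cli.py`: it uses the standard-library `argparse` for self-containment, and the names `PARTNER_MODEL`, `benchmark_tag`, `build_parser`, and `plan_run` are hypothetical. Only the partner model name and the `benchmark_{model}_final` tag format come from the README and commit message; how the real CLI uses them is an assumption.

```python
import argparse

# Partner model named in the README; the constant name itself is hypothetical.
PARTNER_MODEL = "meta-llama/Llama-3-70b-chat-hf"


def benchmark_tag(model: str) -> str:
    """Build a tag in the "benchmark_{model}_final" form from the commit message."""
    return f"benchmark_{model}_final"


def build_parser() -> argparse.ArgumentParser:
    """Create a parser that accepts the single --model option shown in the README."""
    parser = argparse.ArgumentParser(prog="sotopia_benchmark")
    parser.add_argument(
        "--model",
        required=True,
        help="name of the model to benchmark as a social agent",
    )
    return parser


def plan_run(argv: list[str]) -> dict[str, str]:
    """Parse CLI arguments and return the settings a benchmark run might use.

    This only assembles configuration; it does not launch any simulation.
    """
    args = build_parser().parse_args(argv)
    return {
        "model": args.model,
        "partner_model": PARTNER_MODEL,
        "tag": benchmark_tag(args.model),
    }


# "my-model" is a placeholder, not a real model identifier.
print(plan_run(["--model=my-model"]))
```

Keeping argument parsing separate from the (omitted) simulation loop makes the configuration step easy to check on its own before any model calls are made.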