Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"

oriyor/assistantbench

🌐 AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

AssistantBench evaluates the ability of AI agents to solve realistic and time-consuming web tasks, such as "Which gyms near me have fitness classes on the weekend, before 7AM?".

AssistantBench example

⛰️ Dataset and leaderboard

To start working on AssistantBench, please check out our HuggingFace dataset and leaderboard, where you can also make new submissions.

🤖 SPA

We also introduce SeePlanAct (SPA), a new web agent built to tackle tasks in AssistantBench. Code to run SPA and additional resources will be released soon!

✍ Citation

@misc{yoran2024assistantbenchwebagentssolve,
      title={AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?}, 
      author={Ori Yoran and Samuel Joseph Amouyal and Chaitanya Malaviya and Ben Bogin and Ofir Press and Jonathan Berant},
      year={2024},
      eprint={2407.15711},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.15711}, 
}
