The Agent Company: Benchmarking LLM Agents on Consequential Real World Tasks

Please refer to the website for more details.

Overview

TODO, paste paper content here

Set Up

Check out the docs for more details.

Exciting Features

Diverse task roles:
- Software Engineer
- Product Manager
- Data Scientist
- Human Resources
- Financial Staff
- Administrator
Diverse data types:
- Coding tasks
- Conversational tasks
- Mathematical reasoning
- Image processing
- Text comprehension
Multi-agent interaction
Comprehensive scoring system:
- Result-based evaluation (primary)
- Subcheckpoint checking (secondary)
Multiple evaluation methods:
- Deterministic evaluators
- LLM-based evaluators
Simple one-command operations:
- Complete environment setup in minutes
- Quick system reset in minutes when needed
Extensible benchmark framework:
- Add new tasks/evaluators/subcheckpoints in minutes
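
To make the scoring terms in the feature list concrete, here is a minimal sketch of how a result-based outcome and subcheckpoint partial credit might be reported side by side for one task. It is not the benchmark's actual API: the names `Checkpoint`, `grade`, and `result_check`, the point values, and the way the two signals are reported are all assumptions made for illustration. Only deterministic checks are shown; an LLM-based evaluator could be plugged in as any callable that returns a boolean.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Checkpoint:
    """Hypothetical subcheckpoint: a description, a point value, and a check."""
    description: str
    points: int
    check: Callable[[], bool]  # deterministic or LLM-backed, returns pass/fail


def grade(result_check: Callable[[], bool],
          checkpoints: List[Checkpoint]) -> Dict[str, Union[int, float, bool]]:
    """Run the final-result check and every subcheckpoint, then summarize.

    Assumption: the primary signal is a single pass/fail on the final result,
    and the secondary signal is the fraction of checkpoint points earned.
    """
    total = sum(cp.points for cp in checkpoints)
    # Each check is called exactly once.
    earned = sum(cp.points for cp in checkpoints if cp.check())
    return {
        "result_passed": result_check(),                      # primary
        "earned": earned,
        "total": total,
        "checkpoint_ratio": earned / total if total else 0.0,  # secondary
    }


if __name__ == "__main__":
    # Hypothetical task state standing in for a real workspace.
    state = {"report_written": True, "report_sent": False}
    checkpoints = [
        Checkpoint("report file was written", 1, lambda: state["report_written"]),
        Checkpoint("report was sent to the manager", 2, lambda: state["report_sent"]),
    ]
    summary = grade(lambda: state["report_written"] and state["report_sent"], checkpoints)
    print(summary)
```

Keeping the two signals separate, rather than folding them into one number, lets a reader see whether an agent failed outright or made partial progress. How the benchmark itself combines them is not specified here.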

Contribution

We are not currently accepting task contributions for the first version of the benchmark. We do welcome bug fixes, documentation updates, and other improvements. If you have questions, please create an issue. You can also contact Frank F. Xu, Yufan Song, or Boxuan Li by email (fangzhex@cs.cmu.edu, yufans@alumni.cmu.edu, boxuanli@alumni.cmu.edu).

Cite

TODO

License

Distributed under the MIT License. See LICENSE for more information.
