The Agent Company: Benchmarking LLM Agents on Consequential Real World Tasks

Please refer to the website for more details.

Overview

TODO, paste paper content here

Set Up

Check out the docs for more details.

Exciting Features

  • Diverse task roles:
    • Software Engineer
    • Product Manager
    • Data Scientist
    • Human Resources
    • Financial Staff
    • Administrator
  • Diverse data types:
    • Coding tasks
    • Conversational tasks
    • Mathematical reasoning
    • Image processing
    • Text comprehension
  • Multi-agent interaction
  • Comprehensive scoring system:
    • Result-based evaluation (primary)
    • Subcheckpoint checks (secondary)
  • Multiple evaluation methods:
    • Deterministic evaluators
    • LLM-based evaluators
  • Simple one-command operations:
    • Complete environment setup in minutes
    • Quick system reset in minutes when needed
  • Extensible benchmark framework:
    • Add new tasks, evaluators, or subcheckpoints in minutes
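
To make the scoring idea above more concrete, here is a minimal sketch of how a result-based check (primary) and subcheckpoint checks (secondary) could be combined into one score using deterministic evaluators. This is not the benchmark's actual API or formula: the `Checkpoint` class, the `score_task` function, and the 0.5 weighting are all hypothetical names and values chosen for illustration only. Check the docs for the real evaluator interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Checkpoint:
    """Hypothetical subcheckpoint: a name plus a deterministic check."""
    name: str
    check: Callable[[str], bool]


def score_task(
    output: str,
    final_check: Callable[[str], bool],
    checkpoints: List[Checkpoint],
    result_weight: float = 0.5,  # assumed weighting, not taken from the benchmark
) -> float:
    """Combine the final-result check with the fraction of passed subcheckpoints."""
    result_score = 1.0 if final_check(output) else 0.0
    if checkpoints:
        passed = sum(1 for cp in checkpoints if cp.check(output))
        checkpoint_score = passed / len(checkpoints)
    else:
        # With no subcheckpoints, fall back to the final result alone.
        checkpoint_score = result_score
    return result_weight * result_score + (1.0 - result_weight) * checkpoint_score


# Illustrative task output and checks (all made up for this sketch).
agent_output = "report.csv created; total=42"
checkpoints = [
    Checkpoint("report written", lambda out: "report.csv" in out),
    Checkpoint("chart written", lambda out: "chart.png" in out),
]
score = score_task(agent_output, lambda out: "total=42" in out, checkpoints)
print(score)  # 0.75: final result passes, one of two subcheckpoints passes
```

An LLM-based evaluator could be plugged in the same way by passing a `check` callable that wraps a model call; that part is left out here because it depends on the model client in use.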

Contribution

We are not currently accepting task contributions for the first version of the benchmark, but we welcome bug fixes, documentation, and other improvements. Questions? Please create an issue. You can also contact Frank F. Xu, Yufan Song, or Boxuan Li by email (fangzhex@cs.cmu.edu, yufans@alumni.cmu.edu, boxuanli@alumni.cmu.edu).

Cite

TODO

License

Distributed under the MIT License. See LICENSE for more information.