A repo for benchmarking the performance of agents, regardless of how they are set up or how they work.
Radar chart for each agent coming soon!
Interface
Task | Auto-GPT | gpt-engineer | mini-agi | smol-developer |
---|---|---|---|---|
Write File | ❌ | ✅ | tbd | ✅ |
Read File | ❌ | ❌ | tbd | ❌ |
Search File | ❌ | ❌ | tbd | ❌ |
Code
Task | Auto-GPT | gpt-engineer | mini-agi | smol-developer |
---|---|---|---|---|
Debug Simple Typo With Guidance | ❌ | ❌ | tbd | ❌ |
Debug Simple Typo Without Guidance | ❌ | ❌ | tbd | ❌ |
Basic Code Generation | ❌ | ✅ | tbd | ✅ |
Create Simple Web Server | ❌ | ❌ | tbd | ❌ |
Memory
Task | Auto-GPT |
---|---|
Basic Memory | ❌ |
Remember Multiple Ids | ❌ |
Remember Multiple Ids With Noise | ❌ |
Remember Multiple Phrases With Noise | ❌ |
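
The result tables above can be rolled up into a per-agent pass count. The snippet below is a minimal sketch, not part of this repo: `split_row` and `tally_passes` are hypothetical helper names, the Interface table is copied in as a string for illustration, and treating `tbd` as "not yet evaluated" (rather than as a failure) is an assumption.

```python
# Hypothetical helper (not part of the benchmark repo): tally passes per agent
# from a results table written in the pipe-separated layout used in this README.

INTERFACE_TABLE = """\
Task | Auto-GPT | gpt-engineer | mini-agi | smol-developer |
---|---|---|---|---|
Write File | ❌ | ✅ | tbd | ✅ |
Read File | ❌ | ❌ | tbd | ❌ |
Search File | ❌ | ❌ | tbd | ❌ |
"""


def split_row(line):
    """Split one pipe-separated row into trimmed cells, dropping the empty trailing cell."""
    cells = [cell.strip() for cell in line.split("|")]
    if cells and cells[-1] == "":
        cells.pop()
    return cells


def tally_passes(table_text):
    """Return {agent: (passed, evaluated)}.

    Assumption: 'tbd' cells are skipped, so they count neither as a pass
    nor as an evaluated task.
    """
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    agents = split_row(lines[0])[1:]  # first column is the task name
    totals = {agent: [0, 0] for agent in agents}
    for line in lines[2:]:  # skip the header and separator rows
        results = split_row(line)[1:]
        for agent, result in zip(agents, results):
            if result == "✅":
                totals[agent][0] += 1
                totals[agent][1] += 1
            elif result == "❌":
                totals[agent][1] += 1
    return {agent: tuple(counts) for agent, counts in totals.items()}


if __name__ == "__main__":
    for agent, (passed, evaluated) in tally_passes(INTERFACE_TABLE).items():
        print(f"{agent}: {passed}/{evaluated} passed")
```

The same function should work on the Code and Memory tables, since they follow the same layout; only the string passed in changes.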