AI Benchmark Dashboard

Compare the performance of the latest AI models across various benchmarks

Aider Polyglot
Fiction.LiveBench
VPCT
GPQA Diamond
Frontier Math
Math Level 5
OTIS Mock AIME
SWE Bench Verified
WeirdML
Balrog
Factorio
GeoBench
SimpleBench

Aider Polyglot

The Aider Polyglot benchmark is a comprehensive and challenging benchmark designed to evaluate the real-world coding capabilities of LLMs and AI coding agents.

It consists of the 225 most difficult coding exercises from the Exercism platform, spanning six major programming languages: C++, Go, Java, JavaScript, Python, and Rust. Rather than testing simple code generation alone, it evaluates the problem-solving and code-integration skills needed in real development environments.
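
As a rough illustration of how a result on a fixed 225-problem set turns into a single score, the sketch below converts a count of solved exercises into a percentage. This is a minimal sketch under a stated assumption: the score is taken to be simply solved / 225. The function name polyglot_score is hypothetical, and the official Aider harness may apply further rules (for example, how many attempts a model gets) that are not modeled here.

```python
# Hypothetical sketch: turning a count of solved Aider Polyglot exercises
# into a percentage score. Assumption: score = solved / 225; official
# Aider scoring rules (e.g. number of attempts) are NOT modeled here.

TOTAL_PROBLEMS = 225  # number of exercises in the benchmark


def polyglot_score(solved: int, total: int = TOTAL_PROBLEMS) -> float:
    """Return the share of solved exercises as a percentage (0-100)."""
    if not 0 <= solved <= total:
        raise ValueError(f"solved must be between 0 and {total}, got {solved}")
    return 100.0 * solved / total


if __name__ == "__main__":
    # Hypothetical example: a model that solves 180 of the 225 exercises.
    print(f"{polyglot_score(180):.1f}%")  # prints 80.0%
```

Keeping the denominator fixed at 225 is what makes scores from different models directly comparable on the dashboard.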

Official results are published on the Aider Polyglot leaderboard.

Data Source and License

Citation: Epoch AI, 'AI Benchmarking Hub'. Published online at epoch.ai. Retrieved from https://epoch.ai/data/ai-benchmarking-dashboard [online resource]. Accessed 24 Jun 2025.

License: This data is provided under the CC BY license.