AI Benchmark Dashboard

Compare the performance of the latest AI models across various benchmarks

Aider Polyglot

Fiction.LiveBench

VPCT

GPQA Diamond

Frontier Math

Math Level 5

OTIS Mock AIME

SWE Bench Verified

WeirdML

Balrog

Factorio

GeoBench

SimpleBench

Aider Polyglot

The Aider Polyglot benchmark is a challenging test of the real-world coding capabilities of LLMs and AI coding agents.

It comprises the 225 most difficult coding problems from the Exercism platform, spanning six major programming languages: C++, Go, Java, JavaScript, Python, and Rust. The goal is to evaluate not just simple code generation but also the problem-solving and code integration skills needed in real development environments.

Official results can be found at Aider Polyglot.

Data Source and License

Citation: Epoch AI, 'AI Benchmarking Hub'. Published online at epoch.ai. Retrieved from https://epoch.ai/data/ai-benchmarking-dashboard [online resource]. Accessed 9 Aug 2025.

License: This data is provided under the CC BY license.