Q2 2025
GAIA Benchmark
Performance comparison on GAIA Level 3, the industry-standard benchmark for autonomous AI agent systems.

System                  Success rate (pass@1)
manus.im                57.7%
OpenAI Deep Research    47.6%
Previous SOTA           42.3%
LVL3.AI                 42.30% (Q2 2025)

Source: GAIA Benchmark Q2 2025 Evaluation (updated May 2025)