Q2 2025
GAIA Benchmark
Performance comparison on GAIA Level 3, the industry-standard benchmark for autonomous AI agent systems.

System                  Success rate (pass@1)
manus.im                57.7%
OpenAI Deep Research    47.6%
Previous SOTA           42.3%
LVL3.AI                 42.30% (Q2 2025)

Source: GAIA Benchmark Q2 2025 Evaluation (updated May 2025)