Anthropic's Opus 4.6 Achieves 45% Success Rate on Professional Legal Tasks, Challenging AI Agent Limitations

06.02.2026

AI agents are getting measurably better at complex professional work, and legal tasks are the clearest example. New benchmark results show a performance jump that challenges earlier assumptions about AI's readiness for workplace deployment.

Last month, an analysis of Mercor's professional task benchmark showed discouraging results: models from every major AI lab scored below 25% on tasks involving legal work and corporate analysis. Those findings suggested that legal professionals faced little near-term risk of displacement by AI automation.

That picture has changed with Anthropic's release of Opus 4.6. The model has reshuffled the rankings, scoring nearly 30% in single-attempt evaluations and averaging 45% when permitted multiple attempts at a task.

The release adds agentic capabilities, including agent swarm architectures, that appear to help with multi-step reasoning and complex problem solving. This architectural approach may account for a significant share of the benchmark improvement.

The advancement represents a substantial leap from previous state-of-the-art results. Mercor CEO Brendan Foody characterized the progress as remarkable, stating: "jumping from 18.4% to 29.8% in a few months is insane." This trajectory suggests that foundation model development continues to advance at an accelerated pace.
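To put the quoted figures in perspective, the short Python sketch below computes both the absolute and the relative gain. It assumes that 18.4% and 29.8% are scores of the same kind on the same benchmark, which the quote implies but does not state; the function name `improvement` is illustrative only.

```python
def improvement(previous_pct: float, current_pct: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    absolute = current_pct - previous_pct      # difference in percentage points
    relative = absolute / previous_pct * 100   # gain relative to the old score
    return absolute, relative


# Figures taken from the quote above; assumed to be comparable scores.
absolute, relative = improvement(18.4, 29.8)
print(f"{absolute:.1f} percentage points, {relative:.1f}% relative increase")
# prints "11.4 percentage points, 62.0% relative increase"
```

In other words, the quoted jump is 11.4 percentage points, or roughly a 62% improvement over the earlier score.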

A single-attempt score of roughly 30% remains well below human-level proficiency, but the pace of improvement suggests that legal and professional services firms should revisit their assumptions about AI displacement timelines. The gap is narrowing faster than many industry observers anticipated.

Sources:
APEX-Agents Leaderboard - Mercor

Tags: AI agents benchmarking Anthropic legal tech Foundation Models
