All news with tag "benchmarking"
AI
Anthropic's Opus 4.6 Achieves 45% Success Rate on Professional Legal Tasks, Challenging AI Agent Limitations
Recent developments in AI agent capabilities have demonstrated significant progress in handling complex professional tas...
06.02.2026