Blog
some thoughts from the team
Engineering
Evaluating AI Agents in Security Operations (December 2025)
Which frontier model should you use for SecOps automation? We added the latest cohort of frontier models to our benchmark to find out.
Dec 1, 2025
Engineering
Evaluating AI Agents in Security Operations
We benchmarked frontier AI models on realistic security operations (SecOps) tasks using Cotool’s agent harness and the Splunk BOTSv3 dataset. GPT-5 achieved the highest accuracy (63%), and Claude Haiku 4.5 completed tasks fastest while keeping accuracy strong. GPT-5 variants dominated the performance-cost frontier. These results offer practical guidance for model selection in enterprise SecOps automation.
Nov 17, 2025



