16 May 2026 · 5 min read · Operations

AI Just Broke the Cyber Benchmarks That Keep Your Business Safe: What the AISI Report Means for UK Firms in 2026

The UK’s AI Security Institute reports that Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.5 have broken every trend line it tracks for autonomous cyber capability, with the length of attack tasks these models can complete on their own now doubling in months instead of years. For business owners, this means the gap between AI-driven offence and defence is compressing to hours, and your 90-day patch cycle is now a liability.

On Wednesday, the UK’s AI Security Institute (AISI) published findings that should jolt every CTO, CISO and board member out of their seats: Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.5 have broken every trend line the institute has ever tracked for autonomous cyber capability.

We are no longer talking about AI helping developers write safer code. We are talking about frontier models completing 32-step simulated corporate network attacks, end-to-end, with no human in the loop.

Mythos Preview solved “The Last Ones” — a complex multi-stage attack against a small enterprise network — in 6 out of 10 attempts. It cracked “Cooling Tower,” a range no model had ever solved before, in 3 out of 10 attempts. GPT-5.5 solved “The Last Ones” in 3 out of 10. The AISI’s own doubling trend, already accelerating, has been left in the dust. The institute’s official assessment is blunt: “The length of cyber tasks that frontier models can complete autonomously has doubled on the order of months, not years.”

Why This Is a Business Story, Not a Geek Story

Palo Alto Networks, a launch partner for Anthropic’s Project Glasswing, independently verified the jump. In its own testing, the company found the latest models “extraordinarily capable at finding vulnerabilities and chaining them into critical exploit paths in near-real-time.” The fallout? Palo Alto issued 26 CVEs covering 75 vulnerabilities across more than 130 products, compared to a typical monthly volume of fewer than five. Every SaaS product had already been patched by the time the advisory went live.

That last sentence is worth re-reading. The same class of AI that can autonomously attack a network is now being used, at scale, to find and fix holes before attackers reach them. The gap between offence and defence is compressing to hours, not weeks.

For UK and European businesses, this is a fork in the road. The EU AI Act’s risk-based framework already classifies certain AI systems as “high risk,” and the UK’s AISI is actively running pre-deployment evaluations on behalf of the British government. But regulation is not moving at the speed of these models. If your security posture assumes a 90-day patch cycle and human-led penetration testing, your assumptions are now obsolete.

The Three Things You Should Do Before Monday

1. Assume your red team is now AI-powered — on both sides.

If you run penetration tests annually, or even quarterly, you are auditing a moving target with a snapshot camera. The AISI data suggests capability doubling every four to five months. That means a model that could not bypass your MFA in January may succeed in June. Shift from periodic audits to continuous, automated vulnerability scanning that integrates directly into CI/CD pipelines. Tools like Snyk, GitHub Advanced Security, and now AI-driven scanners such as those being trialled under Project Glasswing should be default, not optional.

2. Demand AI-readiness from your security vendors.

Palo Alto Networks moved fast because it had a direct pipeline into Anthropic’s frontier model. Most mid-market businesses do not. When you next review your SOC or MDR provider, ask two questions: Are you using AI-native detection to shrink mean-time-to-detect? And are you participating in vendor early-access programmes for frontier security models? If the answer to both is no, you are buying yesterday’s defences against tomorrow’s threats.

3. Map your AI Act exposure now, not next year.

The EU AI Act’s high-risk system obligations include risk management, data governance, transparency and human oversight. Autonomous cyber-capable models sitting inside your stack — whether through direct API use or via a vendor — could trigger those obligations sooner than you expect. Document where frontier models touch your customer data, your supply chain, or your critical infrastructure. The regulators will not care that you “did not know” the model was that powerful.
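One way to start that documentation is a plain register of where models touch sensitive areas. The sketch below is illustrative only: the `ModelTouchpoint` class, its field names, the category labels and the example entries are all hypothetical, and flagging an entry for review is a housekeeping step, not a legal test for high-risk classification under the AI Act.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three areas named in the text above.
SENSITIVE_AREAS = {"customer_data", "supply_chain", "critical_infrastructure"}


@dataclass
class ModelTouchpoint:
    system: str   # internal system or product name (illustrative)
    model: str    # model or vendor feature in use
    access: str   # "direct_api" or "vendor"
    touches: set[str] = field(default_factory=set)

    def needs_review(self) -> bool:
        """Flag any touchpoint that reaches one of the sensitive areas."""
        return bool(self.touches & SENSITIVE_AREAS)


# Made-up example entries; replace with your own systems.
inventory = [
    ModelTouchpoint("support-chatbot", "frontier LLM", "direct_api", {"customer_data"}),
    ModelTouchpoint("supplier-portal", "vendor AI assistant", "vendor", {"supply_chain"}),
    ModelTouchpoint("marketing-drafts", "frontier LLM", "direct_api", {"public_content"}),
]

for_review = [t.system for t in inventory if t.needs_review()]
print(for_review)
```

Even a spreadsheet with these four columns would do the same job. The point is that vendor-mediated use sits in the same register as direct API use.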

What the AISI Did Not Say

The institute was careful to hedge. It noted that estimates are based on a relatively small number of models and that the hardest tasks have the least human comparison data. Even so, drop any single model from the analysis and the doubling estimate shifts by less than a month. In other words, this is not a one-off miracle. It is a consistent, compounding acceleration.

That is the part that keeps me up at night. We are not looking at a single breakthrough. We are looking at the steep end of an exponential curve. The METR nonprofit, which independently tracks how quickly AI handles software tasks, arrived at almost the same four-month doubling figure. When two separate measurement teams agree, the noise is not the story — the signal is.
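To put numbers on that curve, here is a minimal sketch of the doubling arithmetic. It assumes the four-to-five-month doubling time holds steady between checks, which is a simplification, and the `growth_factor` helper and the interval labels are my own illustration, not anything published by AISI or METR.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Multiplicative growth after elapsed_months, given a fixed doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (elapsed_months / doubling_months)


# Review cadences mentioned in this article, in months (90 days is roughly 3).
intervals = {
    "90-day patch cycle": 3.0,
    "January to June": 5.0,
    "annual penetration test": 12.0,
}

for label, months in intervals.items():
    low = growth_factor(months, doubling_months=5.0)   # slower end of the range
    high = growth_factor(months, doubling_months=4.0)  # faster end of the range
    print(f"{label}: {low:.2f}x to {high:.2f}x")
```

At a four-month doubling time, twelve months is three doublings, so a model tested against your network a year ago would be facing a successor able to handle tasks roughly eight times as long.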

The Bottom Line

Claude Mythos Preview is not publicly available. Anthropic gated it behind a security research programme because the company believes the model is too capable to release widely. OpenAI’s GPT-5.5-Cyber is similarly restricted. That should tell you everything: the companies building these tools no longer trust the open internet to handle them responsibly.

Your business does not need access to frontier models to be affected by them. You need to be prepared for a world in which the attackers and the defenders are both AI agents, moving faster than any human security team can react. The organisations that survive this transition will be the ones that stop treating AI as a productivity tool and start treating it as a strategic security layer — embedded, governed, and continuously validated.

And if you want to explore what this means for your specific business — where the vulnerabilities are, how your vendor stack stacks up, and what a defensible AI security posture looks like in practice — you can book time with me directly.

Discover more from Callum Knox

Subscribe to get the latest posts sent to your email.

Ready to implement this?

Every article I write is backed by systems I have actually built. If you want the same results without doing it yourself, let me build it for you.

Discuss Your Project

Tags: Agentic AI, AI Automation, AI Security, Anthropic, Cybersecurity, UK Business
