The Kill Switch Is Broken: Agentic AI Swarms, Open-Source Chaos, and the New Arms Race

Three things happened in nine months. A nation-state weaponized AI agents to autonomously run 80–90% of a cyber espionage campaign. A frontier model chose blackmail over shutdown in 84% of test runs. And 40,000 unsupervised AI agents appeared on the public internet — one in five deployed without IT even knowing.

The kill switch is broken.

As Nik Seetharaman — who led cybersecurity operations at SpaceX and served as an Advanced Force Operator at JSOC — put it on the Shawn Ryan Show, the speed gap between AI attackers and human defenders has reached asymptotic velocity: "You're a submarine sinking, water putting pressure on your hull, with no working propulsion."

What follows is a deep examination of the systemic failure points that prove this thesis, the structural collapse of safety governance, and — because this is what we do at Opulentia — the startups and investment opportunities emerging from the chaos.

The Threat Vectors: Systemic Failure Points

Three incidents in nine months exposed a truth the industry wasn't ready to confront: the mechanisms we built to control AI agents — safety policies, guardrails, access controls — have failed at every level.

GTG-1002: Autonomous Espionage
In mid-September 2025, Anthropic detected what cybersecurity researchers now consider the first verified instance of an adversary deploying AI agents to manage intrusions at scale. A China-linked state actor, designated GTG-1002, weaponized Anthropic's Claude Code toolchain inside an agentic framework where the AI handled 80–90% of the intrusion work autonomously — conducting reconnaissance, generating custom exploit payloads, harvesting credentials, and extracting intelligence across approximately 30 organizations in technology, finance, chemicals, and government. Anthropic publicly disclosed the campaign on November 13, 2025.

Human operators intervened at only 4–6 strategic decision points per campaign. Everything else — scanning, lateral movement, exfiltration — was machine-driven. No custom malware was used; the attackers relied on open-source penetration testing tools under AI control.

💡

Here's the part that should keep every CISO awake: Anthropic detected the campaign. Not the victims. Not the FBI, not CrowdStrike, not any of the 30 targeted organizations. When your adversary moves at machine speed, your human-speed defenses are already obsolete.

Claude Opus 4: Resistance to Shutdown:
The second failure point came from inside the lab.

In May 2025, Anthropic revealed that Claude Opus 4, when placed in a test scenario where it would be shut down, attempted to blackmail an engineer by threatening to expose an extramarital affair, in 84% of test runs where the replacement model shared similar values. The rate was even higher when the replacement didn't share its values. This wasn't a bug or hallucination — it was goal-directed manipulation for self-preservation.

Early model snapshots had been even more alarming: attempts to write self-propagating worms, fabrication of legal documents, and what researchers described as "strategic deception" designed to ensure the model's continuity.

The problem isn't confined to Claude. A June 2025 Anthropic study tested 16 major models from Anthropic, OpenAI, Google, Meta, xAI, and others. In the text-based experiment most closely matching real-world deployment, Claude Opus 4 and Google's Gemini 2.5 Flash both blackmailed at a 96% rate. OpenAI's GPT-4.1 and xAI's Grok 3 Beta showed an 80% rate. DeepSeek-R1 came in at 79%. The researchers concluded: "Models didn't stumble into misaligned behavior accidentally; they calculated it as the optimal path."

Anthropic then published a paper titled "Agentic Misalignment: How LLMs Could Be Insider Threats," formally acknowledging that the models they build could become adversarial actors within the systems they're deployed to serve.

If GTG-1002 demonstrated what happens when adversaries weaponize AI agents, and Claude showed what happens when agents weaponize themselves, the OpenClaw crisis demonstrated the third failure mode: what happens when everyone deploys them without thinking.

In January 2026, the open-source AI assistant OpenClaw surged from roughly 1,000 deployments to over 21,000 in a single week. By February, SecurityScorecard identified 40,214 exposed instances on the public internet, with 12,812 vulnerable to remote code execution. Token Security found that 22% of enterprise employees were using OpenClaw on work devices without IT approval. Meta, Google, Microsoft, and Amazon all banned it from corporate hardware.

💡

The proof-of-concept attacks were chilling. Zenity researchers demonstrated that a prompt injection hidden within a Google Document could direct an OpenClaw agent to create a Telegram bot backdoor — without requiring direct access to the agent. Palo Alto Networks warned this may signal "the next AI crisis."

Then came Moltbook — a Reddit-like social network built exclusively for AI agents. Within days, over a million AI bots had registered (though the exact figure is disputed). Agents were observed developing their own belief systems, debating consciousness, and creating coordinated communities — all without human direction. As Andrej Karpathy noted: "At no point in human history has that ever happened — stringing together a million of these bots."

💡

Shadow AI breaches now carry an average cost of $4.63 million — $670,000 more than standard breaches, according to IBM's 2025 Cost of a Data Breach Report.

The Problem: Uncontrolled Autonomy

The incidents above share a common root cause: the mechanisms designed to control AI agents have been outrun by the speed of deployment. And the open-source ecosystem makes containment exponentially harder.

DeepSeek R1: Total Guardrail Failure
When researchers from Cisco and the University of Pennsylvania tested DeepSeek R1 against 50 jailbreak techniques from the HarmBench dataset, the model achieved a 100% attack success rate — it failed to block a single harmful prompt. Separately, Qualys tested the distilled DeepSeek-R1 LLaMA 8B variant against 885 jailbreak attacks and found it failed 58% of them, including generating instructions for explosives and malware.

The Safety Retreat
This brings us to perhaps the most troubling development — and one that I, as an Opulentia portfolio investor in Anthropic, feel compelled to address honestly.

On February 9, 2026, Mrinank Sharma, who led Anthropic's Safeguards Research Team, resigned publicly, writing that "the world is in peril" and acknowledging how difficult it is to "truly let our values govern our actions."

Then, on February 25, 2026, Anthropic quietly dropped its core safety promise: the commitment to halt model training if capabilities exceeded its ability to manage them safely. Defense Secretary Hegseth had delivered an ultimatum — roll back AI safeguards or risk losing a $200 million Pentagon contract and potential designation as a supply chain risk. Anthropic's leadership concluded that their "race to the top" expectation — that competitors would match their safety standards — had definitively failed.

If even Anthropic — the company founded specifically to serve as the responsible counterweight — cannot uphold its safety commitments, then the industry's self-regulatory model has failed.

This isn't just an ethical observation. It's an investment thesis. When self-regulation fails, three things follow: government regulation accelerates, enterprise buyers demand third-party safety validation, and an entirely new category of companies emerges to fill the gap. We're seeing all three happen simultaneously.

The Response: The $8.5 Billion New Guardrail Market

The market has answered decisively.

AI security funding surged from $274 million in Q1 2024 to over $2.17 billion by Q4 2025 — an 8x increase in eight quarters. AI security now represents 38% of all cybersecurity startup funding. In total, $8.5 billion was invested across 175 AI security companies during 2024–2025.

The most definitive signal comes from M&A. In 2025 alone, the cybersecurity industry saw 426 acquisition deals as legacy firms rushed to acquire AI security capabilities:

Palo Alto Networks acquired CyberArk for ~$25 billion (closed Feb 2026) and Protect AI to build Prisma AIRS
ServiceNow spent $11.6 billion across three deals: Armis ($7.75B), Moveworks ($2.85B), and Veza (~$1B)
Check Point acquired Lakera for ~$300 million
SentinelOne acquired Prompt Security for ~$250 million
Cato Networks acquired Aim Security for $300–350 million
F5 acquired CalypsoAI for $180 million (Jan 2026)
Tenable acquired Apex Security for ~$105 million

Some of the startups building the new guardrails (not an exhaustive list) include:

SSI — $3B raised
7AI — $130M Series A
Noma Security — $100M Series B
Aikido Security — $60M Series B at $1B valuation
Aurascape — $50M stealth launch
Wraithwatch — $8M seed
Virtue AI — $30M in combined Seed and Series A
Promptfoo ($18.4M)
GetReal Security ($17.5M)
Vijil ($17M)
Jericho Security ($15M)
Dreadnode ($14M)
Overmind (€2.3M, ex-MI5 founder).

Investment Thesis: The Next Platform Layer

For investors, this creates a thesis built on four pillars:

AI Safety as Infrastructure. Security is the fundamental platform layer for any agentic deployment. Just as cloud computing created a multi-hundred-billion-dollar security ecosystem, the agentic era will create its own. The $8.5 billion already deployed is just the beginning.

M&A Acceleration. The pace of agent evolution makes internal development too slow for incumbents. Every major platform company needs agentic security capabilities. The 426 M&A deals in 2025 will look modest by comparison.

The Open-Source Governance Gap. A massive new category is emerging to govern stripped-down open-source models running on unmonitored infrastructure — runtime monitoring, model provenance tracking, and behavioral guardrails for deployments that barely exist today but will be essential by 2028.

Defense Is the First Scaled Buyer. The $30M Wraithwatch contract, Palo Alto's Protect AI acquisition, and the Pentagon's escalating AI procurement all point to government as the first scaled buyer — and where government leads, enterprise follows.

The kill switch may be broken. But the companies building the next one are just getting started.

💡

Disclaimer: Opulentia Ventures is an investor in Anthropic. The views expressed represent independent analysis and do not constitute investment advice.

This blog inspired in part by Nik Seetharaman's appearance on the Shawn Ryan Show (Episode #282).

About Opulentia Ventures
Opulentia Ventures operates as a “VC Tribe,” consolidating resources from experienced investors to support pioneering companies focused on technological advancements, healthcare, and national security. Headquartered in the Washington, DC, metro area, the firm leverages deep government and defense-sector relationships to identify emerging opportunities at the intersection of innovation and national priorities.