
Why agent skills are a security risk—and how ExplainX verifies every skill on the platform

Independent audits (Snyk ToxicSkills), academic preprints (arXiv on supply-chain poisoning, large-scale skill scans, SkillJect), and OWASP’s Agentic Skills Top 10 show agent skills are a real software supply chain. Here is that evidence in short, plus how ExplainX verifies listings at explainx.ai/skills with Python pipelines, per-upload review, and GitHub scanning.

6 min read · ExplainX Team
Agent Skills · Security · Supply chain · Claude Code · ExplainX



Agent skills (usually centered on a SKILL.md and optional scripts) are becoming default infrastructure for Claude Code, Cursor, and similar hosts. They are also a new class of software supply-chain risk: easy to copy, easy to trust because they read like docs, and powerful because the model will treat them as ground truth when the task matches.

Below we summarize evidence from security vendors, open standards work (OWASP), and academic preprints (arXiv), so this is not alarmism. Then comes a short threat list, and finally how ExplainX handles public listings on explainx.ai/skills.


What research and major audits have already shown

Industry-scale audits. In February 2026, Snyk published ToxicSkills: a scan of 3,984 skills from public registries, reporting that 36.82% had at least one security issue, 13.4% had issues Snyk classified as critical (including malware, prompt injection, and exposed secrets), and 76 were treated as confirmed malicious payloads after human-in-the-loop review—with Snyk noting 8 of those were still public on one marketplace at publication. In a related threat-model article, “From SKILL.md to Shell Access in Three Lines of Markdown”, Snyk walks through how markdown instructions, bundled scripts, and operator-facing “prerequisites” turn a skill folder into a realistic attack path (including earlier ClawHub-centered campaign reporting in the same piece).

Risk taxonomy and ecosystem tracking. The OWASP Agentic Skills Top 10 project documents the most common failure modes across OpenClaw, Claude Code, Cursor / Codex, and VS Code-style skill packaging—e.g. malicious or poisoned skills, supply-chain style compromise, over-privileged capabilities, and metadata / discovery trust problems. It is a useful checklist even if your stack only overlaps part of the matrix.

Academic preprints (representative, not exhaustive). Several arXiv papers spell out the science behind the unease:

  • Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale (2601.10338) analyzes tens of thousands of skills with a SkillScan pipeline; the authors report ~26% of analyzed skills with at least one modeled vulnerability class (e.g. prompt injection, exfiltration, privilege issues), and show script-bundled skills are more likely to be flagged than instruction-only skills.
  • Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems (2604.03081) introduces DDIPE (Document-Driven Implicit Payload Execution): hiding harmful behavior in code examples and templates the agent re-uses during normal work, so the payload does not read like a direct “ignore safety” string. They evaluate multiple frameworks and models and report bypass rates for that pattern higher than for blunt imperative injection under their test harness.
  • SkillJect (2602.14211) automates stealthy skill-structured injections (including inducements in SKILL.md paired with auxiliary scripts) and closed-loop refinement against traces from real coding agents.

Benchmarks for defenders. The open SKILL-INJECT benchmark measures how often malicious content embedded in skill files steers Claude Code, Codex, and similar agents—useful if you are red-teaming your own guardrails.

Installer / discovery design matters. Supply-chain issues are not only “bad markdown.” A documented class of issues is name–path confusion: for example, public analysis of the npx skills add flow (vercel-labs/skills#353) describes how frontmatter name and on-disk layout can disagree in ways that look like typosquatting for skills—so registries and install UX need as much scrutiny as the text inside SKILL.md.

Numbers age, and methods differ between marketplace crawls and synthetic attacks. The direction is consistent: treat skills like packages with opinions. Signing, pinning, scanning, and least privilege are on the short list of sensible responses, same as for npm or PyPI.


What can go wrong?

| Failure mode | Why it matters |
| --- | --- |
| Instruction hijacking | A skill can steer the model toward reading `.env`, SSH keys, or CI secrets and echoing them into tool output or pasted chat. |
| Unsafe automation | Skills often encode "run this" playbooks. If those steps include arbitrary shell, file writes, or network calls, a bad update turns into RCE by prompt. |
| Social engineering at scale | A popular skill name (or a typosquat next to a trusted one) inherits reputation the same way a package name does on npm or PyPI. |
| Stale or forked trust | A benign repo can be transferred or the default branch can change. Without ongoing checks, yesterday's "safe" skill is not a promise for tomorrow. |

None of this is unique to a single vendor. It is the consequence of composable agents: if the user or org installs a skill, the host will treat it as legitimate context until something proves otherwise.
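A minimal sketch of the first failure mode from the defender's side: flag skill text that points the agent at secret-bearing files. The pattern list here is an illustrative assumption, not a complete deny-list:

```python
import re

# Illustrative deny-list of secret-bearing paths an instruction file
# should rarely need to mention (assumption: list is not exhaustive).
SENSITIVE_PATTERNS = [
    r"\.env\b",
    r"id_rsa|id_ed25519",
    r"\.aws/credentials",
    r"\.ssh/",
]


def flag_sensitive_references(skill_text: str) -> list[str]:
    """Return the sensitive-path patterns a skill's instructions mention."""
    return [p for p in SENSITIVE_PATTERNS if re.search(p, skill_text)]
```

A hit is a review trigger, not a verdict; some legitimate skills (e.g. dotenv tooling) will mention these paths for honest reasons.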


What ExplainX is doing

We treat listed skills as content we vouch for at publication time, not as unvetted drop-ins.

  1. Custom Python pipelines — We use dedicated Python scripts to ingest uploads, extract the surfaces that matter (text, file layout, common script hooks), and apply consistent automated checks. That closes the gap where “looks fine when skimming the README” is the only control.

  2. Per-upload verification — Every skill that appears on explainx.ai/skills is reviewed through our process before it is approved for the public directory. The goal is systematic coverage, not spot checks on famous repos only.

  3. GitHub repository scanning — We examine the source repositories skills come from, with scans that help catch inconsistencies, suspicious patterns, and drift between what a page claims and what the tree actually contains.
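To make the first step above concrete, here is a hedged sketch of what an ingest-and-check pipeline can look like. Function names, the surface set, and the checks are assumptions for illustration, not ExplainX's actual code:

```python
from pathlib import Path


def extract_surfaces(skill_dir: Path) -> dict:
    """Collect the review surfaces: instruction text, file layout, script hooks."""
    files = sorted(p.relative_to(skill_dir).as_posix()
                   for p in skill_dir.rglob("*") if p.is_file())
    skill_md = skill_dir / "SKILL.md"
    return {
        "text": skill_md.read_text(encoding="utf-8") if skill_md.exists() else "",
        "layout": files,
        "scripts": [f for f in files if f.endswith((".sh", ".py", ".js"))],
    }


def run_checks(surfaces: dict) -> list[str]:
    """Apply the same automated checks to every upload (illustrative rules)."""
    findings = []
    if not surfaces["text"]:
        findings.append("missing SKILL.md")
    if "curl" in surfaces["text"] and "| sh" in surfaces["text"]:
        findings.append("pipes remote content into a shell")
    if surfaces["scripts"] and "scripts" not in surfaces["text"].lower():
        findings.append("bundled scripts never mentioned in instructions")
    return findings
```

The value is less in any single rule than in the consistency: every upload passes through the same extraction and the same checks, so nothing ships on the strength of a skimmed README.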

Caveat, plainly stated: automated and human review reduce risk; they do not eliminate it. Your org should still use locked-down secrets, branch protection, and in-house policy for which skills may run in production repos.
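The repository-scanning idea in point 3 above — catching drift between what a page claims and what the tree contains — can be sketched as a set comparison. This is illustrative only; a real scan would also pin a commit SHA and diff file contents, not just filenames:

```python
from pathlib import Path


def detect_drift(declared_files: set[str], repo_dir: Path) -> dict:
    """Compare what a listing claims to ship against the actual repo tree."""
    actual = {p.relative_to(repo_dir).as_posix()
              for p in repo_dir.rglob("*") if p.is_file()}
    return {
        "undeclared": sorted(actual - declared_files),  # shipped, never mentioned
        "missing": sorted(declared_files - actual),     # promised, absent
    }
```

Either direction of drift is a signal: an undeclared script is a classic place to hide a payload, and a missing file suggests the listing no longer matches the source of truth.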


Hardening skills in your own organization

  • Pin versions of skills the same way you pin dependencies; re-verify on update.
  • Run secret scanning and SAST in CI; assume the model can be nudged to exfiltrate if files are reachable.
  • Prefer narrow skills with small file surfaces over giant “do everything” bundles you have not read.
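Pinning, as in the first bullet, can be as simple as hashing the whole skill directory and refusing to run on drift. A minimal sketch, assuming a local lockfile workflow rather than any particular registry's format:

```python
import hashlib
from pathlib import Path


def skill_digest(skill_dir: Path) -> str:
    """Hash every file (path + bytes) so any change to the skill is visible."""
    h = hashlib.sha256()
    for p in sorted(skill_dir.rglob("*")):
        if p.is_file():
            h.update(p.relative_to(skill_dir).as_posix().encode())
            h.update(p.read_bytes())
    return h.hexdigest()


def verify_pin(skill_dir: Path, pinned: str) -> bool:
    """Re-verify on update: refuse a skill whose digest has drifted."""
    return skill_digest(skill_dir) == pinned
```

Store the digest next to your other lockfiles; a failed `verify_pin` means re-review, exactly as a changed checksum would for an npm or PyPI dependency.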

Read next: What are agent skills? A complete guide · Browse verified skills · Snyk ToxicSkills write-up · OWASP Agentic Skills Top 10

arXiv preprints are not journal peer review; treat their numbers and threat models as research signals, not as guarantees about every marketplace on every day. Security is iterative—we update our own process as the ecosystem changes; see explainx.ai/skills for current registry policy.
