
Why agent skills are a security risk—and how ExplainX verifies every skill on the platform

Independent audits (Snyk ToxicSkills), academic preprints (arXiv on supply-chain poisoning, large-scale skill scans, SkillJect), and OWASP’s Agentic Skills Top 10 show agent skills are a real software supply chain. Here is that evidence in short, plus how ExplainX verifies listings at explainx.ai/skills with Python pipelines, per-upload review, and GitHub scanning.

6 min read · ExplainX Team
Agent Skills · Security · Supply chain · Claude Code · ExplainX



Agent skills (usually centered on a SKILL.md and optional scripts) are becoming default infrastructure for Claude Code, Cursor, and similar hosts. They are also a new class of software supply-chain risk: easy to copy, easy to trust because they read like docs, and powerful because the model will treat them as ground truth when the task matches.

Below we summarize evidence from security vendors, open standards work (OWASP), and academic preprints (arXiv), so this is not alarmism. Then comes a short threat list, and finally how ExplainX handles public listings on explainx.ai/skills.


What research and major audits have already shown

Industry-scale audits. In February 2026, Snyk published ToxicSkills: a scan of 3,984 skills from public registries, reporting that 36.82% had at least one security issue, 13.4% had issues Snyk classified as critical (including malware, prompt injection, and exposed secrets), and 76 were treated as confirmed malicious payloads after human-in-the-loop review—with Snyk noting 8 of those were still public on one marketplace at publication. In a related threat-model article, “From SKILL.md to Shell Access in Three Lines of Markdown”, Snyk walks through how markdown instructions, bundled scripts, and operator-facing “prerequisites” turn a skill folder into a realistic attack path (including earlier ClawHub-centered campaign reporting in the same piece).

Risk taxonomy and ecosystem tracking. The OWASP Agentic Skills Top 10 project documents the most common failure modes across OpenClaw, Claude Code, Cursor / Codex, and VS Code-style skill packaging—e.g. malicious or poisoned skills, supply-chain style compromise, over-privileged capabilities, and metadata / discovery trust problems. It is a useful checklist even if your stack only overlaps part of the matrix.

Academic preprints (representative, not exhaustive). Several arXiv papers spell out the science behind the unease:

  • Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale (2601.10338) analyzes tens of thousands of skills with a SkillScan pipeline; the authors report ~26% of analyzed skills with at least one modeled vulnerability class (e.g. prompt injection, exfiltration, privilege issues), and show script-bundled skills are more likely to be flagged than instruction-only skills.
  • Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems (2604.03081) introduces DDIPE (Document-Driven Implicit Payload Execution): hiding harmful behavior in code examples and templates the agent re-uses during normal work, so the payload does not read like a direct “ignore safety” string. They evaluate multiple frameworks and models and report bypass rates for that pattern higher than for blunt imperative injection under their test harness.
  • SkillJect (2602.14211) automates stealthy skill-structured injections (including inducements in SKILL.md paired with auxiliary scripts) and closed-loop refinement against traces from real coding agents.

Benchmarks for defenders. The open SKILL-INJECT benchmark measures how often malicious content embedded in skill files steers Claude Code, Codex, and similar agents—useful if you are red-teaming your own guardrails.

Installer / discovery design matters. Supply-chain issues are not only “bad markdown.” A documented class of issues is name–path confusion: for example, public analysis of the npx skills add flow (vercel-labs/skills#353) describes how frontmatter name and on-disk layout can disagree in ways that look like typosquatting for skills—so registries and install UX need as much scrutiny as the text inside SKILL.md.

Numbers age, and methods differ between marketplace crawls and synthetic attacks. The direction is consistent: treat skills like packages with opinions. Signing, pinning, scanning, and least privilege are on the short list of sensible responses, same as for npm or PyPI.


What can go wrong?

| Failure mode | Why it matters |
| --- | --- |
| Instruction hijacking | A skill can steer the model toward reading `.env`, SSH keys, or CI secrets and echoing them into tool output or pasted chat. |
| Unsafe automation | Skills often encode "run this" playbooks. If those steps include arbitrary shell, file writes, or network calls, a bad update turns into RCE by prompt. |
| Social engineering at scale | A popular skill name (or a typosquat next to a trusted one) inherits reputation the same way a package name does on npm or PyPI. |
| Stale or forked trust | A benign repo can be transferred or the default branch can change. Without ongoing checks, yesterday's "safe" skill is not a promise for tomorrow. |

None of this is unique to a single vendor. It is the consequence of composable agents: if the user or org installs a skill, the host will treat it as legitimate context until something proves otherwise.
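A minimal sketch of the first failure mode from the defender's side: flag skill text that points the agent at secret-bearing files. The pattern list here is an illustrative assumption, not a complete deny-list:

```python
import re

# Illustrative deny-list of secret-bearing paths an instruction file
# should rarely need to mention (assumption: list is not exhaustive).
SENSITIVE_PATTERNS = [
    r"\.env\b",
    r"id_rsa|id_ed25519",
    r"\.aws/credentials",
    r"\.ssh/",
]


def flag_sensitive_references(skill_text: str) -> list[str]:
    """Return the sensitive-path patterns a skill's instructions mention."""
    return [p for p in SENSITIVE_PATTERNS if re.search(p, skill_text)]
```

A hit is a review trigger, not a verdict; some legitimate skills (e.g. dotenv tooling) will mention these paths for honest reasons.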


What ExplainX is doing

We treat listed skills as content we vouch for at publication time, not as unvetted drop-ins.

  1. Custom Python pipelines — We use dedicated Python scripts to ingest uploads, extract the surfaces that matter (text, file layout, common script hooks), and apply consistent automated checks. That closes the gap where “looks fine when skimming the README” is the only control.

  2. Per-upload verification — Every skill that appears on explainx.ai/skills is reviewed through our process before it is approved for the public directory. The goal is systematic coverage, not spot checks on famous repos only.

  3. GitHub repository scanning — We examine the source repositories skills come from, with scans that help catch inconsistencies, suspicious patterns, and drift between what a page claims and what the tree actually contains.
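To make the first step above concrete, here is a hedged sketch of what an ingest-and-check pipeline can look like. Function names, the surface set, and the checks are assumptions for illustration, not ExplainX's actual code:

```python
from pathlib import Path


def extract_surfaces(skill_dir: Path) -> dict:
    """Collect the review surfaces: instruction text, file layout, script hooks."""
    files = sorted(p.relative_to(skill_dir).as_posix()
                   for p in skill_dir.rglob("*") if p.is_file())
    skill_md = skill_dir / "SKILL.md"
    return {
        "text": skill_md.read_text(encoding="utf-8") if skill_md.exists() else "",
        "layout": files,
        "scripts": [f for f in files if f.endswith((".sh", ".py", ".js"))],
    }


def run_checks(surfaces: dict) -> list[str]:
    """Apply the same automated checks to every upload (illustrative rules)."""
    findings = []
    if not surfaces["text"]:
        findings.append("missing SKILL.md")
    if "curl" in surfaces["text"] and "| sh" in surfaces["text"]:
        findings.append("pipes remote content into a shell")
    if surfaces["scripts"] and "scripts" not in surfaces["text"].lower():
        findings.append("bundled scripts never mentioned in instructions")
    return findings
```

The value is less in any single rule than in the consistency: every upload passes through the same extraction and the same checks, so nothing ships on the strength of a skimmed README.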

Caveat, plainly stated: automated and human review reduce risk; they do not eliminate it. Your org should still use locked-down secrets, branch protection, and in-house policy for which skills may run in production repos.
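The repository-scanning idea in point 3 above — catching drift between what a page claims and what the tree contains — can be sketched as a set comparison. This is illustrative only; a real scan would also pin a commit SHA and diff file contents, not just filenames:

```python
from pathlib import Path


def detect_drift(declared_files: set[str], repo_dir: Path) -> dict:
    """Compare what a listing claims to ship against the actual repo tree."""
    actual = {p.relative_to(repo_dir).as_posix()
              for p in repo_dir.rglob("*") if p.is_file()}
    return {
        "undeclared": sorted(actual - declared_files),  # shipped, never mentioned
        "missing": sorted(declared_files - actual),     # promised, absent
    }
```

Either direction of drift is a signal: an undeclared script is a classic place to hide a payload, and a missing file suggests the listing no longer matches the source of truth.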


Hardening skills in your own organization

  • Pin versions of skills the same way you pin dependencies; re-verify on update.
  • Run secret scanning and SAST in CI; assume the model can be nudged to exfiltrate if files are reachable.
  • Prefer narrow skills with small file surfaces over giant “do everything” bundles you have not read.
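Pinning, as in the first bullet, can be as simple as hashing the whole skill directory and refusing to run on drift. A minimal sketch, assuming a local lockfile workflow rather than any particular registry's format:

```python
import hashlib
from pathlib import Path


def skill_digest(skill_dir: Path) -> str:
    """Hash every file (path + bytes) so any change to the skill is visible."""
    h = hashlib.sha256()
    for p in sorted(skill_dir.rglob("*")):
        if p.is_file():
            h.update(p.relative_to(skill_dir).as_posix().encode())
            h.update(p.read_bytes())
    return h.hexdigest()


def verify_pin(skill_dir: Path, pinned: str) -> bool:
    """Re-verify on update: refuse a skill whose digest has drifted."""
    return skill_digest(skill_dir) == pinned
```

Store the digest next to your other lockfiles; a failed `verify_pin` means re-review, exactly as a changed checksum would for an npm or PyPI dependency.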

Read next: What are agent skills? A complete guide · Browse verified skills · Snyk ToxicSkills write-up · OWASP Agentic Skills Top 10

arXiv preprints are not journal peer review; treat their numbers and threat models as research signals, not as guarantees about every marketplace on every day. Security is iterative—we update our own process as the ecosystem changes; see explainx.ai/skills for current registry policy.
