Game Dev Improvement

skill-test

Donchitos/Claude-Code-Game-Studios · updated Apr 16, 2026

$ npx skills add https://github.com/Donchitos/Claude-Code-Game-Studios --skill skill-test
summary

### Skill Test

  • description: "Validate skill files for structural compliance and behavioral correctness. Four modes: static (linter), spec (behavioral), category (rubric), audit (coverage report)."
  • argument-hint: "static [skill-name | all] | spec [skill-name] | category [skill-name | all] | audit"
  • allowed-tools: Read, Glob, Grep, Write
skill.md

Skill Test

Validates .claude/skills/*/SKILL.md files for structural compliance and behavioral correctness. No external dependencies — runs entirely within the existing skill/hook/template architecture.

Four modes:

Mode     | Command                           | Purpose                                                             | Token Cost
---------|-----------------------------------|---------------------------------------------------------------------|-------------------
static   | /skill-test static [name\|all]    | Structural linter: 7 compliance checks per skill                    | Low (~1k/skill)
spec     | /skill-test spec [name]           | Behavioral verifier: evaluates assertions in test spec              | Medium (~5k/skill)
category | /skill-test category [name\|all]  | Category rubric: checks skill against its category-specific metrics | Low (~2k/skill)
audit    | /skill-test audit                 | Coverage report: skills, agent specs, last test dates               | Low (~3k total)

Phase 1: Parse Arguments

Determine mode from the first argument:

  • static [name] → run 7 structural checks on one skill
  • static all → run 7 structural checks on all skills (Glob .claude/skills/*/SKILL.md)
  • spec [name] → read skill + test spec, evaluate assertions
  • category [name] → run category-specific rubric from CCGS Skill Testing Framework/quality-rubric.md
  • category all → run category rubric for every skill that has a category: in catalog
  • audit (or no argument) → read catalog, list all skills and agents, show coverage

If the argument is unrecognized, output usage and stop. (A missing argument runs audit, per the list above.)
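
The dispatch above can be sketched in Python. This is only an illustration: the skill itself is carried out by Claude reading these instructions, not by a script. `Request` and `parse_args` are hypothetical names, and the sketch assumes that no argument means audit (as in the bullet list) and that a named mode given without a target falls back to usage.

```python
from typing import NamedTuple, Optional

NAMED_MODES = ("static", "spec", "category")  # modes that take a target


class Request(NamedTuple):
    mode: str               # "static", "spec", "category", "audit", or "usage"
    target: Optional[str]   # skill name, "all", or None


def parse_args(raw: str) -> Request:
    """Map the raw argument string to a mode and target (hypothetical helper)."""
    parts = raw.split()
    if not parts or parts[0] == "audit":
        return Request("audit", None)        # assumption: no argument runs audit
    mode, rest = parts[0], parts[1:]
    if mode in NAMED_MODES and rest:
        if mode == "spec" and rest[0] == "all":
            return Request("usage", None)    # only static and category list "all"
        return Request(mode, rest[0])
    return Request("usage", None)            # unrecognized or incomplete input
```

For example, `parse_args("static all")` yields `Request("static", "all")`, while `parse_args("bogus")` yields a usage request.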


Phase 2A: Static Mode — Structural Linter

For each skill being tested, read its SKILL.md fully and run all 7 checks:

Check 1 — Required Frontmatter Fields

The file must contain all of these in the YAML frontmatter block:

  • name:
  • description:
  • argument-hint:
  • user-invocable:
  • allowed-tools:

FAIL if any are absent.
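
As a rough illustration of Check 1, the sketch below pulls the leading `---` block out of a SKILL.md string and reports which required keys are missing. The function name is hypothetical, and the parsing is deliberately naive (top-level `key:` lines only, no real YAML parser), so treat it as an approximation of the check rather than the skill's actual mechanism.

```python
import re

REQUIRED_FIELDS = ("name", "description", "argument-hint",
                   "user-invocable", "allowed-tools")


def missing_frontmatter_fields(skill_text: str) -> list:
    """Return required keys absent from the leading '---' frontmatter block."""
    match = re.match(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", skill_text, re.DOTALL)
    block = match.group(1) if match else ""
    # Only unindented, non-comment "key: value" lines count as top-level keys.
    present = {
        line.split(":", 1)[0].strip()
        for line in block.splitlines()
        if ":" in line and not line.startswith((" ", "\t", "#"))
    }
    return [field for field in REQUIRED_FIELDS if field not in present]
```

An empty return value corresponds to PASS; any entries correspond to FAIL.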

Check 2 — Multiple Phases

The skill must have ≥2 numbered phase headings. Look for patterns like:

  • ## Phase N or ## Phase N:
  • ## N. (numbered top-level sections)
  • At least 2 distinct ## headings if phases aren't explicitly numbered

FAIL if fewer than 2 phase-like headings are found.
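
One way to approximate the heading patterns above with regular expressions is sketched here. The names are hypothetical, and the sketch does not skip headings that appear inside fenced code blocks, which a careful reader of the skill would ignore.

```python
import re

# "## Phase 3", "## Phase 2A:", or "## 4." style headings
NUMBERED_PHASE = re.compile(r"^##[ \t]+(?:Phase[ \t]+\d+|\d+\.)", re.MULTILINE)
# Any level-2 heading; "###" does not match because "#" is not whitespace
ANY_H2 = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def count_phase_headings(skill_text: str) -> int:
    """Count numbered phases, falling back to distinct '##' headings."""
    numbered = NUMBERED_PHASE.findall(skill_text)
    if numbered:
        return len(numbered)
    return len(set(ANY_H2.findall(skill_text)))


def check_multiple_phases(skill_text: str) -> str:
    return "PASS" if count_phase_headings(skill_text) >= 2 else "FAIL"
```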

Check 3 — Verdict Keywords

The skill must contain at least one of: PASS, FAIL, CONCERNS, APPROVED, BLOCKED, COMPLETE, READY, COMPLIANT, NON-COMPLIANT

FAIL if none are present.
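
A small sketch of the keyword scan follows. It assumes the match is case-sensitive and whole-word (so "passed" does not count as PASS), which the text above does not state explicitly. The lookarounds also keep NON-COMPLIANT from being reported as COMPLIANT.

```python
import re

VERDICT_KEYWORDS = ("NON-COMPLIANT", "COMPLIANT", "PASS", "FAIL", "CONCERNS",
                    "APPROVED", "BLOCKED", "COMPLETE", "READY")

# Reject matches glued to a letter, digit, underscore, or hyphen on either side.
_VERDICT = re.compile(
    r"(?<![\w-])(" + "|".join(map(re.escape, VERDICT_KEYWORDS)) + r")(?![\w-])"
)


def find_verdict_keywords(skill_text: str) -> list:
    """Return the distinct verdict keywords found, sorted alphabetically."""
    return sorted(set(_VERDICT.findall(skill_text)))
```

An empty list corresponds to FAIL for this check.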

Check 4 — Collaborative Protocol Language

The skill must contain ask-before-write language. Look for:

  • "May I write" (canonical form)
  • "before writing" or "approval" near file-write instructions
  • "ask" + "write" in close proximity (within same section)

WARN if absent (some read-only skills legitimately skip this). FAIL if allowed-tools includes Write or Edit but no ask-before-write language is found.
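
The WARN versus FAIL split depends on `allowed-tools`, which the following sketch makes explicit. It is a simplification under stated assumptions: "same section" is approximated as "same paragraph", and the looser "approval near file-write instructions" signal is not modelled. Function names are hypothetical.

```python
import re


def has_ask_before_write(body: str) -> bool:
    """Detect ask-before-write language (simplified proximity rule)."""
    lowered = body.lower()
    if "may i write" in lowered or "before writing" in lowered:
        return True
    # Assumption: "close proximity" means both words occur in one paragraph.
    return any(
        re.search(r"\bask\b", para) and re.search(r"\bwrite\b", para)
        for para in re.split(r"\n\s*\n", lowered)
    )


def check_collaborative_protocol(body: str, allowed_tools: list) -> str:
    if has_ask_before_write(body):
        return "PASS"
    if {"Write", "Edit"} & set(allowed_tools):
        return "FAIL"   # the skill can write files but never asks first
    return "WARN"       # read-only skills may legitimately skip the language
```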

Check 5 — Next-Step Handoff

The skill must end with a recommended next action or follow-up path. Look for:

  • A final section mentioning another skill (e.g., /story-done, /gate-check)
  • "Recommended next" or "next step" phrasing
  • A "Follow-Up" or "After this" section

WARN if absent.

Check 6 — Fork Context Complexity

If frontmatter contains context: fork, the skill should have ≥5 phase headings (## level or numbered Phase N headers). Fork context is for complex multi-phase skills; simple skills should not use it.

WARN if context: fork is set but fewer than 5 phases found.

Check 7 — Argument Hint Plausibility

argument-hint must be non-empty. If the skill body mentions multiple modes (e.g., "Mode A | Mode B"), the hint should reflect them. Cross-reference the hint against the first phase's "Parse Arguments" section.

WARN if hint is "" or if documented modes don't match hint.


Static Mode Output Format

For a single skill:

=== Skill Static Check: /[name] ===

Check 1 — Frontmatter Fields:    PASS
Check 2 — Multiple Phases:       PASS (7 phases found)
Check 3 — Verdict Keywords:      PASS (PASS, FAIL, CONCERNS)
Check 4 — Collaborative Protocol: PASS ("May I write" found)
Check 5 — Next-Step Handoff:     WARN (no follow-up section found)
Check 6 — Fork Context Complexity: PASS (8 phases, context: fork set)
Check 7 — Argument Hint:         PASS

Verdict: WARNINGS (1 warning, 0 failures)
Recommended: Add a "Follow-Up Actions" section at the end of the skill.

For static all, produce a summary table then list any non-compliant skills:

=== Skill Static Check: All 52 Skills ===

Skill                  | Result       | Issues
-----------------------|--------------|-------
gate-check             | COMPLIANT    |
design-review          | COMPLIANT    |
story-readiness        | WARNINGS     | Check 5: no handoff
...

Summary: 48 COMPLIANT, 3 WARNINGS, 1 NON-COMPLIANT
Aggregate Verdict: N WARNINGS / N FAILURES
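
The sample output suggests how per-check outcomes roll up into a per-skill result: any FAIL gives NON-COMPLIANT, otherwise any WARN gives WARNINGS, otherwise COMPLIANT. That mapping is inferred from the examples rather than stated as a rule, so the sketch below (with hypothetical function names) should be read as one plausible interpretation.

```python
from collections import Counter


def skill_result(check_outcomes: list) -> str:
    """Roll seven PASS/WARN/FAIL outcomes into one per-skill result."""
    if "FAIL" in check_outcomes:
        return "NON-COMPLIANT"
    if "WARN" in check_outcomes:
        return "WARNINGS"
    return "COMPLIANT"


def summarize(results: dict) -> str:
    """Build the 'Summary:' line from {skill name: [check outcomes]}."""
    tally = Counter(skill_result(outcomes) for outcomes in results.values())
    return (f"Summary: {tally['COMPLIANT']} COMPLIANT, "
            f"{tally['WARNINGS']} WARNINGS, "
            f"{tally['NON-COMPLIANT']} NON-COMPLIANT")
```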

Phase 2B: Spec Mode — Behavioral Verifier

Step 1 — Locate Files

Find skill at .claude/skills/[name]/SKILL.md. Look up the spec path from CCGS Skill Testing Framework/catalog.yaml — use the spec: field for the matching skill entry.

If any of these is missing, report the matching message:

  • Missing skill: "Skill '[name]' not found in .claude/skills/."
  • Missing spec path in catalog: "No spec path set for '[name]' in catalog.yaml."
  • Spec file not found at path: "Spec file missing at [path]. Run /skill-test audit to see coverage gaps."
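
The three failure cases can be sketched as a single lookup. The catalog layout used here (a top-level `skills` list of entries with `name` and `spec` keys, already parsed into a dict) is an assumption; the actual catalog.yaml schema is not shown in this document. `locate_spec_inputs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional, Tuple


def locate_spec_inputs(
    name: str, catalog: dict, root: Path
) -> Tuple[Optional[Path], Optional[Path], Optional[str]]:
    """Return (skill_path, spec_path, error); error is None on success."""
    skill_path = root / ".claude" / "skills" / name / "SKILL.md"
    if not skill_path.is_file():
        return None, None, f"Skill '{name}' not found in .claude/skills/."
    # Assumed schema: catalog["skills"] is a list of {"name": ..., "spec": ...}
    entry = next((s for s in catalog.get("skills", [])
                  if s.get("name") == name), {})
    spec_rel = entry.get("spec")
    if not spec_rel:
        return skill_path, None, f"No spec path set for '{name}' in catalog.yaml."
    spec_path = root / spec_rel
    if not spec_path.is_file():
        return skill_path, None, (
            f"Spec file missing at {spec_rel}. "
            "Run /skill-test audit to see coverage gaps."
        )
    return skill_path, spec_path, None
```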

Step 2 — Read Both Files

Read the skill file and test spec file completely.

Step 3 — Evaluate Assertions

For each Test Case in the spec:

  1. Read the Fixture description (assumed state of project files)
  2. Read the Expected behavior steps
  3. Read each Assertion checkbox

For each assertion, evaluate whether the skill's written instructions, if followed correctly given the fixture state, would satisfy it. This is a Claude-evaluated reasoning check, not code execution.

Mark each assertion:

  • PASS — skill instructions clearly satisfy this assertion
  • PARTIAL — skill instructions partially address it, but with ambiguity
  • FAIL — skill instructions would NOT satisfy this assertion given the fixture

For Protocol Compliance assertions (always present):

  • Check whether the skill requires "May I write" before file writes
  • Check whether the skill presents findings before requesting approval
  • Check whether the skill ends with a recommended next step
  • Check whether the skill avoids auto-creating files without approval

Step 4 — Build Report

=== Skill Spec Test: /[name] ===
Date: [date]
Spec: CCGS Skill Testing Framework/skills/[category]/[name].md

Case 1: [Happy Path — name]
  Fixture: [summary]
  Assertions:
    [PASS] [assertion text]
    [FAIL] [assertion text]
       Reason: The skill's Phase 3 says "..." but the fixture state means "..."
  Case Verdict: FAIL

Case 2: [Edge Case — name]
  ...
  Case Verdict: PASS

Protocol Compliance:
  [PASS] Uses "May I write" before file writes
  [PASS] Presents findings before asking approval
  [WARN] No explicit next-step handoff at end

Overall Verdict: FAIL (1 case failed, 1 warning)
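
A minimal sketch of how assertion marks might roll up into case verdicts and an overall verdict is given below. The "worst mark wins" rule is an assumption consistent with the sample report (one FAIL assertion makes the case and the run FAIL); Protocol Compliance warnings are left out for brevity, and the function names are hypothetical.

```python
SEVERITY = {"PASS": 0, "PARTIAL": 1, "FAIL": 2}


def worst(marks) -> str:
    """Return the most severe mark; an empty input counts as PASS."""
    return max(marks, key=SEVERITY.__getitem__, default="PASS")


def overall_verdict(cases: dict):
    """Map {case title: [assertion marks]} to (overall, per-case verdicts)."""
    case_verdicts = {title: worst(marks) for title, marks in cases.items()}
    return worst(case_verdicts.values()), case_verdicts
```

Treating an empty set of marks as PASS is a convenience of the sketch; a real run would more likely flag a spec with no assertions.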

Step 5 — Offer to Write Results

"May I write these results to CCGS Skill Testing Framework/results/skill-test-spec-[name]-[date].md and update CCGS Skill Testing Framework/catalog.yaml?"

If yes:

  • Write results file to CCGS Skill Testing Framework/results/
  • Update the skill's entry in CCGS Skill Testing Framework/catalog.yaml:
    • last_spec: [date]
    • last_spec_result: PASS|PARTIAL|FAIL
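
For illustration, the results filename and the catalog fields could be derived as follows. The ISO date format (YYYY-MM-DD) is an assumption, since the document only writes `[date]`, and both function names are hypothetical. No file is written here; the write itself still requires the user's approval as described above.

```python
from datetime import date

RESULTS_DIR = "CCGS Skill Testing Framework/results"
VALID_RESULTS = ("PASS", "PARTIAL", "FAIL")


def results_path(name: str, run_date: date) -> str:
    """Build the results file path for a spec run (ISO date assumed)."""
    return f"{RESULTS_DIR}/skill-test-spec-{name}-{run_date.isoformat()}.md"


def record_spec_result(entry: dict, run_date: date, result: str) -> dict:
    """Return a copy of a catalog entry with the last_spec fields set."""
    if result not in VALID_RESULTS:
        raise ValueError(f"unexpected result: {result!r}")
    return {**entry,
            "last_spec": run_date.isoformat(),
            "last_spec_result": result}
```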

Phase 2D: Category Mode — Rubric Evaluation

Step 1 — Locate Skill and Category

Find skill at .claude/skills/[name]/SKILL.md. Look up category: field in CCGS Skill Testing Framework/catalog.yaml.

If skill not found: "Skill '[name]' not found." If no category: field: "No category assigned for '[name]' in catalog.yaml. Add category: [name] to the skill entry first."

For category all: collect every skill that has a category: field and process each one. Skills with category: utility are evaluated against U1 (static checks pass) and U2 (gate mode correct, if applicable) only; for U1, run the static mode checks.

Step 2 — Read Rubric Section

Read CCGS Skill Testing Framework/quality-rubric.md. Extract the section matching the skill's category (e.g., ### gate, ### team).

Step 3 — Read Skill

Read the skill's SKILL.md fully.

Step 4 — Evaluate Rubric Metrics

For each metric in the category's rubric table:

  1. Check whether the skill's written instructions clearly satisfy the criterion
  2. Mark PASS, FAIL, or WARN
  3. For FAIL/WARN, identify the exact gap in the skill text (quote the relevant section or note its absence)

Step 5 — Output Report

=== Skill Category Check: /[name] ([category]) ===

Metric G1 — Review mode read:      PASS
Metric G2 — Full mode directors:   FAIL
  Gap: Phase 3 spawns only CD-PHASE-GATE; TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE absent
Metric G3 — Lean mode: PHASE-GATE only: PASS
Metric G4 — Solo mode: no directors:    PASS
Metric G5 — No auto-advance:       PASS

Verdict: FAIL (1 failure, 0 warnings)
Fix: Add TD-PHASE-GATE, PR-PHASE-GATE, and AD-PHASE-GATE to the full-mode director
     panel in Phase 3.

Step 6 — Offer to Update Catalog

"May I update CCGS Skill Testing Framework/catalog.yaml to record this category check (last_category, last_category_result) for [name]?"


Phase 2C: Audit Mode — Coverage Report

Step 1 — Read Catalog

Read CCGS Skill Testing Framework/catalog.yaml. If missing, note that catalog doesn't exist yet (first-run state).

Step 2 — Enumerate All Skills and Agents

Glob .claude/skills/*/SKILL.md to get the complete list of skills. Extract skill name from each path (directory name).

Also read the agents: section from CCGS Skill Testing Framework/catalog.yaml to get the complete list of agents.

Step 3 — Build Skill Coverage Table

For each skill:

  • Check if a spec file exists (use the spec: path from catalog, or glob CCGS Skill Testing Framework/skills/*/[name].md)
  • Look up last_static, last_static_result, last_spec, last_spec_result, last_category, last_category_result, category from catalog (or mark as "never" / "—" if not in catalog)
  • Priority comes from catalog priority: field (critical/high/medium/low)

Step 3b — Build Agent Coverage Table

For each agent in catalog's agents: section:

  • Check if a spec file exists (use the spec: path from catalog, or glob CCGS Skill Testing Framework/agents/*/[name].md)
  • Look up last_spec, last_spec_result, category from catalog

Step 4 — Output Report

=== Skill Test Coverage Audit ===
Date: [date]

SKILLS (72 total)
Specs written: 72 (100%) | Never static tested: 72 | Never category tested: 72

Skill                  | Cat      | Has Spec | Last Static | S.Result | Last Cat | C.Result | Priority
-----------------------|----------|----------|-------------|----------|----------|----------|----------
gate-check             | gate     | YES      | never       | —        | never    | —        | critical
design-review          | review   | YES      | never       | —        | never    | —        | critical
...

AGENTS (49 total)
Agent specs written: 49 (100%)

Agent                  | Category   | Has Spec | Last Spec   | Result
-----------------------|------------|----------|-------------|--------
creative-director      | director   | YES      | never       | —
technical-director     | director   | YES      | never       | —
...

Top 5 Priority Gaps (skills with no spec, critical/high priority):
(none if all specs are written)

Skill coverage:  72/72 specs (100%)
Agent coverage:  49/49 specs (100%)

No file writes in audit mode.
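
The coverage totals and the priority gap list in the report can be computed from the rows built in Step 3. The row shape (`name`, `has_spec`, `priority` keys) and the function names are hypothetical, and sorting gaps by priority and then by name is a choice made for this sketch.

```python
def coverage_line(label: str, rows: list) -> str:
    """Format a 'Skill coverage:' style line from coverage rows."""
    total = len(rows)
    with_spec = sum(1 for row in rows if row["has_spec"])
    pct = round(100 * with_spec / total) if total else 0
    return f"{label} coverage:  {with_spec}/{total} specs ({pct}%)"


def priority_gaps(rows: list, limit: int = 5) -> list:
    """List up to `limit` critical/high skills that have no spec."""
    order = {"critical": 0, "high": 1}
    gaps = [row for row in rows
            if not row["has_spec"] and row.get("priority") in order]
    gaps.sort(key=lambda row: (order[row["priority"]], row["name"]))
    return [row["name"] for row in gaps[:limit]]
```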

Offer: "Would you like to run /skill-test static all to check structural compliance across all skills? /skill-test category all to run category rubric checks? Or /skill-test spec [name] to run a specific behavioral test?"


Phase 3: Recommended Next Steps

After any mode completes, offer contextual follow-up:

  • After static [name]: "Run /skill-test spec [name] to validate behavioral correctness if a test spec exists."
  • After static all with failures: "Address NON-COMPLIANT skills first. Run /skill-test static [name] individually for detailed remediation guidance."
  • After spec [name] PASS: "Update CCGS Skill Testing Framework/catalog.yaml to record this pass date. Consider running /skill-test audit to find the next spec gap."
  • After spec [name] FAIL: "Review the failing assertions and update the skill or the test spec to resolve the mismatch."
  • After audit: "Start with the critical-priority gaps. Use the spec template at CCGS Skill Testing Framework/templates/skill-test-spec.md to create new specs."
general reviews

Ratings

4.8 · 53 reviews
  • Ren Ndlovu· Dec 20, 2024

    We added skill-test from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Noor Shah· Dec 20, 2024

    skill-test reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Anika Patel· Dec 12, 2024

    Registry listing for skill-test matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Ren Dixit· Nov 15, 2024

    Solid pick for teams standardizing on skills: skill-test is focused, and the summary matches what you get after install.

  • Sophia Lopez· Nov 11, 2024

    skill-test is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Rahul Santra· Nov 3, 2024

    We added skill-test from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Anaya Park· Nov 3, 2024

    Keeps context tight: skill-test is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Pratham Ware· Oct 22, 2024

    skill-test fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • William Anderson· Oct 22, 2024

    skill-test is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Ren Kapoor· Oct 6, 2024

    skill-test has been reliable in day-to-day use. Documentation quality is above average for community skills.
