Vibe Skill
Purpose: Is this code ready to ship?
Three steps:
- Complexity analysis: find hotspots (radon, gocyclo)
- Bug hunt audit: systematic sweep for concrete bugs
- Council validation: multi-model judgment
Quick Start
/vibe
/vibe recent
/vibe src/auth/
/vibe --quick recent
/vibe --structured recent
/vibe --deep recent
/vibe --sweep recent
/vibe --mixed recent
/vibe --preset=security-audit src/auth/
/vibe --explorers=2 recent
/vibe --debate recent
/vibe --tier=quality recent
Execution Steps
Step 0: Load Prior Review Context
Before reviewing, pull relevant learnings from prior code reviews and known patterns:
if command -v ao &>/dev/null; then
  ao lookup --query "<target-scope> code review patterns" --limit 3 2>/dev/null || true
fi
Apply retrieved knowledge (mandatory when results returned):
If learnings or patterns are returned, do NOT just load them as passive context. For each returned item:
- Check: does this learning apply to the code under review? (answer yes/no)
- If yes: include it as a known_risk in your review, stating the pattern, what to look for, and whether the code exhibits it
- Cite the learning by filename in your review output when it influences a finding
After applying, record the citation:
ao metrics cite "<learning-path>" --type applied 2>/dev/null || true
Skip silently if ao is unavailable or returns no results.
Project reviewer config: If .agents/reviewer-config.md exists, its full config (reviewers, plan_reviewers, skip_reviewers) is passed to council for judge selection. See skills/council/SKILL.md Step 1b.
Crank Checkpoint Detection
Before scanning for changed files via git diff, check if a crank checkpoint exists:
if [ -f .agents/vibe-context/latest-crank-wave.json ]; then
  echo "Crank checkpoint found: using files_changed from checkpoint"
  FILES_CHANGED=$(jq -r '.files_changed[]' .agents/vibe-context/latest-crank-wave.json 2>/dev/null)
  WAVE_COUNT=$(jq -r '.wave' .agents/vibe-context/latest-crank-wave.json 2>/dev/null)
  # grep -c . counts non-empty lines, so an empty list reports 0 rather than 1
  echo "Wave $WAVE_COUNT checkpoint: $(printf '%s\n' "$FILES_CHANGED" | grep -c .) files changed"
fi
When a crank checkpoint is available, use its files_changed list instead of re-detecting via git diff. This ensures vibe validates exactly the files that crank modified.
Step 1: Determine Target
If target provided: Use it directly.
If no target or "recent": Auto-detect from git:
git diff --name-only HEAD~3 2>/dev/null | head -20
If nothing found, ask user.
Pre-flight: If no files found:
Return immediately with: "PASS (no changes to review): no modified files detected."
Do NOT spawn agents for empty file lists.
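The pre-flight rule can be sketched as a small shell helper. This is a minimal sketch rather than the skill's actual implementation: the function name preflight_verdict and the "REVIEW n file(s)" message are hypothetical, and it assumes the candidate files arrive as a newline-separated string.

```shell
# Hypothetical helper: decide whether a review should proceed.
# $1 is a newline-separated list of candidate files (may be empty).
preflight_verdict() {
  local files count
  files=$1
  # Count non-empty lines; grep -c prints 0 (and exits 1) when nothing matches.
  count=$(printf '%s\n' "$files" | grep -c . || true)
  if [ "$count" -eq 0 ]; then
    echo "PASS (no changes to review): no modified files detected."
    return 0
  fi
  # Assumed placeholder output for the non-empty case.
  echo "REVIEW $count file(s)"
}

# Example call (requires a git repository):
#   preflight_verdict "$(git diff --name-only HEAD~3 2>/dev/null | head -20)"
```

Returning before any agent is spawned keeps the empty-list case cheap.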
Step 1.5a: Structured Verification Path (--structured mode)
If --structured flag is set, run a 6-phase mechanical verification pipeline instead of the council flow. This produces a machine-readable verification report suitable for PR gates and CI integration.
Phases: Build → Types → Lint → Tests → Security → Diff Review.
Read references/verification-report.md for the full report template and per-phase commands. Each phase is fail-fast: if Build fails, skip remaining phases and report NOT READY.
After all phases complete, write the structured report to .agents/council/YYYY-MM-DD-verification-<target>.md and output the summary table to the user.
When to use: Pre-PR gate, CI integration, when you need a mechanical pass/fail rather than judgment-based review.
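The fail-fast behavior of the structured path could look roughly like the loop below. It is a sketch under assumptions: run_phases is a hypothetical name, the "READY" line for a clean run is assumed (the text only names NOT READY), and the real per-phase commands live in references/verification-report.md, so the caller supplies placeholders here.

```shell
# Hypothetical fail-fast runner. Each argument is "PhaseName=command".
# The commands are supplied by the caller; they are not the skill's real ones.
run_phases() {
  local spec name cmd
  for spec in "$@"; do
    name=${spec%%=*}   # text before the first "="
    cmd=${spec#*=}     # text after the first "="
    if sh -c "$cmd" >/dev/null 2>&1; then
      echo "$name: PASS"
    else
      echo "$name: FAIL"
      echo "NOT READY"   # remaining phases are skipped
      return 1
    fi
  done
  echo "READY"           # assumed success label
}

# Example call with placeholder commands:
#   run_phases "Build=make build" "Lint=make lint" "Tests=make test"
```

A non-zero return code lets a PR gate or CI job treat the first failing phase as the overall result.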
Step 1.5: Fast Path (--quick mode)
If the --quick flag is set, skip the heavy pre-processing steps (2a through 2e, plus 2.5 and 2f). Run Steps 2, 2.3, 2.4, 2g, and 3, then jump to Step 4 with an inline council. Domain checklists, compiled-prevention loading, test-pyramid inventory, and inline product context are cheap and high-value, so they still run in quick mode. Complexity analysis (Step 2) also still runs; it is cheap and informative.
Why: Steps 2.5 and 2a-2f add 30-90 seconds of pre-processing that mainly feeds multi-judge council packets. In --quick mode (single inline agent), those inputs are not worth the cost, but the test-pyramid and product-context checks still shape the inline review meaningfully.
Step 2: Run Complexity Analysis
Detect language and run appropriate tool:
For Python:
mkdir -p .agents/council
echo "$(date -Iseconds) preflight: checking radon" >> .agents/council/preflight.log
if ! which radon >> .agents/council/preflight.log 2>&1; then
  echo "⚠️ COMPLEXITY SKIPPED: radon not installed (pip install radon)"
else
  radon cc <path> -a -s 2>/dev/null | head -30
  radon mi <path> -s 2>/dev/null | head -30
fi
For Go:
echo "$(date -Iseconds) preflight: checking gocyclo" >> .agents/council/preflight.log
if ! which gocyclo >> .agents/council/preflight.log 2>&1; then
  echo "⚠️ COMPLEXITY SKIPPED: gocyclo not installed (go install github.com/fzipp/gocyclo/cmd/gocyclo@latest)"
else
  gocyclo -over 10 <path> 2>/dev/null | head -30
fi
For other languages: Skip complexity with an explicit note: "⚠️ COMPLEXITY SKIPPED: No analyzer for <language>"
Interpret results:
| Score | Rating | Action |
|-------|--------|--------|
| A (1-5) | Simple | Good |
| B (6-10) | Moderate | OK |
| C (11-20) | Complex | Flag for council |
| D (21-30) | Very complex | Recommend refactor |
| F (31+) | Untestable | Must refactor |
Include complexity findings in council context.
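The rating table above maps directly onto a threshold lookup. The helper below is an illustrative sketch: complexity_rating is a hypothetical name, and it assumes the input is an integer cyclomatic complexity score such as the ones radon and gocyclo report.

```shell
# Hypothetical helper: map a cyclomatic complexity score to the
# grade, rating, and action from the table above.
complexity_rating() {
  local score=$1
  if   [ "$score" -le 5 ];  then echo "A Simple: Good"
  elif [ "$score" -le 10 ]; then echo "B Moderate: OK"
  elif [ "$score" -le 20 ]; then echo "C Complex: Flag for council"
  elif [ "$score" -le 30 ]; then echo "D Very complex: Recommend refactor"
  else                           echo "F Untestable: Must refactor"
  fi
}
```

Anything rated C or worse is what the council context should call out explicitly.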
Step 2.3: Load Domain-Specific Checklists
Detect code patterns in the target files and load matching domain-specific checklists from standards/references/:
| Trigger | Checklist | Detection |
|---------|-----------|-----------|
| SQL/ORM code | sql-safety-checklist.md | Files contain SQL queries, ORM imports (database/sql, sqlalchemy, prisma, activerecord, gorm, knex), or migration files in changeset |
| LLM/AI code | llm-trust-boundary-checklist.md | Files import anthropic, openai, google.generativeai, or match *llm*, *prompt*, *completion* patterns |
| Concurrent code | race-condition-checklist.md | Files use goroutines, threading, asyncio, multiprocessing, sync.Mutex, concurrent.futures, or shared file I/O patterns |
| Codex skills | codex-skill.md | Files under skills-codex/, or files matching *codex*SKILL.md, convert.sh, skills-codex-overrides/, or converter scripts |
For each matched checklist, load it via the Read tool and include relevant items in the council packet as context.domain_checklists. Multiple checklists can be loaded simultaneously.
Skip silently if no patterns match. This step runs in both --quick and full modes (domain checklists are cheap to load and high-value).
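One way to approximate the detection column is a content grep over the target files. The sketch below is deliberately simplified and is not the skill's detector: detect_checklists is a hypothetical name, it covers only a subset of the import-style triggers from the table, and plain substring matching can produce false positives.

```shell
# Hypothetical detector: print the checklists whose triggers appear
# in the given files. Patterns are a simplified subset of the table above.
detect_checklists() {
  [ "$#" -gt 0 ] || return 0   # no files: nothing to load, skip silently
  if grep -qiE 'database/sql|sqlalchemy|prisma|activerecord|gorm|knex' "$@" 2>/dev/null; then
    echo "sql-safety-checklist.md"
  fi
  if grep -qE 'anthropic|openai|google\.generativeai' "$@" 2>/dev/null; then
    echo "llm-trust-boundary-checklist.md"
  fi
  if grep -qE 'threading|asyncio|multiprocessing|sync\.Mutex|concurrent\.futures' "$@" 2>/dev/null; then
    echo "race-condition-checklist.md"
  fi
}
```

Because each check is independent, a file that mixes ORM and async code yields both checklists, matching the note that multiple checklists can be loaded at once.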
Steps 2.4-2f, 2h, 3-3.6 (Deep Checks & Pre-Council Prep): Read references/deep-checks.md for compiled prevention, prior findings, pre-council deep analysis checks, product context, spec loading, suppressions, pre-mortem correlation, and model cost tiers. Loaded automatically unless --quick mode is set. In --quick mode, skip directly to Step 2g.
Compiled prevention inputs: Load .agents/pre-mortem-checks/ and .agents/planning-rules/ when available. These compiled artifacts contain known_risks from prior findings that inform the review; carry matched finding IDs into council context so judges can assess whether the flywheel prevented rediscovery.
Step 2a: Prior Findings Check
Skip if --quick. Load prior findings from .agents/findings/registry.jsonl.
Step 2b: Constraint Tests
Skip if --quick. Run compiled constraint tests from .agents/constraints/.
Step 2c: Metadata Checks
Skip if --quick. Verify file metadata consistency.
Step 2.5: OL Validation
Skip if --quick. Run organizational-lint checks.
Step 2d: Knowledge Search
Skip if --quick. Search for relevant prior learnings via ao lookup.
Step 2e: Bug Hunt
Skip if --quick. Run proactive bug-hunt audit on target files.
Step 2f: Codex Review
Skip if --quick. When --mixed is passed and Codex CLI is available, send the first 2000 chars of the diff to Codex for a parallel review. Cap input at 2000 chars to stay within Codex context budgets.
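The 2000-character cap can be applied with a one-line filter before the diff is handed off. This is a sketch only: cap_diff is a hypothetical name, the hand-off to Codex is not shown because its invocation is not specified here, and head -c counts bytes, so a diff containing multi-byte characters is capped slightly more strictly than 2000 characters.

```shell
# Hypothetical helper: keep only the first 2000 bytes of stdin.
cap_diff() {
  head -c 2000
}

# Example call (requires a git repository):
#   git diff HEAD~3 | cap_diff > /tmp/diff-for-review.txt
```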
Step 3: Product Context
Skip the separate judge-fanout step if --quick. When PRODUCT.md exists and the user did not pass an explicit --preset override, quick mode still loads DX expectations inline in the single-agent review. In non-quick modes, add a DX (developer experience) judge, giving 2 independent judges plus 1 DX judge (3 judges total). The DX judge evaluates whether the code aligns with the product's stated personas and value propositions.
Step 2g: Test Pyramid Inventory (MANDATORY)
Assess test coverage against the test pyramid standard (loaded via /standards).
Read skills/vibe/references/test-pyramid-weighting.md for test pyramid weighting: L3+ tests found all production bugs, so weight them 5x.
Test Pyramid Weighting: Weight test coverage by level: L0-L1 at 1x, L2 at 3x, L3+ at 5x. Unit-only coverage is a WARN signal, not a PASS. See references/test-pyramid-weighting.md.
Run even in --quick mode β this is cheap (file existence checks) and high-signal.
- Identify changed modules from git diff or target scope
- For each changed module, check coverage pyramid (L0-L3):
  - L0: Does a contract/spec enforcement test cover this module?
  - L1: Does a unit test file exist for this module?
  - L2: If module crosses boundaries, does an integration test exist?
- For boundary-touching code, check bug-finding pyramid (BF1-BF5):
  - BF4 (Chaos): Do external call sites have failure injection tests?
  - BF1 (Property): Do data transformations have property tests?
  - BF2 (Golden): Do output generators have golden file tests?
- Compute weighted pyramid score for changed code paths:
Formula:
weighted_score = (L0_count x 1 + L1_count x 1 + L2_count x 3 + L3_count x 5 + L4_count x 5) / max_possible
Where max_possible = total_test_count x 5 (the score if every test were L3+).
Count tests at each level for changed code paths:
- L0: Build/compile checks (weight 1)
- L1: Unit tests (weight 1)
- L2: Integration tests (weight 3)
- L3: E2E/system tests (weight 5)
- L4: Smoke/fresh-context tests (weight 5)
Interpretation:
weighted_score >= 0.6 → strong pyramid, L2+ tests present
0.3 <= weighted_score < 0.6 → acceptable, but recommend more integration tests
weighted_score < 0.3 AND all tests are L0-L1 only → WARN: unit-only test coverage (feeds into vibe verdict as a WARN signal, not a separate gate)
Satisfaction exposure: The weighted_score is also exposed as satisfaction_score (with source "test-pyramid-weighted") in the test_pyramid output block. Downstream consumers (e.g., /validation STEP 1.8 holdout evaluation) can use satisfaction_score as a normalized quality signal.
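The formula and interpretation bands can be checked with a short awk-based sketch. The function names weighted_score and pyramid_signal are hypothetical. Two behaviors are assumptions, since the text does not define them: a zero test count yields 0.00, and a score below 0.3 with some L2+ tests present is reported as "below 0.3".

```shell
# Hypothetical helper: weighted pyramid score from counts at L0..L4.
weighted_score() {
  awk -v l0="$1" -v l1="$2" -v l2="$3" -v l3="$4" -v l4="$5" 'BEGIN {
    total = l0 + l1 + l2 + l3 + l4
    if (total == 0) { print "0.00"; exit }   # assumption: no tests scores 0
    weighted = l0*1 + l1*1 + l2*3 + l3*5 + l4*5
    printf "%.2f\n", weighted / (total * 5)  # max_possible = total x 5
  }'
}

# Hypothetical helper: map the counts to the interpretation bands.
pyramid_signal() {
  local score
  score=$(weighted_score "$@")
  awk -v s="$score" -v upper="$(( $3 + $4 + $5 ))" 'BEGIN {
    if (s >= 0.6)        print "strong"
    else if (s >= 0.3)   print "acceptable"
    else if (upper == 0) print "WARN: unit-only test coverage"
    else                 print "below 0.3"   # assumption: band not defined in the text
  }'
}
```

With 2 L0 tests and 8 L1 tests, weighted_score 2 8 0 0 0 prints 0.20 (10 / 50), and pyramid_signal reports the unit-only WARN.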
Include in council packet and vibe report output:
## Test Pyramid Score
| Level | Count | Weight | Contribution |
|-------|-------|--------|--------------|
| L0 | 2 | 1x | 2 |
| L1 | 8 | 1x | 8 |
| L2 | 0 | 3x | 0 |
| L3 | 0 | 5x | 0 |
| L4 | 0 | 5x | 0 |
| **Total** | **10** | | **10 / 50 = 0.20** |
WARN: weighted_score 0.20 < 0.3 and all tests a