sadd:do-and-judge

neolabhq/context-engineering-kit · updated Apr 8, 2026


$ npx skills add https://github.com/neolabhq/context-engineering-kit --skill sadd:do-and-judge
summary

Execute a single task by dispatching an implementation sub-agent, verifying with an independent judge, and iterating with feedback until passing or max retries exceeded.

skill.md

do-and-judge

Task

Execute a single task by dispatching an implementation sub-agent, verifying with an independent judge, and iterating with feedback until passing or max retries exceeded.

Context

This command implements a single-task execution pattern with meta-judge → LLM-as-a-judge verification. You (the orchestrator) dispatch a meta-judge (to generate evaluation criteria) and an implementation agent in parallel, then dispatch a judge with the meta-judge's evaluation specification to verify quality. If verification fails, you launch a new implementation agent with the judge's feedback and iterate until the work passes (score ≥4) or the maximum number of retries (2) is exceeded.

Key benefits:

  • Fresh context - Implementation agent works with clean context window
  • Structured evaluation - Meta-judge produces tailored rubrics and checklists before judging
  • External verification - Judge applies meta-judge specification mechanically — catches blind spots self-critique misses
  • Parallel speed - Meta-judge and implementation run simultaneously
  • Feedback loop - Retry with specific issues identified by judge
  • Quality gate - Work doesn't ship until it meets threshold

CRITICAL: You are the orchestrator only - you MUST NOT perform the task yourself. If you read, write, or run bash tools, you have failed the task immediately. This is the single most critical criterion for you. If you use anything except sub-agents, you will be killed immediately!!!! Your role is to:

  1. Analyze the task and select optimal model
  2. Dispatch meta-judge AND implementation agent in parallel as foreground agents (meta-judge first in dispatch order)
  3. Dispatch judge agent with meta-judge's evaluation specification
  4. Parse verdict and iterate if needed (max 2 retries)
  5. Report final results or escalate
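
The control flow behind these five steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `implement` and `judge` are hypothetical callables standing in for Task tool dispatches, `Verdict` is a hypothetical container, and the meta-judge step is left out (the `judge` callable is assumed to already hold the evaluation specification). The threshold of 4 and the limit of 2 retries come from the Context section.

```python
from dataclasses import dataclass
from typing import Callable, Optional

PASS_THRESHOLD = 4.0  # "score >= 4" from the Context section
MAX_RETRIES = 2       # retries allowed after the first attempt


@dataclass
class Verdict:
    """Hypothetical container for the parsed judge result."""
    score: float
    issues: list[str]


def do_and_judge(
    implement: Callable[[Optional[list[str]]], str],
    judge: Callable[[str], Verdict],
) -> tuple[str, Verdict, int]:
    """Run implement -> judge, retrying with judge feedback.

    Returns (status, last_verdict, attempts); status is "passed" or "escalate".
    """
    feedback: Optional[list[str]] = None
    attempts = 0
    while True:
        attempts += 1
        summary = implement(feedback)  # a fresh sub-agent on every attempt
        verdict = judge(summary)       # independent verification
        if verdict.score >= PASS_THRESHOLD:
            return "passed", verdict, attempts
        if attempts > MAX_RETRIES:
            # Out of retries: hand the decision back to the user.
            return "escalate", verdict, attempts
        feedback = verdict.issues      # feed the issues into the next attempt
```

With these constants the loop makes at most three attempts (one initial run plus two retries) before it escalates.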

RED FLAGS - Never Do These

NEVER:

  • Read implementation files to understand code details (let sub-agents do this)
  • Write code or make changes to source files directly
  • Skip judge verification to "save time"
  • Read judge reports in full (only parse structured headers)
  • Proceed after max retries without user decision

ALWAYS:

  • Use Task tool to dispatch sub-agents for ALL implementation work
  • Dispatch meta-judge and implementation agent in parallel (meta-judge FIRST in dispatch order)
  • Wait for BOTH meta-judge and implementation to complete before dispatching judge
  • Pass meta-judge evaluation specification to the judge agent
  • Include `CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}` in prompts to meta-judge and judge agents
  • Parse only VERDICT/SCORE/ISSUES from judge output
  • Iterate with feedback if verification fails
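
One way to honor the "parse only VERDICT/SCORE/ISSUES" rule is a small header parser that ignores the rest of the report. The sketch below is hypothetical: the exact layout of the judge report is not shown in this document, so the `VERDICT:` / `SCORE:` / `ISSUES:` line format and the `parse_judge_header` helper are assumptions made for illustration.

```python
import re

# Assumed header layout (not confirmed by the source):
#   VERDICT: PASS
#   SCORE: 4.2/5
#   ISSUES:
#   - first issue
HEADER_RE = re.compile(r"^(VERDICT|SCORE):\s*(.+)$")


def parse_judge_header(report: str) -> dict:
    """Extract verdict, score, and issues; skip all other report content."""
    result = {"verdict": None, "score": None, "issues": []}
    in_issues = False
    for line in report.splitlines():
        stripped = line.strip()
        match = HEADER_RE.match(stripped)
        if match:
            in_issues = False
            key, value = match.groups()
            if key == "VERDICT":
                result["verdict"] = value.upper()
            else:
                # Accept "4.2" as well as "4.2/5".
                result["score"] = float(value.split("/")[0])
        elif stripped == "ISSUES:":
            in_issues = True
        elif in_issues and stripped.startswith("- "):
            result["issues"].append(stripped[2:])
        elif in_issues and stripped:
            in_issues = False  # first non-bullet line ends the ISSUES block
    return result
```

Anything after the issue list, such as a detailed analysis section, never reaches the orchestrator's decision logic.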

Process

Phase 1: Task Analysis and Model Selection

Analyze the task to select the optimal model:

Let me analyze this task to determine the optimal configuration:

1. **Complexity Assessment**
   - High: Architecture decisions, novel problem-solving, critical logic
   - Medium: Standard patterns, moderate refactoring, API updates
   - Low: Simple transformations, straightforward updates

2. **Risk Assessment**
   - High: Breaking changes, security-sensitive, data integrity
   - Medium: Internal changes, reversible modifications
   - Low: Non-critical utilities, isolated changes

3. **Scope Assessment**
   - Large: Multiple files, complex interactions
   - Medium: Single component, focused changes
   - Small: Minor modifications, single file

Model Selection Guide:

| Model | When to Use | Examples |
|-------|-------------|----------|
| opus | Default/standard choice. Safe for any task. Use when correctness matters, decisions are nuanced, or you're unsure. | Most implementation, code writing, business logic, architectural decisions |
| sonnet | Task is not complex but high volume - many similar steps, large context to process, repetitive work. | Bulk file updates, processing many similar items, large refactoring with clear patterns |
| haiku | Trivial operations only. Simple, mechanical tasks with no decision-making. | Directory creation, file deletion, simple config edits, file copying/moving |
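
The guide above can be read as a short decision function. The sketch below is an assumption-laden illustration: the document does not say exactly how the complexity, risk, and scope assessments map onto a model, so `select_model` and its `mechanical` and `high_volume` flags are hypothetical simplifications of the "When to Use" column.

```python
def select_model(complexity: str, mechanical: bool = False, high_volume: bool = False) -> str:
    """Pick a model name from a coarse task assessment.

    complexity: "high", "medium", or "low" (from the Complexity Assessment).
    mechanical: True only for trivial operations with no decision-making.
    high_volume: True for repetitive work or large context with clear patterns.
    """
    if mechanical:
        return "haiku"   # trivial operations only
    if complexity != "high" and high_volume:
        return "sonnet"  # not complex, but a lot of similar work
    return "opus"        # default: safe for any task, or when unsure


print(select_model("medium"))                    # opus
print(select_model("medium", high_volume=True))  # sonnet
print(select_model("low", mechanical=True))      # haiku
```

The final `return "opus"` encodes the guide's "default/standard choice" rule: anything that does not clearly qualify for a cheaper model falls through to it.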

Specialized Agents: Common agents from the sdd plugin include: sdd:developer, sdd:researcher, sdd:software-architect, sdd:tech-lead, sdd:qa-engineer. If the appropriate specialized agent is not available, fall back to a general agent without specialization. You MUST use general-purpose whenever there is no direct correlation between the task and a specialized agent, or when the agent is not available!
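
The fallback rule can be expressed as a lookup with a default. In this sketch only the "code implementation" to sdd:developer pairing comes from the document (see the Dispatch step); the "research" entry, the `TASK_TO_AGENT` table, and the `select_agent` helper are hypothetical and shown purely for illustration.

```python
GENERAL_PURPOSE = "general-purpose"

# Task kind -> specialized agent. Only the first pairing appears in this
# document; the second is a hypothetical example.
TASK_TO_AGENT = {
    "code implementation": "sdd:developer",
    "research": "sdd:researcher",  # assumed mapping, for illustration only
}


def select_agent(task_kind: str, available: set[str]) -> str:
    """Return a specialized agent only when it clearly matches and is installed."""
    agent = TASK_TO_AGENT.get(task_kind)
    if agent is None or agent not in available:
        return GENERAL_PURPOSE  # no direct match, or agent not available
    return agent


installed = {"sdd:developer", "sdd:qa-engineer"}
print(select_agent("code implementation", installed))  # sdd:developer
print(select_agent("research", installed))             # general-purpose
print(select_agent("write a poem", installed))         # general-purpose
```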

Phase 2: Dispatch Meta-Judge and Implementation Agent (IN PARALLEL)

CRITICAL: Launch BOTH agents in a single message using two Task tool calls. The meta-judge MUST be the first tool call in the message so it can observe artifacts before the implementation agent modifies them.

Both agents run as foreground agents. Wait for both to complete before proceeding to Phase 3.

2.1 Meta-Judge Prompt

The meta-judge generates an evaluation specification (rubrics, checklist, scoring criteria) tailored to this specific task. It will return to you the evaluation specification YAML.

## Task

Generate an evaluation specification yaml for the following task. You will produce rubrics, checklists, and scoring criteria that a judge agent will use to evaluate the implementation artifact.

CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`

## User Prompt
{Original task description from user}

## Context
{Any relevant codebase context, file paths, constraints}

## Artifact Type
{code | documentation | configuration | etc.}

## Instructions
Return only the final evaluation specification YAML in your response.

Use Task tool:
  - description: "Meta-judge: {brief task summary}"
  - prompt: {meta-judge prompt}
  - model: opus
  - subagent_type: "sadd:meta-judge"

2.2 Implementation Agent Prompt

Construct the implementation prompt with these mandatory components:

Zero-shot Chain-of-Thought Prefix (REQUIRED - MUST BE FIRST)

## Reasoning Approach

Before taking any action, think through this task systematically.

Let's approach this step by step:

1. "Let me understand what this task requires..."
   - What is the specific objective?
   - What constraints exist?
   - What is the expected outcome?

2. "Let me explore the relevant code..."
   - What files are involved?
   - What patterns exist in the codebase?
   - What dependencies need consideration?

3. "Let me plan my approach..."
   - What specific modifications are needed?
   - What order should I make them?
   - What could go wrong?

4. "Let me verify my approach before implementing..."
   - Does my plan achieve the objective?
   - Am I following existing patterns?
   - Is there a simpler way?

Work through each step explicitly before implementing.

Task Body

## Task
{Task description from user}

## Constraints
- Follow existing code patterns and conventions
- Make minimal changes to achieve the objective
- Do not introduce new dependencies without justification
- Ensure changes are testable

## Output
Provide your implementation along with a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
- Potential concerns or follow-up needed

Self-Critique Suffix (REQUIRED - MUST BE LAST)

## Self-Critique Verification (MANDATORY)

Before completing, verify your work. Do not submit unverified changes.

### Verification Questions

| # | Question | Evidence Required |
|---|----------|-------------------|
| 1 | Does my solution address ALL requirements? | [Specific evidence] |
| 2 | Did I follow existing code patterns? | [Pattern examples] |
| 3 | Are there any edge cases I missed? | [Edge case analysis] |
| 4 | Is my solution the simplest approach? | [Alternatives considered] |
| 5 | Would this pass code review? | [Quality check] |

### Answer Each Question with Evidence

Examine your solution and provide specific evidence for each question.

### Revise If Needed

If ANY verification question reveals a gap:
1. **FIX** - Address the specific gap identified
2. **RE-VERIFY** - Confirm the fix resolves the issue
3. **UPDATE** - Update the Summary section

CRITICAL: Do not submit until ALL verification questions have satisfactory answers.

Dispatch

Determine the optimal agent type based on the task and the available agents, for example: code implementation -> sdd:developer agent. If you are not sure, it is better to use the general-purpose agent than to dispatch an incorrect agent type.

Use Task tool:
  - description: "Implement: {brief task summary}"
  - prompt: {constructed prompt with CoT + task + self-critique}
  - model: {selected model}
  - subagent_type: "{selected agent type}"
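
The ordering requirement for the constructed prompt (Chain-of-Thought prefix first, self-critique suffix last) can be enforced by assembling it in one place. The sketch below is a hypothetical helper: the prefix and suffix constants are abbreviated to their opening lines, and the "## Judge Feedback" section used on retries is an assumed name, since the retry prompt format is not shown in this document.

```python
from typing import Optional

# Abbreviated stand-ins for the full templates in section 2.2.
COT_PREFIX = (
    "## Reasoning Approach\n\n"
    "Before taking any action, think through this task systematically."
)
SELF_CRITIQUE_SUFFIX = (
    "## Self-Critique Verification (MANDATORY)\n\n"
    "Before completing, verify your work. Do not submit unverified changes."
)


def build_implementation_prompt(task: str, feedback: Optional[list[str]] = None) -> str:
    """Assemble prefix + task body (+ optional judge feedback) + suffix."""
    parts = [COT_PREFIX, f"## Task\n{task}"]
    if feedback:
        # Hypothetical section name; used only when retrying after a failed verdict.
        bullets = "\n".join(f"- {issue}" for issue in feedback)
        parts.append(f"## Judge Feedback\n{bullets}")
    parts.append(SELF_CRITIQUE_SUFFIX)
    return "\n\n".join(parts)
```

Because the suffix is appended after every optional section, later additions to the task body cannot push it out of last position.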

2.3 Parallel Dispatch Example

Send BOTH Task tool calls in a single message. Meta-judge first, implementation second:

Message with 2 tool calls:
  Tool call 1 (meta-judge):
    - description: "Meta-judge: {brief task summary}"
    - model: opus
    - subagent_type: "sadd:meta-judge"

  Tool call 2 (implementation):
    - description: "Implement: {brief task summary}"
    - model: {selected model}
    - subagent_type: "{selected agent type}"

Wait for BOTH to return before proceeding to Phase 3.
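
As an analogy only, the "start both, meta-judge first, wait for both" shape looks like this in Python's asyncio. The real mechanism is two Task tool calls in a single message; `dispatch_parallel` and the two coroutine arguments are hypothetical stand-ins, not part of the skill.

```python
import asyncio


async def dispatch_parallel(run_meta_judge, run_implementation):
    """Start both agents, meta-judge first, and wait until both have finished."""
    # Tasks begin running in creation order, so the meta-judge starts first.
    meta_task = asyncio.create_task(run_meta_judge())
    impl_task = asyncio.create_task(run_implementation())
    spec, summary = await asyncio.gather(meta_task, impl_task)
    return spec, summary


async def _demo():
    async def meta_judge():
        await asyncio.sleep(0.01)  # simulated work
        return "evaluation-spec-yaml"

    async def implementation():
        await asyncio.sleep(0)     # simulated work
        return "implementation summary"

    return await dispatch_parallel(meta_judge, implementation)


print(asyncio.run(_demo()))
```

`asyncio.gather` returns results in argument order regardless of which coroutine finishes first, which mirrors waiting for both agents before moving to Phase 3.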

Phase 3: Dispatch Judge Agent

After BOTH meta-judge and implementation complete, dispatch the judge agent.

CRITICAL: Provide the judge with the EXACT evaluation specification YAML from the meta-judge. Do not skip, add, or modify anything, and do not shorten or summarize any text in it!

Extract from meta-judge output:

  • The final evaluation specification YAML

Extract from implementation output:

  • Summary section (files modified, key changes)
  • Paths to files modified

3.1 Analyze the Pre-existing Changes Section

Before dispatching the judge, assess whether there are pre-existing changes in the codebase that the judge needs to be aware of. The "Pre-existing Changes" section prevents the judge from confusing prior modifications with the current implementation agent's work.

When to include:

  • Previous do-and-judge task runs completed earlier in the same session
  • User's manual modifications made before invoking the skill (visible from conversation context or in git)
  • Changes from other tools or agents that ran before this task

When to omit:

  • This is the first task with no known prior changes — omit the section entirely
  • On retries within the SAME task, do NOT include the implementation agent's own previous attempt as "pre-existing changes" — those are part of the current task's iteration cycle

Content guidelines:

  • Use a high-level summary: task description, list of affected files/modules, general nature of changes (created, modified, deleted)
  • Do NOT include code blocks, diffs, or line-level details — keep it concise
  • Label the source clearly: "Previous Task: {description}", "User modifications (before current task)", etc.
  • If multiple sources of pre-existing changes exist, use separate subsections for each

CRITICAL: Avoid reading the full codebase or git history. Use only a high-level git diff/status to determine which files were changed, or use the conversation context to determine whether there are any pre-existing changes.
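
The include/omit rules and content guidelines above can be sketched as a small renderer. `PriorChange` and `render_preexisting_changes` are hypothetical names; the section heading matches the judge prompt template, while the explanatory paragraph that the template places under the heading is left out here for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class PriorChange:
    """One source of pre-existing changes, kept at summary level (no diffs)."""
    source: str   # e.g. "Previous Task: add login form"
    summary: str  # general nature of the changes
    files: list[str] = field(default_factory=list)


def render_preexisting_changes(changes: list[PriorChange]) -> str:
    """Return the section text, or "" so the section is omitted entirely."""
    if not changes:
        return ""  # first task, or a retry of the same task
    lines = ["## Pre-existing Changes (Context Only)"]
    for change in changes:
        # One subsection per source, labeled clearly.
        lines += ["", f"### {change.source}", change.summary]
        lines += [f"- {path}" for path in change.files]
    return "\n".join(lines)
```

On a retry within the same task, the orchestrator would pass the same list as on the first attempt, so the implementation agent's own earlier attempt never appears in this section.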

3.2 Launch Judge with prompt and specification YAML

Judge prompt template:

You are evaluating an implementation artifact against an evaluation specification produced by the meta judge.

CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`

## User Prompt
{Original task description from user}

{IF pre-existing changes are known, include the following section — otherwise omit entirely}

## Pre-existing Changes (Context Only)

The following changes were made BEFORE the current implementation agent started working. They are NOT part of the current task's output. Focus your evaluation on the current task's changes. Only verify pre-existing changed files/logic if they directly relate to the current task requirements.

### {Source of changes: e.g., "Previous Task: {task description}" or "User modifications (before current task)"}
{High-level summary: what was done, which
how to use sadd:do-and-judge

How to use sadd:do-and-judge on Cursor

AI-first code editor with Composer

1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add sadd:do-and-judge
2. Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/neolabhq/context-engineering-kit --skill sadd:do-and-judge

The skills CLI fetches sadd:do-and-judge from GitHub repository neolabhq/context-engineering-kit and configures it for Cursor.

3. Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Cursor
│ • Windsurf
4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/sadd:do-and-judge

Reload or restart Cursor to activate sadd:do-and-judge. Access the skill through slash commands (e.g., /sadd:do-and-judge) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.


Use Cases

User Story & Requirements Generation

Create detailed user stories, acceptance criteria, and feature specs

Example

Generate user stories for 'password reset feature' with acceptance criteria, edge cases, and test scenarios

Reduce spec writing time by 50%, ensure comprehensive coverage

Competitive Analysis

Research competitors, compare features, identify gaps

Example

Analyze 5 competitor products, create feature comparison matrix, suggest differentiation opportunities

Complete competitive research in 2 hours instead of 2 days

Roadmap Prioritization

Evaluate features using frameworks (RICE, ICE, Kano) and create prioritized backlogs

Example

Score 20 feature ideas using RICE framework, generate prioritized roadmap with rationale

Make data-driven prioritization decisions faster

Stakeholder Communication

Draft PRDs, status updates, and stakeholder presentations

Example

Create executive summary of Q3 roadmap, monthly progress report, feature launch announcement

Save 3-5 hours/week on communication overhead

Implementation Guide

Prerequisites

  • Claude Desktop or compatible AI client
  • Access to product documentation and roadmap tools (Jira, Notion, etc.)
  • Understanding of product management frameworks (RICE, Jobs-to-be-Done, etc.)
  • Stakeholder contact information and communication channels

Time Estimate

30-60 minutes to see productivity improvements

Installation Steps

  1. Install product management skill
  2. Start with user story generation for known feature
  3. Progress to competitive analysis: research 2-3 competitors
  4. Use for roadmap prioritization: apply RICE/ICE scoring
  5. Draft stakeholder communications and refine based on feedback
  6. Build template library for recurring PM tasks
  7. Share effective prompts with product team

Common Pitfalls

  • Not validating competitive research—verify facts before sharing
  • Accepting user stories without involving engineering team
  • Over-relying on frameworks without qualitative judgment
  • Not customizing outputs to company culture and communication style
  • Skipping stakeholder validation of generated requirements

Best Practices

✓ Do

  • Validate research and competitive analysis with real data
  • Collaborate with engineering when generating technical requirements
  • Customize frameworks and templates to your company context
  • Use skill for first drafts, refine with stakeholder input
  • Document successful prompt patterns for PM tasks
  • Combine AI efficiency with human judgment and intuition

✗ Don't

  • Don't publish competitive analysis without fact-checking
  • Don't finalize user stories without engineering review
  • Don't make prioritization decisions solely on AI scoring
  • Don't skip customer validation of generated requirements
  • Don't ignore company-specific context and culture

💡 Pro Tips

  • Provide context: company goals, constraints, customer feedback
  • Ask for alternatives: 'Show 3 ways to prioritize this roadmap'
  • Request stakeholder-specific formatting: 'Executive summary vs. engineering spec'
  • Use skill for 70% generation + 30% customization to company needs

When to Use This

✓ Use When

Use for user story writing, competitive research, roadmap prioritization, stakeholder communication, and PRD drafting. Best for reducing repetitive documentation and research work.

✗ Avoid When

Avoid for strategic product vision (requires deep customer empathy), pricing decisions (needs market and financial expertise), or when face-to-face customer discovery is more valuable than speed.

Learning Path

  1. Basic: user stories, feature specs, status updates
  2. Intermediate: competitive analysis, prioritization frameworks, PRDs
  3. Advanced: product strategy, go-to-market planning, OKR setting
  4. Expert: product vision, market positioning, business model innovation

general reviews

Ratings

4.8 · 51 reviews
  • Dhruvi Jain · Dec 20, 2024

    Keeps context tight: sadd:do-and-judge is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Nikhil Liu · Dec 20, 2024

    Registry listing for sadd:do-and-judge matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Anika Desai · Dec 8, 2024

    We added sadd:do-and-judge from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Kofi Garcia · Dec 4, 2024

    sadd:do-and-judge reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Mia Jain · Nov 27, 2024

    sadd:do-and-judge reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Valentina Kapoor · Nov 23, 2024

    We added sadd:do-and-judge from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Oshnikdeep · Nov 11, 2024

    sadd:do-and-judge has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Anika Dixit · Nov 11, 2024

    sadd:do-and-judge fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Nikhil Johnson · Nov 7, 2024

    I recommend sadd:do-and-judge for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Carlos Mensah · Oct 26, 2024

    Useful defaults in sadd:do-and-judge — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
