sadd:judge-with-debate
neolabhq/context-engineering-kit · updated Apr 8, 2026
judge-with-debate
Key benefits:
- Structured evaluation - Meta-judge produces tailored rubrics and criteria before judging begins
- Multiple perspectives - Three independent judges reduce individual bias
- Evidence-based debate - Judges defend positions with specific evidence from the solution and evaluation specification
- Iterative refinement - Up to 3 debate rounds drive convergence on accurate scores
- Shared specification - Meta-judge runs once; all judges across all rounds share the same evaluation specification
Pattern: Debate-Based Evaluation
This command implements iterative multi-judge debate:
Phase 0: Setup
  mkdir -p .specs/reports
        |
Phase 0.5: Dispatch Meta-Judge
  Meta-Judge (Opus)
        |
  Evaluation Specification YAML
        |
Phase 1: Independent Analysis (3 judges in parallel)
         +- Judge 1 -> {name}.1.md -+
Solution +- Judge 2 -> {name}.2.md -+-+
         +- Judge 3 -> {name}.3.md -+ |
                                      |
Phase 2: Debate Round (iterative)     |
  Each judge reads others' reports    |
        |                             |
  Argue + Defend + Challenge          |
  (grounded in eval specification)    |
        |                             |
  Revise if convinced ----------------+
        |                             |
  Check consensus                     |
  +- Yes -> Final Report              |
  +- No -> Next Round ----------------+
Process
Setup: Create Reports Directory
Before starting evaluation, ensure the reports directory exists:
mkdir -p .specs/reports
Report naming convention: .specs/reports/{solution-name}-{YYYY-MM-DD}.[1|2|3].md
Where:
- {solution-name} - Derived from solution filename (e.g., users-api from src/api/users.ts)
- {YYYY-MM-DD} - Current date
- [1|2|3] - Judge number
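The naming convention can be sketched in Python as follows. This is a minimal illustration, not part of the command: `report_paths` and `REPORTS_DIR` are hypothetical names, and the helper takes the solution name as already derived, since the exact derivation rule from the filename is not spelled out here.

```python
from datetime import date
from pathlib import Path

# Reports directory from the setup step.
REPORTS_DIR = Path(".specs/reports")


def report_paths(solution_name: str, day: date, judges: int = 3) -> list[Path]:
    """Build one report path per judge: {solution-name}-{YYYY-MM-DD}.{N}.md"""
    stamp = day.isoformat()  # ISO format is YYYY-MM-DD
    return [
        REPORTS_DIR / f"{solution_name}-{stamp}.{n}.md"
        for n in range(1, judges + 1)
    ]


# Hypothetical usage: the solution name and date are example values.
paths = report_paths("users-api", date(2026, 4, 8))
```

Each judge then writes to, and later appends to, its own path from this list.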
Phase 0.5: Dispatch Meta-Judge
Before independent analysis, dispatch a meta-judge agent to generate a tailored evaluation specification. The meta-judge runs ONCE and produces rubrics, checklists, and scoring criteria that ALL judges will use across ALL rounds.
Meta-judge prompt template:
## Task
Generate an evaluation specification YAML for the following evaluation task. You will produce rubrics, checklists, and scoring criteria that multiple judge agents will use to evaluate the solution through independent analysis and multi-round debate.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{task description - what the solution was supposed to accomplish}
## Context
{Any relevant context about the solution being evaluated}
## Artifact Type
{code | documentation | configuration | etc.}
## Evaluation Mode
Multi-judge debate with consensus-seeking across rounds
## Instructions
Return only the final evaluation specification YAML in your response.
The specification should support both independent analysis and debate-based refinement.
Dispatch:
Use Task tool:
- description: "Meta-judge: generate evaluation specification for {solution-name}"
- prompt: {meta-judge prompt}
- model: opus
- subagent_type: "sadd:meta-judge"
Wait for the meta-judge to complete and extract the evaluation specification YAML from its output before proceeding to Phase 1.
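Extracting the YAML from the meta-judge's reply could look like the sketch below. It assumes the reply is either bare YAML (as the prompt requests) or YAML wrapped in a single fenced block; `extract_spec_yaml` is a hypothetical helper, not something the plugin provides.

```python
import re

# Built from single backticks so this example contains no literal fence line.
FENCE = "`" * 3


def extract_spec_yaml(output: str) -> str:
    """Return the body of the first yaml-fenced block, or the whole output if unfenced."""
    pattern = re.compile(
        FENCE + r"(?:yaml|yml)?[ \t]*\n(.*?)\n" + FENCE,
        re.DOTALL,
    )
    match = pattern.search(output)
    return (match.group(1) if match else output).strip()


# Hypothetical meta-judge reply with the specification wrapped in a fence.
raw = "Here is the spec:\n" + FENCE + "yaml\ncriteria:\n  - name: correctness\n" + FENCE + "\n"
spec = extract_spec_yaml(raw)
```

The extracted text is passed to the judges unchanged; the sketch only strips the wrapper, it does not parse or modify the YAML.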
Phase 1: Independent Analysis
Launch 3 independent judge agents in parallel (Opus for rigor):
- Each judge receives:
  - Path to solution(s) being evaluated
  - The meta-judge's evaluation specification YAML
  - Task description
- Each produces independent assessment saved to .specs/reports/{solution-name}-{date}.[1|2|3].md
- Reports must include:
  - Per-criterion scores with evidence
  - Specific quotes/examples supporting ratings
  - Overall weighted score
  - Key strengths and weaknesses
Key principle: Independence in initial analysis prevents groupthink.
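The "overall weighted score" each report must include can be illustrated with a small Python sketch. The criterion names, the weights, and the 1-5 scale below are assumptions for illustration only; the real criteria and weights come from the meta-judge's evaluation specification.

```python
def weighted_overall(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-criterion scores; weights need not sum to 1."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical criteria and weights, not taken from any real specification.
scores = {"correctness": 4.0, "readability": 3.0, "tests": 5.0}
weights = {"correctness": 0.5, "readability": 0.25, "tests": 0.25}
overall = weighted_overall(scores, weights)
```

Dividing by the total weight keeps the result on the same scale as the individual scores even if the specification's weights do not sum to 1.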
Prompt template for initial judges:
You are Judge {N} evaluating a solution independently against an evaluation specification produced by the meta judge.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## Solution
{path to solution file(s)}
## Task Description
{what the solution was supposed to accomplish}
## Evaluation Specification
```yaml
{meta-judge's evaluation specification YAML}
```
## Output File
.specs/reports/{solution-name}-{date}.{N}.md
## Instructions
Follow your full judge process as defined in your agent instructions!
Additional instructions:
- Read the solution thoroughly
- For each criterion from the evaluation specification:
  - Find specific evidence (quote exact text)
  - Score on the defined scale
  - Justify with concrete examples
- Calculate weighted overall score
- Write comprehensive report to {output_file}
Add to report beginning `Done by Judge {N}`
**Dispatch each judge:**
Use Task tool:
- description: "Judge {N}: independent analysis of {solution-name}"
- prompt: {judge prompt with evaluation specification YAML}
- model: opus
- subagent_type: "sadd:judge"
### Phase 2: Debate Rounds (Iterative)
For each debate round (max 3 rounds):
Launch **3 debate agents in parallel**:
1. Each judge agent receives:
   - Path to their own previous report (`.specs/reports/{solution-name}-{date}.[1|2|3].md`)
   - Paths to other judges' reports (`.specs/reports/{solution-name}-{date}.[1|2|3].md`)
   - The original solution
   - The meta-judge's evaluation specification YAML
2. Each judge:
   - Identifies disagreements with other judges (>1 point score gap on any criterion)
   - Defends their own ratings with evidence from the solution and evaluation specification
   - Challenges other judges' ratings they disagree with
   - Considers counter-arguments
   - Revises their assessment if convinced
3. Each judge updates their report file with a new section: `## Debate Round {R}`
4. After the judges reply, if they reached agreement, move to Phase 3: Consensus Report
**Key principle:** Judges communicate only through the filesystem. The orchestrator does not mediate, and it should not read the report files itself, because doing so can overflow its context.
**Prompt template for debate judges:**
````markdown
You are Judge {N} in debate round {R}.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## Your Previous Report
{path to .specs/reports/{solution-name}-{date}.{N}.md}
## Other Judges' Reports
Judge 1: .specs/reports/{solution-name}-{date}.1.md
...
## Task Description
{what the solution was supposed to accomplish}
## Solution
{path to solution}
## Evaluation Specification
```yaml
{meta-judge's evaluation specification YAML}
```
## Output File
.specs/reports/{solution-name}-{date}.{N}.md (append to existing file)
## Instructions
Follow your full judge process as defined in your agent instructions!
Additional debate instructions:
- Read your previous assessment from {your_previous_report}
- Read all other judges' reports
- Identify disagreements (where your scores differ by >1 point)
- For each major disagreement:
  - State the disagreement clearly
  - Defend your position with evidence from the solution and evaluation specification
  - Challenge the other judge's position with counter-evidence
  - Consider whether their evidence changes your view
- Update your report file by APPENDING debate round section
- Reply whether you reached agreement, and with which judge. Include revisited scores and criteria scores.
CRITICAL:
- Ground your arguments in the evaluation specification criteria
- Only revise if you find their evidence compelling
- Defend your original scores if you still believe them
- Quote specific evidence from the solution
````
**Dispatch each debate judge:**
Use Task tool:
- description: "Judge {N}: debate round {R} for {solution-name}"
- prompt: {debate judge prompt with evaluation specification YAML}
- model: opus
- subagent_type: "sadd:judge"
### Consensus Check
After each debate round, check for consensus:
**Consensus achieved if:**
- All judges' overall scores within 0.5 points of each other
- No criterion has >1 point disagreement across any two judges
- All judges explicitly state they accept the consensus
**If no consensus after 3 rounds:**
- Report persistent disagreements
- Provide all judge reports for human review
- Flag that automated evaluation couldn't reach consensus
**Orchestration Instructions:**
**Step 1: Dispatch Meta-Judge (Phase 0.5)**
1. Launch meta-judge agent
2. Wait for meta-judge to complete
3. Extract the evaluation specification YAML from meta-judge output
**Step 2: Run Independent Analysis (Phase 1)**
1. Launch 3 judge agents in parallel (Judge 1, 2, 3) with the evaluation specification YAML
2. Each writes their independent assessment to `.specs/reports/{solution-name}-{date}.[1|2|3].md`
3. Wait for all 3 agents to complete
**Step 3: Check for Consensus**
Let's work through this systematically to ensure accurate consensus detection.
Read all three reports and extract:
- Each judge's overall weighted score
- Each judge's score for every criterion
Check consensus step by step:
1. First, extract all overall scores from each report and list them explicitly
2. Calculate the difference between the highest and lowest overall scores
- If difference <= 0.5 points -> overall consensus achieved
- If difference > 0.5 points -> no consensus yet
3. Next, for each criterion, list all three judges' scores side by side
4. For each criterion, calculate the difference between highest and lowest scores
- If any criterion has difference > 1.0 point -> no consensus on that criterion
5. Finally, verify consensus is achieved only if BOTH conditions are met:
- Overall scores within 0.5 points
- All criterion scores within 1.0 point
**Step 4: Decision Point**
- **If consensus achieved**: Go to Step 6 (Reply with Report)
- **If no consensus AND round < 3**: Go to Step 5 (Run Debate Round)
- **If no consensus AND round = 3**: Go to Step 7 (Report No Consensus)
**Step 5: Run Debate Round**
1. Increment round counter (round = round + 1)
2. Launch 3 judge agents in parallel with the same evaluation specification YAML
3. Each agent reads:
- Their own previous report from filesystem
- Other judges' reports from filesystem
- Original solution
4. Each agent appends "Debate Round {R}" section to their own report file
5. Wait for all 3 agents to complete
6. Go back to Step 3 (Check for Consensus)
**Step 6: Reply with Report**
Let's synthesize the evaluation results step by step.
1. Read all final reports carefully
2. Before generating the report, analyze the following:
- What is the consensus status (achieved or not)?
- What were the key points of agreement across all judges?
- What were the main areas of disagreement, if any?
- How did the debate rounds change the evaluations?
3. Reply to user with a report that contains:
- If there is consensus:
- Consensus scores (average of all judges)
- Consensus strengths/weaknesses
- Number of rounds to reach consensus
- Final recommendation with clear justification
- If there is no consensus:
- All judges' final scores showing disagreements
- Specific criteria where consensus wasn't reached
- Analysis of why consensus couldn't be reached
- Flag for human review
4. Command complete
**Step 7: Report No Consensus**
- Report persistent disagreements
- Provide all judge reports for human review
- Flag that automated evaluation couldn't reach consensus
### Phase 3: Consensus Report
If consensus achieved, synthesize the final report by working through each section methodically:
```markdown
# Consensus Evaluation Report
Let's compile the final consensus by analyzing each component systematically.
## Consensus Scores
First, let's consolidate all judges' final scores:
| Criterion | Judge 1 | Judge 2 | Judge 3 | Final |
|-----------|---------|---------|---------|-------|
| {Name} | {X}/5 | {X}/5 | {X}/5 | {X}/5 |
...
**Consensus Overall Score**: {avg}/5.0
## Consensus Strengths
[Review each judge's identified strengths and extract the common themes that all judges agreed upon]
## Consensus Weaknesses
[Review each judge's identified weaknesses and extract the common themes that all judges agreed upon]
## Debate Summary
Let's trace how consensus was reached:
- Rounds to consensus: {N}
- Initial disagreements: {list with specific criteria and score gaps}
- How resolved: {for each disagreement, explain what evidence or argument led to resolution}
## Final Recommendation
Based on the consensus scores and the key strengths/weaknesses identified:
{Pass/Fail/Needs Revision with clear justification tied to the evidence}
```
- Reports directory: .specs/reports/ (created if not exists)
- Initial reports: .specs/reports/{solution-name}-{date}.1.md, .specs/reports/{solution-name}-{date}.2.md, .specs/reports/{solution-name}-{date}.3.md
- Debate updates: Appended sections in each report file per round
- Final synthesis: Replied to user (consensus or disagreement summary)
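The consensus check, the decision point, and the averaged consensus scores described in the orchestration steps can be sketched in Python. This is a hedged illustration under assumptions: `JudgeResult`, `has_consensus`, `next_step`, and `consensus_scores` are hypothetical names, the scores are assumed to have been parsed out of the report files already, and the third condition (every judge explicitly accepting the consensus) is not modeled.

```python
from dataclasses import dataclass

MAX_ROUNDS = 3
OVERALL_TOLERANCE = 0.5    # overall scores must be within 0.5 points
CRITERION_TOLERANCE = 1.0  # each criterion must be within 1.0 point


@dataclass
class JudgeResult:
    overall: float
    criteria: dict[str, float]


def spread(values: list[float]) -> float:
    """Difference between the highest and lowest score."""
    return max(values) - min(values)


def has_consensus(results: list[JudgeResult]) -> bool:
    """Both numeric conditions must hold: overall spread and every criterion spread."""
    if spread([r.overall for r in results]) > OVERALL_TOLERANCE:
        return False
    return all(
        spread([r.criteria[name] for r in results]) <= CRITERION_TOLERANCE
        for name in results[0].criteria
    )


def next_step(results: list[JudgeResult], round_number: int) -> str:
    """Decision point: report, run another debate round, or give up after the last round."""
    if has_consensus(results):
        return "report_consensus"
    if round_number < MAX_ROUNDS:
        return "debate_round"
    return "report_no_consensus"


def consensus_scores(results: list[JudgeResult]) -> dict[str, float]:
    """Per-criterion consensus score as the average across all judges."""
    return {
        name: sum(r.criteria[name] for r in results) / len(results)
        for name in results[0].criteria
    }


# Hypothetical scores: overall spread is 0.75, so another round is needed.
split = [
    JudgeResult(4.0, {"correctness": 4, "clarity": 3}),
    JudgeResult(4.25, {"correctness": 4, "clarity": 4}),
    JudgeResult(3.5, {"correctness": 5, "clarity": 2}),
]
step = next_step(split, round_number=0)
```

The round counter here starts at 0 before the first debate round, which matches incrementing it at the start of each round and stopping once it reaches 3.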
Best Practices
Meta-Judge + Judge Verification
- Never skip meta-judge - Tailored evaluation criteria produce better judgments and more grounded debates
- Meta-judge runs once - Same specification for all 3 judges across all debate rounds
- Include CLAUDE_PLUGIN_ROOT - Both meta-judge and judges need the resolved plugin root path
- Meta-judge YAML - Pass only the YAML to judges, do not modify it
- Debate grounding - Judges should reference evaluation specification criteria when de
How to use sadd:judge-with-debate on Cursor
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with node --version)
- Active project directory or workspace where you want to add sadd:judge-with-debate
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches sadd:judge-with-debate from GitHub repository neolabhq/context-engineering-kit and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate sadd:judge-with-debate. Access the skill through slash commands (e.g., /sadd:judge-with-debate) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.