evolve

boshu2/agentops · updated Apr 8, 2026

$npx skills add https://github.com/boshu2/agentops --skill evolve
0 commentsdiscussion
summary

Measure what's wrong. Fix the worst thing. Measure again. Compound.

skill.md

/evolve — Goal-Driven Compounding Loop

Measure what's wrong. Fix the worst thing. Measure again. Compound.

Always-on autonomous loop over /rpi. Work selection order: 0. Pinned work queue (--queue=<file> or inline roadmap — see references/pinned-queue.md)

  1. Harvested .agents/rpi/next-work.jsonl work (freshest concrete follow-up)
  2. Open ready beads work (bd ready)
  3. Failing goals and directive gaps (ao goals measure)
  4. Testing improvements (missing/thin coverage, missing regression tests)
  5. Validation tightening and bug-hunt passes (gates, audits, bug sweeps)
  6. Complexity / TODO / FIXME / drift / dead code / stale docs / stale research mining
  7. Concrete feature suggestions derived from repo purpose when no sharper work exists

Work generators that feed the selection ladder (auto-invoked, skip with --no-lifecycle):

  • Skill(skill="test", args="coverage") → files with <40% coverage become queue items (Step 3.4)
  • Skill(skill="refactor", args="--sweep all --dry-run") → functions with CC > 20 become queue items (Step 3.6)
  • Skill(skill="deps", args="audit") → deps with CVSS >= 7.0 or 2+ major versions behind become queue items (Step 3.5)
  • Skill(skill="perf", args="profile --quick") → perf findings become queue items when hot paths detected (Step 3.5)

Dormancy is last resort. Empty current queues mean "run the generator layers", not "stop". Only go dormant after the queue layers and generator layers come up empty across multiple consecutive passes.

/evolve                      # Run until kill switch, max-cycles, or real dormancy
/evolve --max-cycles=5       # Cap at 5 cycles
/evolve --dry-run            # Show what would be worked on, don't execute
/evolve --beads-only         # Skip goals measurement, work beads backlog only
/evolve --quality            # Quality-first mode: prioritize post-mortem findings
/evolve --quality --max-cycles=10  # Quality mode with cycle cap
/evolve --compile            # Mine → Defrag warmup before first cycle
/evolve --compile --max-cycles=5 # Warm knowledge base then run 5 cycles
/evolve --test-first         # Default strict-quality /rpi execution path
/evolve --no-test-first      # Explicit opt-out from test-first mode
/evolve --queue=.agents/evolve/roadmap.md           # Process ordered roadmap
/evolve --queue=.agents/evolve/roadmap.md --test-first  # Roadmap with strict quality

Flags

Flag Default Description
--max-cycles=N unlimited Stop after N completed cycles
--dry-run off Show planned cycle actions without executing
--beads-only off Skip goal measurement and run backlog-only selection
--skip-baseline off Skip first-run baseline snapshot
--quality off Prioritize harvested post-mortem findings
--compile off Run ao mine + ao defrag warmup before cycle 1
--test-first on Pass strict-quality defaults through to /rpi
--no-test-first off Explicitly disable test-first passthrough to /rpi
--queue=<file> none Process items from ordered markdown queue file sequentially before fitness-driven selection
--no-lifecycle off Skip lifecycle work generators in Steps 3.4-3.6 (/test, /deps, /perf, /refactor). Falls back to manual scanning.

Execution Steps

YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.

FULLY AUTONOMOUS. Read references/autonomous-execution.md. Every /rpi uses --auto. Do NOT ask the user anything. Each cycle = complete 3-phase /rpi run.

Step 0: Setup

mkdir -p .agents/evolve
ao lookup --query "autonomous improvement cycle" --limit 5 2>/dev/null || true

Apply retrieved knowledge: If learnings are returned, check each for applicability to the current improvement cycle. For applicable learnings, cite by filename and record: ao metrics cite "<path>" --type applied 2>/dev/null || true

Before cycle recovery, load the repo execution profile contract when it exists. The repo execution profile is the source for repo policy; the user prompt should mostly supply mission/objective, not restate startup reads, validation bundle, tracker wrapper rules, or definition_of_done.

  • Locate docs/contracts/repo-execution-profile.md and docs/contracts/repo-execution-profile.schema.json.
  • Read the ordered startup_reads and bootstrap from those repo paths before selecting work.
  • Cache repo validation_commands, tracker_commands, and definition_of_done into session state.
  • If the repo execution profile is present but missing required fields, stop or downgrade with an explicit warning before cycle 1. Do not silently invent repo policy.

Then load the repo-local autodev program contract when it exists. The execution profile remains the repo bootstrap and landing-policy layer; PROGRAM.md or AUTODEV.md is the repo-local execution layer for the current improvement loop.

  • Locate PROGRAM.md and AUTODEV.md. PROGRAM.md takes precedence.
  • Read the resolved program before cycle recovery and cache program_path, mutable_scope, immutable_scope, validation_commands, decision_policy, and stop_conditions into session state.
  • If the program file exists but is structurally invalid, stop or downgrade with an explicit warning before cycle 1. Do not silently ignore a broken operator contract.
  • When a program contract exists, prefer work that can land wholly inside mutable scope. Do not silently widen scope around immutable files.

Recover cycle number, queue/generator streaks, and the last claimed work item from disk (survives context compaction). Initialize CYCLE from cycle-history.jsonl, recover IDLE_STREAK, GENERATOR_EMPTY_STREAK, LAST_SELECTED_SOURCE, and CLAIMED_WORK_REF from session-state.json.

Circuit breakers: Time-based (60 min no productive work) and consecutive failure (5 in queue mode). See references/roadmap-queue-patterns.md for queue-specific circuit breakers.

Oscillation quarantine: Pre-populate quarantine list from cycle history (scan for goals with 3+ improved-to-fail transitions). See references/oscillation.md.

Parse flags: --max-cycles=N (default unlimited), --dry-run, --beads-only, --skip-baseline, --quality, --compile, --queue=<file>.

Step 0.1: Parse Pinned Queue (--queue only)

Skip if --queue was not passed. Read references/roadmap-queue-patterns.md for the full queue parsing, state persistence, and resume protocol. See also references/pinned-queue.md for format specification and blocker syntax.

Track cycle-level execution state:

evolve_state = {
  cycle: <current cycle number>,
  mode: <standard|quality|beads-only>,
  test_first: <true by default; false only when --no-test-first>,
  repo_profile_path: <docs/contracts/repo-execution-profile.md or null>,
  startup_reads: <ordered repo bootstrap paths>,
  validation_commands: <ordered repo validation bundle>,
  tracker_commands: <repo tracker shell wrappers>,
  definition_of_done: <repo stop predicates>,
  program_path: <PROGRAM.md|AUTODEV.md or null>,
  program_mutable_scope: <declared mutable paths/globs>,
  program_immutable_scope: <declared immutable paths/globs>,
  program_validation_commands: <ordered program validation bundle>,
  program_decision_policy: <ordered keep/revert rules>,
  program_stop_conditions: <ordered cycle done criteria>,
  generator_empty_streak: <consecutive passes where all generator layers returned nothing>,
  last_selected_source: <harvested|beads|goal|directive|testing|validation|bug-hunt|drift|feature>,
  claimed_work: <null or queue reference being worked>,
  queue_refresh_count: <incremented after every /rpi cycle>,
  pinned_queue: <parsed items array or null>,
  pinned_queue_file: <path or null>,
  pinned_queue_index: <current 0-based position>,
  pinned_queue_completed: <array of completed item IDs>,
  pinned_queue_escalated: <array of escalated items with reasons>,
  unblock_depth: <current nesting depth, 0 when not unblocking>,
  unblock_failures: <consecutive failures on current item>,
  unblock_chain: <stack of blocker IDs being resolved>
}

Persist evolve_state to .agents/evolve/session-state.json at each cycle boundary, after queue claims, after queue release/finalize, and during teardown. cycle-history.jsonl remains the canonical cycle ledger; session-state.json carries resume-only state that has not yet earned a committed cycle entry.

Step 0.2: Compile Warmup (--compile only)

Skip if --compile was not passed or if --dry-run. Read references/knowledge-loop-integration.md for the full warmup procedure (mine + defrag + signal notes).

Step 0.5: Baseline (first run only)

Skip if --skip-baseline or --beads-only or baseline already exists. Read references/fitness-scoring.md for the baseline capture procedure.

Step 1: Kill Switch Check

Run at the TOP of every cycle:

CYCLE_START_SHA=$(git rev-parse HEAD)
[ -f ~/.config/evolve/KILL ] && echo "KILL: $(cat ~/.config/evolve/KILL)" && exit 0
[ -f .agents/evolve/STOP ] && echo "STOP: $(cat .agents/evolve/STOP 2>/dev/null)" && exit 0

Step 2: Measure Fitness

Skip if --beads-only. Run scripts/evolve-measure-fitness.sh to produce a rolling fitness snapshot at .agents/evolve/fitness-latest.json. Read references/fitness-scoring.md for the full measurement procedure, baseline capture, and post-cycle regression detection.

Step 3: Select Work

Selection is a ladder, not a one-shot check. After every productive cycle, return to the TOP of this step and re-read the queue before considering dormancy.

When a repo-local program contract exists, apply a scope filter before Step 4:

  • candidate work that clearly requires immutable-scope edits is not eligible for direct execution
  • prefer harvested, beads, goals, and generated work that can plausibly land within mutable scope
  • if the selected item is inherently out of scope, escalate it or convert it into durable follow-up work instead of invoking /rpi and hoping discovery widens scope

Step 3.0: Pinned work queue (only when --queue is set)

Read references/roadmap-queue-patterns.md for the full pinned queue work selection protocol (item-to-prompt mapping, escalation cascade, blocker detection). When pinned queue is active, skip Steps 3.1-3.7 entirely. When exhausted, fall through to normal selection.

Step 3.1: Harvested work first

Read .agents/rpi/next-work.jsonl and pick the highest-value unconsumed item. Prefer exact repo match, then concrete implementation work, then higher severity. Read references/knowledge-loop-integration.md for the claim/release protocol.

Step 3.2: Open ready beads

If no harvested item is ready, check bd ready. Pick the highest-priority unblocked issue.

Step 3.3: Failing goals and directive gaps (skip if --beads-only)

First assess directives, then goals:

  • top-priority directive gap from ao goals measure --directives
  • highest-weight failing goals (skip quarantined oscillators)
  • lower-weight failing goals

This step exists even when all queued work is empty. Goals are the third source, not the stop condition.

DIRECTIVES=$(ao goals measure --directives 2>/dev/null)
FAILING=$(jq -r '.goals[] | select(.result=="fail") | .id' .agents/evolve/fitness-latest.json | head -1)

Oscillation check: Before working a failing goal, check if it has oscillated (improved-to-fail transitions >= 3 times). If so, quarantine it and try the next goal. See references/oscillation.md and references/fitness-scoring.md for the detection procedure.

Step 3.4: Testing improvements

When queues and goals are empty, generate concrete testing work via /test:

if --no-lifecycle is NOT set:
  Skill(skill="test", args="coverage")
  Only files with < 40% coverage become queue items (severity threshold).

If /test is unavailable or --no-lifecycle is set, fall back to manual scanning:

  • find packages/files with thin or missing tests
  • look for missing regression tests around recent bug-fix paths
  • identify flaky or absent headless/runtime smokes

Convert any real finding into durable work:

  • add a bead when the work needs tracked backlog ownership, or
  • append a queue item under the shared next-work contract when it should flow directly back into /rpi

Step 3.5: Validation tightening and bug-hunt passes

If testing improvement generation returns nothing, run lifecycle generators then bug-hunt sweeps:

if --no-lifecycle is NOT set:
  a) Skill(skill="deps", args="audit")
     Only deps with CVSS >= 7.0 or 2+ major versions behind become queue items.

  b) if perf-sensitive code detected (benchmarks exist, hot path patterns):
       Skill(skill="perf", args="profile --quick")
       Convert significant perf findings to queue items.

If lifecycle generators return nothing or are skipped, fall back to manual sweeps:

  • missing validation gates
  • weak lint/contract coverage
  • bug-hunt style audits for risky areas
  • stale assumptions between docs, contracts, and runtime truth

Again: convert findings into beads or queue items, then immediately select the highest-priority result and continue.

Step 3.6: Drift / hotspot / dead-code mining

If the prior generators are empty, mine for complexity debt via /refactor:

if --no-lifecycle is NOT set:
  Skill(skill="refactor", args="--sweep all --dry-run")
  Only functions with CC > 20 become queue items (severity threshold).

If /refactor is unavailable or --no-lifecycle is set, fall back to manual mining:

  • complexity hotspots
  • stale TODO/FIXME markers
  • dead code
  • stale docs
  • stale research
  • drift between generated artifacts and source-of-truth files

Do not stop here. Normalize findings into tracked work and continue.

Step 3.7: Feature suggestions

If all concrete remediation layers are empty, propose one or more specific feature ideas grounded in the repo purpose, write them as durable work, and continue:

  • create a bead when the feature needs review/backlog treatment
  • or append a queue item with source: "feature-suggestion" when it is ready for the next /rpi cycle

Quality mode (--quality) — inverted cascade (findings before directives):

Step 3.0q: Unconsumed high-severity post-mortem findings:

HIGH=$(jq -r 'select(.consumed==false) | .items[] | select(.severity=="high") | .title' \
  .agents/rpi/next-work.jsonl 2>/dev/null | head -1)

Step 3.1q: Unconsumed medium-severity findings.

Step 3.2q: Open ready beads.

Step 3.3q: Emergency gates (weight >= 5) and top directive gaps.

Step 3.4q: Testi

Discussion

Product Hunt–style comments (not star reviews)
  • No comments yet — start the thread.
general reviews

Ratings

4.760 reviews
  • Arya Smith· Dec 28, 2024

    evolve has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Nia Menon· Dec 20, 2024

    Keeps context tight: evolve is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Dhruvi Jain· Dec 8, 2024

    Registry listing for evolve matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Aisha Sharma· Dec 8, 2024

    evolve reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Nia Martinez· Dec 4, 2024

    Useful defaults in evolve — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Oshnikdeep· Nov 27, 2024

    Keeps context tight: evolve is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Nia Jain· Nov 23, 2024

    I recommend evolve for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Sophia Desai· Nov 23, 2024

    evolve reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Fatima Flores· Nov 19, 2024

    Solid pick for teams standardizing on skills: evolve is focused, and the summary matches what you get after install.

  • Dev Verma· Nov 19, 2024

    We added evolve from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

showing 1-10 of 60

1 / 6