codex-autoresearch-loop
aradotso/trending-skills · updated Apr 8, 2026
Skill by ara.so — Daily 2026 Skills collection.
Codex Autoresearch
Codex Autoresearch is a Codex skill that runs an autonomous modify→verify→keep/revert loop on your codebase. You describe a measurable goal in one sentence; Codex confirms the plan, then iterates unattended — every improvement stacks in git, every failure reverts automatically — until interrupted or a cap is reached. Inspired by Karpathy's autoresearch concept, generalized beyond ML training to any software metric.
Installation
Option A — manual copy into your project:
git clone https://github.com/leo-lilinxiao/codex-autoresearch.git
cp -r codex-autoresearch your-project/.agents/skills/codex-autoresearch
Option B — Codex skill installer:
$skill-installer install https://github.com/leo-lilinxiao/codex-autoresearch
The skill lives at .agents/skills/codex-autoresearch/ inside your project. No config file is required before first use.
How to Activate
Open Codex in your project directory and prefix your goal with $codex-autoresearch:
$codex-autoresearch
I want to get rid of all `any` types in my TypeScript code
Codex will:
- Scan the repo and infer scope, metric, verify command, and guard command.
- Present a confirmation summary; reply `go` (or correct anything).
- Run the loop unattended until you interrupt it or the goal is met.
You never write config. Codex infers everything.
Confirmation Flow
Before the loop starts Codex always shows what it found and asks you to confirm. Example exchange:
Codex: I found 47 `any` occurrences across src/**/*.ts.
Confirmed:
- Target: eliminate `any` types in src/**/*.ts
- Metric: `any` count (current: 47), direction: lower
- Verify: grep + tsc --noEmit as guard
Need to confirm:
- Run until all gone, or cap at N iterations?
Reply "go" to start, or tell me what to change.
You: Go, run overnight.
Codex: Starting — baseline: 47. Iterating until interrupted.
Up to five confirmation rounds are possible. After that, Codex proceeds.
The Loop (internals)
PHASE 0: Probe environment (CPU/GPU/RAM/toolchains), check for session resume
PHASE 1: Read context + lessons file from prior run (if any)
LOOP (forever or N times):
1. Review current state, git history, results log, lessons
2. Pick ONE hypothesis (apply perspectives, filter by environment)
-- or N hypotheses if parallel mode is active
3. Make ONE atomic change
4. git commit (before verification)
5. Run verify command → did the target metric improve?
Run guard command → did anything else break?
6. Improved → keep (extract lesson)
Worse → approved rollback strategy (git revert)
Crashed → fix or skip
7. Log the result to results log
8. Health check (disk, git, verify health)
9. If 3+ discards → REFINE; 5+ → PIVOT; 2 PIVOTs → web search
10. Repeat. Never stop. Never ask.
The loop runs unbounded unless you say Iterations: N during confirmation.
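The control flow above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: `run_loop`, `propose`, `measure`, `guard`, and `revert` are hypothetical names, and the commit and revert steps are reduced to callbacks so the sketch runs without a git repository.

```python
from typing import Callable, List, Tuple


def run_loop(
    propose: Callable[[int], None],   # applies ONE atomic change (hypothetical hook)
    measure: Callable[[], float],     # stands in for the verify command
    guard: Callable[[], bool],        # stands in for the guard command
    revert: Callable[[], None],       # stands in for `git revert`
    iterations: int,
    direction: str = "lower",
) -> Tuple[float, List[Tuple[int, float, str]]]:
    """Run a capped modify -> verify -> keep/revert loop and return (best, log)."""
    best = measure()                  # baseline before any change
    log: List[Tuple[int, float, str]] = []
    for i in range(1, iterations + 1):
        propose(i)
        value = measure()
        improved = value < best if direction == "lower" else value > best
        if improved and guard():
            best = value
            log.append((i, value, "kept"))
        else:
            revert()
            log.append((i, value, "reverted"))
    return best, log


if __name__ == "__main__":
    # Toy "codebase": a single number we try to drive down.
    state = {"value": 47, "previous": 47}
    deltas = {1: -5, 2: +3, 3: -2}    # change 2 makes the metric worse

    def propose(i: int) -> None:
        state["previous"] = state["value"]
        state["value"] += deltas[i]

    def revert() -> None:
        state["value"] = state["previous"]

    best, log = run_loop(propose, lambda: state["value"], lambda: True, revert, 3)
    print(best, [status for _, _, status in log])
```

In the toy run, the second change worsens the metric and is reverted, so only the first and third changes stack. The real loop also handles crashes, health checks, and lesson extraction, which this sketch leaves out.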
Dual-Gate Verification
Two commands serve distinct purposes:
| Gate | Purpose | Failure means |
|---|---|---|
| Verify | Did the target metric improve? | Change discarded, reverted |
| Guard | Did anything else break? | Change reworked (up to 2 attempts), then reverted |
Guard files are never modified by the loop.
Example verify + guard pair for a Python coverage run:
Verify: pytest --cov=src --cov-report=term 2>&1 | grep TOTAL | awk '{print $NF}' | tr -d '%'
Guard: python -m mypy src --ignore-missing-imports
Example for TypeScript type cleanup:
Verify: grep -rw "any" src --include="*.ts" | wc -l
Guard: npx tsc --noEmit
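One way to read the two gates is as a small decision function. The sketch below is illustrative only: `gate_decision` and its return labels are hypothetical names, and the limit of two rework attempts is taken from the table above.

```python
def gate_decision(
    baseline: float,
    value: float,
    guard_passed: bool,
    rework_attempts: int,
    direction: str = "lower",
    max_rework: int = 2,
) -> str:
    """Return 'keep', 'rework', or 'revert' for one candidate change."""
    improved = value < baseline if direction == "lower" else value > baseline
    if not improved:
        return "revert"               # verify gate failed: discard the change
    if guard_passed:
        return "keep"                 # both gates passed
    if rework_attempts < max_rework:
        return "rework"               # guard failed: try to repair the change
    return "revert"                   # guard still failing after the allowed attempts


print(gate_decision(47, 45, guard_passed=True, rework_attempts=0))   # keep
print(gate_decision(47, 45, guard_passed=False, rework_attempts=0))  # rework
print(gate_decision(47, 45, guard_passed=False, rework_attempts=2))  # revert
print(gate_decision(47, 48, guard_passed=True, rework_attempts=0))   # revert
```

Keeping the two gates separate means a change that improves the metric but breaks something else gets a chance at repair rather than being thrown away immediately.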
Modes
Codex maps your sentence to one of seven modes automatically — you never pick a mode explicitly.
loop — iterate toward a measurable target (default)
$codex-autoresearch
Improve test coverage in src/ to at least 80%
$codex-autoresearch
Reduce bundle size — it's currently 2.3 MB, get it under 1 MB
plan — turn a vague goal into a validated loop config
$codex-autoresearch
I want to make our API faster but I don't know where to start
Codex will interview you (p95 latency vs throughput? which endpoint?) and produce a ready-to-run loop config.
fix — repair errors until count reaches zero
$codex-autoresearch
pytest is failing, 12 tests broken after the refactor — fix them all
debug — evidence-driven root-cause hunting
$codex-autoresearch
Our API returns 503 randomly under load, no idea why
Each iteration tests one falsifiable hypothesis. Codex presents evidence, not guesses.
security — read-only STRIDE + OWASP audit
$codex-autoresearch
Is this code secure?
ship — readiness verification and release gating
$codex-autoresearch
Ship it
exec — one-shot execution with no loop
$codex-autoresearch
Run the benchmark suite and summarize results
Inline Configuration (optional)
You can override defaults inline during the confirmation step — no file edits needed:
| Phrase | Effect |
|---|---|
| `Iterations: 20` | Cap the loop at 20 iterations |
| `Parallel: 3` | Test 3 hypotheses concurrently per round |
| `Guard: npm test` | Override the inferred guard command |
| `Verify: <command>` | Override the inferred verify command |
| `Scope: src/api/` | Restrict changes to a subdirectory |
Example during confirmation:
You: Go. Iterations: 30, Guard: npm test, Scope: src/api/
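To make the `Key: value` convention concrete, here is a rough sketch of how such a reply could be split into overrides. It is not how Codex interprets the reply (that happens in natural language); `parse_overrides` and the `KEYS` tuple are hypothetical, and the key list only covers phrases that appear in this document.

```python
import re
from typing import Dict, Union

# Keys shown in this document; the skill may accept others.
KEYS = ("Iterations", "Parallel", "Guard", "Verify", "Scope", "Direction", "Target")
_KEY_ALT = "|".join(KEYS)
# A value runs until the next ", Key:" or the end of the line.
_PATTERN = re.compile(rf"({_KEY_ALT}):\s*(.*?)(?=,\s*(?:{_KEY_ALT}):|$)")


def parse_overrides(reply: str) -> Dict[str, Union[int, str]]:
    """Extract `Key: value` overrides from a single-line confirmation reply."""
    overrides: Dict[str, Union[int, str]] = {}
    for key, raw in _PATTERN.findall(reply):
        value = raw.strip()
        # Iterations and Parallel are counts; everything else stays a string.
        overrides[key] = int(value) if key in ("Iterations", "Parallel") else value
    return overrides


print(parse_overrides("Go. Iterations: 30, Guard: npm test, Scope: src/api/"))
```

Splitting on known keys rather than on every comma lets a command value such as `npm test` keep its internal spaces, and a value would also survive containing a comma of its own.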
Cross-Run Learning
At the end of each iteration Codex writes a structured lesson to .agents/skills/codex-autoresearch/lessons.md:
Iteration 7 — KEPT
Hypothesis: replace explicit `any` with inferred generic in src/utils/mapper.ts
Change: added <T extends Record<string, unknown>> to mapKeys()
Result: any count 31 → 29
Lesson: Generic constraints on utility functions eliminate clusters of `any` downstream.
On session resume Codex reads this file first. Each new run benefits from prior runs.
To resume an interrupted run:
$codex-autoresearch
Resume
Codex re-reads the lessons file, checks git state, re-establishes the baseline, and continues.
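If you want to seed or inspect the lessons file yourself, the entry layout shown above is easy to reproduce. The following is a minimal sketch under that assumption: `format_lesson` and `append_lesson` are hypothetical helpers, the five-line layout is copied from the example entry, and the demo writes to a temporary directory instead of the real lessons file.

```python
import tempfile
from pathlib import Path

DASH = "\u2014"  # separator character used in the entry header shown above


def format_lesson(iteration: int, status: str, hypothesis: str,
                  change: str, result: str, lesson: str) -> str:
    """Render one lesson entry in the five-line layout shown above."""
    return "\n".join([
        f"Iteration {iteration} {DASH} {status}",
        f"Hypothesis: {hypothesis}",
        f"Change: {change}",
        f"Result: {result}",
        f"Lesson: {lesson}",
    ]) + "\n"


def append_lesson(path: Path, entry: str) -> None:
    """Append an entry, creating the file and parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lessons = Path(tmp) / "lessons.md"   # stand-in for the real lessons file
        append_lesson(lessons, format_lesson(
            7, "KEPT",
            "replace explicit `any` with inferred generic in src/utils/mapper.ts",
            "added <T extends Record<string, unknown>> to mapKeys()",
            "any count 31 -> 29",
            "Generic constraints on utility functions eliminate clusters of `any` downstream.",
        ))
        print(lessons.read_text(encoding="utf-8"))
```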
Parallel Experiments
Request parallel mode during confirmation or at any time:
You: Go, parallel 4
Codex runs four hypotheses concurrently, keeps the best result, discards the rest. Useful when hypothesis space is large.
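The "keep the best, discard the rest" step can be pictured as a simple selection over per-hypothesis metric values. This is a sketch with hypothetical names (`pick_best`, the `h1`..`h4` labels) and made-up numbers; how the skill actually isolates and compares concurrent experiments is not described here.

```python
from typing import Dict, Optional


def pick_best(results: Dict[str, float], baseline: float,
              direction: str = "lower") -> Optional[str]:
    """Return the hypothesis with the best metric, or None if none beat the baseline."""
    if direction == "lower":
        winners = {name: v for name, v in results.items() if v < baseline}
        return min(winners, key=lambda name: winners[name]) if winners else None
    winners = {name: v for name, v in results.items() if v > baseline}
    return max(winners, key=lambda name: winners[name]) if winners else None


# Four hypothetical experiments measured against a baseline of 31.
round_results = {"h1": 29, "h2": 33, "h3": 27, "h4": 31}
print(pick_best(round_results, baseline=31))  # h3
```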
Pivot Protocol
If the loop stalls, escalation happens automatically:
| Consecutive discards | Action |
|---|---|
| 3 | REFINE — narrow hypothesis, try smaller atomic changes |
| 5 | PIVOT — change strategy entirely |
| 2 PIVOTs | Web search — Codex fetches external references to unstick itself |
You are never asked for permission during escalation. The loop continues.
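The thresholds in the table can be modelled as a small counter. The sketch below is one possible reading, not the skill's implementation: `EscalationTracker` is a hypothetical name, and it assumes that a kept change resets the discard counter, that a PIVOT also resets it, and that the pivot counter resets after a web search.

```python
class EscalationTracker:
    """Track consecutive discards and report the escalation action to take."""

    def __init__(self) -> None:
        self.discards = 0   # consecutive discarded iterations
        self.pivots = 0     # PIVOTs since the last web search

    def record(self, kept: bool) -> str:
        if kept:
            self.discards = 0             # assumption: any kept change ends the streak
            return "CONTINUE"
        self.discards += 1
        if self.discards >= 5:
            self.discards = 0             # assumption: a pivot starts a fresh streak
            self.pivots += 1
            if self.pivots >= 2:
                self.pivots = 0           # assumption: search resets the pivot count
                return "WEB_SEARCH"
            return "PIVOT"
        if self.discards >= 3:
            return "REFINE"
        return "CONTINUE"


tracker = EscalationTracker()
# Ten discards in a row walk through every escalation level.
print([tracker.record(kept=False) for _ in range(10)])
```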
Real Code Examples
Example 1 — TypeScript any elimination (Python verify script)
If you want a custom verify script instead of a one-liner:
# scripts/count_any.py
import subprocess, sys
result = subprocess.run(
    ["grep", "-r", "--include=*.ts", r"\bany\b", "src/"],
    capture_output=True, text=True
)
count = len(result.stdout.strip().splitlines())
print(count)
sys.exit(0) # always exit 0; the number is what matters
Tell Codex during confirmation:
Verify: python scripts/count_any.py
Guard: npx tsc --noEmit
Example 2 — pytest coverage loop (Python)
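# scripts/coverage_pct.py
import re
import subprocess
import sys

# subprocess.run (unlike check_output) does not raise when pytest exits non-zero,
# so a run with failing tests still yields a coverage number.
proc = subprocess.run(
    ["pytest", "--cov=src", "--cov-report=term", "-q"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
)
match = re.search(r"TOTAL\s+\d+\s+\d+\s+(\d+)%", proc.stdout)
# Print the integer percentage, or 0 if no TOTAL line was found.
print(int(match.group(1)) if match else 0)
sys.exit(0)  # always exit 0; the number is what matters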
$codex-autoresearch
Improve test coverage — target 85%
Verify: python scripts/coverage_pct.py
Guard: python -m mypy src
Direction: higher
Target: 85
Iterations: 50
Example 3 — bundle size loop (Node.js project)
# scripts/bundle_size.sh
#!/usr/bin/env bash
npm run build --silent >/dev/null 2>&1
du -k dist/bundle.js | awk '{print $1}'
$codex-autoresearch
Reduce our JS bundle size, currently ~2300 KB, target under 900 KB
Verify: bash scripts/bundle_size.sh
Guard: npm test
Direction: lower
Target: 900
Example 4 — lint warning count (any language)
# scripts/lint_count.sh
#!/usr/bin/env bash
npx eslint src/ --format json 2>/dev/null \
| python3 -c "import sys,json; d=json.load(sys.stdin); print(sum(len(f['messages']) for f in d))"
$codex-autoresearch
Get our ESLint warning count to zero
Verify: bash scripts/lint_count.sh
Direction: lower
Target: 0
Unattended Runs
For overnight or long runs, ensure Codex CLI approval settings do not interrupt git commit or git revert commands. The simplest option is to run in a disposable or sandboxed repo clone:
git clone . /tmp/autoresearch-sandbox
cd /tmp/autoresearch-sandbox
# launch Codex here with full permissions
Results accumulate in git history. Pull the winning commits back to your main repo when done:
# in your main repo
git fetch /tmp/autoresearch-sandbox main
git cherry-pick <winning-commit-sha>
Session Artifacts
| File | Contents |
|---|---|
| `.agents/skills/codex-autoresearch/lessons.md` | Structured lessons from every iteration |
| `.agents/skills/codex-autoresearch/results.log` | Full per-iteration log (metric value, kept/reverted, elapsed) |
| `.agents/skills/codex-autoresearch/session.json` | Current session state for resume |
These files persist across Codex sessions. Delete them to start fresh.
Troubleshooting
Loop reverts every change:
- Verify command may be returning a non-numeric value. Test it manually: `bash -c "<your verify command>"` should print a single number.
- Metric direction may be wrong. Confirm `Direction: lower` or `Direction: higher` during setup.
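To check a verify command's output before starting a long run, a small validator can help. This is a sketch only: `parse_metric` is a hypothetical helper, and "exactly one whitespace-separated number" is an assumption about what counts as a usable metric, since the skill's own parsing rules are not documented here.

```python
def parse_metric(stdout: str) -> float:
    """Return the metric if stdout is exactly one number, else raise ValueError."""
    tokens = stdout.split()
    if len(tokens) != 1:
        raise ValueError(f"expected one value, got {len(tokens)}: {stdout!r}")
    return float(tokens[0])  # float() raises ValueError for text such as '85%'


for sample in ("47\n", "85%\n", "TOTAL 120 18 85\n"):
    try:
        print(repr(sample), "->", parse_metric(sample))
    except ValueError as error:
        print(repr(sample), "-> rejected:", error)
```

Output such as `85%` or a full coverage table row is rejected, which is the kind of result that can make every iteration look like a failure.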
Guard fires on unrelated files:
- Narrow scope: `Scope: src/specific-module/`
- Or tell Codex explicitly: `Do not touch tests/` during confirmation.
Session resume picks up wrong baseline:
- Delete `session.json` to force a fresh baseline: `rm .agents/skills/codex-autoresearch/session.json`
Parallel mode produces merge conflicts:
- Codex handles this internally via the pivot protocol, but if it gets stuck, reduce parallelism: `Parallel: 2`
Codex asks questions mid-loop:
- This means a guard crash produced ambiguous output. Pre-empt it by specifying `Guard: <command> || true` if guard failures should be non-fatal, or by giving Codex fuller sandbox permissions so it can run git commands freely.
Loop hits PIVOT but makes no progress:
- Supply a seed hypothesis during confirmation: `Hint: try tree-shaking unused imports first`
- Or run `plan` mode first to produce a richer hypothesis list before switching to `loop`.
Quick Reference
# Start a loop
$codex-autoresearch
How to use codex-autoresearch-loop on Cursor
AI-first code editor with Composer
1. Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with `node --version`)
- Active project directory or workspace where you want to add codex-autoresearch-loop
2. Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
$ npx skills add https://github.com/aradotso/trending-skills --skill codex-autoresearch-loop
The skills CLI fetches codex-autoresearch-loop from GitHub repository aradotso/trending-skills and configures it for Cursor.
3. Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
◆ Which agents do you want to install to?
│
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Cursor
│ • Windsurf
4. Verify installation
Confirm successful installation by checking the skill directory location:
.cursor/skills/codex-autoresearch-loop
Reload or restart Cursor to activate codex-autoresearch-loop. Access the skill through slash commands (e.g., /codex-autoresearch-loop) or your agent's skill management interface.
⚠Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.