Confirm successful installation by checking the skill directory location:
.cursor/skills/codex-autoresearch-loop
Restart Cursor to activate codex-autoresearch-loop. Access via /codex-autoresearch-loop in your agent's command palette.
Security Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your environment. Always review source, verify the publisher, and test in isolation before production.
Codex Autoresearch is a Codex skill that runs an autonomous modify→verify→keep/revert loop on your codebase. You describe a measurable goal in one sentence; Codex confirms the plan, then iterates unattended until it is interrupted or a cap is reached. Every improvement stacks in git, and every failure reverts automatically. It is inspired by Karpathy's autoresearch concept, generalized beyond ML training to any software metric.
The skill lives at .agents/skills/codex-autoresearch/ inside your project. No config file is required before first use.
How to Activate
Open Codex in your project directory and prefix your goal with $codex-autoresearch:
$codex-autoresearch
I want to get rid of all `any` types in my TypeScript code
Codex will:
1. Scan the repo and infer scope, metric, verify command, and guard command.
2. Present a confirmation summary; reply go (or correct anything).
3. Run the loop unattended until you interrupt it or the goal is met.
You never write config. Codex infers everything.
Confirmation Flow
Before the loop starts Codex always shows what it found and asks you to confirm. Example exchange:
Codex: I found 47 `any` occurrences across src/**/*.ts.
Confirmed:
- Target: eliminate `any` types in src/**/*.ts
- Metric: `any` count (current: 47), direction: lower
- Verify: grep + tsc --noEmit as guard
Need to confirm:
- Run until all gone, or cap at N iterations?
Reply "go" to start, or tell me what to change.
You: Go, run overnight.
Codex: Starting. Baseline: 47. Iterating until interrupted.
Up to five confirmation rounds are possible. After that, Codex proceeds.
The Loop (internals)
PHASE 0: Probe environment (CPU/GPU/RAM/toolchains), check for session resume
PHASE 1: Read context + lessons file from prior run (if any)
LOOP (forever or N times):
  1. Review current state, git history, results log, lessons
  2. Pick ONE hypothesis (apply perspectives, filter by environment)
     -- or N hypotheses if parallel mode is active
  3. Make ONE atomic change
  4. git commit (before verification)
  5. Run verify command → did the target metric improve?
     Run guard command  → did anything else break?
  6. Improved → keep (extract lesson)
     Worse    → approved rollback strategy (git revert)
     Crashed  → fix or skip
  7. Log the result to results log
  8. Health check (disk, git, verify health)
  9. If 3+ discards → REFINE; 5+ → PIVOT; 2 PIVOTs → web search
  10. Repeat. Never stop. Never ask.
The loop runs unbounded unless you say Iterations: N during confirmation.
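The core of the loop (commit, verify, then keep or revert) can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `run_loop`, `is_improvement`, `git_commit`, `git_revert`, and the `apply_change` hook are hypothetical names, and the sketch takes an explicit iteration cap instead of running forever.

```python
import subprocess
from typing import Callable, List, Optional, Tuple


def is_improvement(old: float, new: float, direction: str) -> bool:
    """Return True when `new` beats `old` for the given direction."""
    return new < old if direction == "lower" else new > old


def run_loop(
    apply_change: Callable[[int], None],  # hypothetical hook: makes one atomic edit
    verify: Callable[[], float],          # returns the current metric value
    guard: Callable[[], bool],            # True when nothing else broke
    commit: Callable[[str], None],
    revert: Callable[[], None],
    direction: str = "lower",
    iterations: int = 10,
    target: Optional[float] = None,
) -> Tuple[float, List[Tuple[int, float, str]]]:
    best = verify()  # baseline measurement
    log: List[Tuple[int, float, str]] = []
    for i in range(1, iterations + 1):
        apply_change(i)
        commit(f"experiment {i}")  # commit before verification
        value = verify()
        if is_improvement(best, value, direction) and guard():
            best = value
            log.append((i, value, "kept"))
        else:
            revert()
            log.append((i, value, "reverted"))
        if target is not None:
            reached = best <= target if direction == "lower" else best >= target
            if reached:
                break
    return best, log


def git_commit(message: str) -> None:
    # Stages tracked files and commits them in one step.
    subprocess.run(["git", "commit", "-am", message], check=True)


def git_revert() -> None:
    # Adds a new commit that undoes the most recent one.
    subprocess.run(["git", "revert", "--no-edit", "HEAD"], check=True)
```

Passing the git operations in as callables keeps the sketch testable without a repository; in a real run, `git_commit` and `git_revert` would be supplied as `commit` and `revert`.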
Dual-Gate Verification
Two commands serve distinct purposes:
| Gate | Purpose | Failure means |
| Verify | Did the target metric improve? | Change discarded, reverted |
| Guard | Did anything else break? | Change reworked (up to 2 attempts), then reverted |
Guard files are never modified by the loop.
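The decision implied by the two gates can be sketched as a small function. This is an illustration only: `evaluate_change` and the `rework` hook are hypothetical names, and the sketch assumes a reworked change does not need its metric re-verified, which the real skill may handle differently.

```python
from typing import Callable

MAX_REWORK_ATTEMPTS = 2  # guard failures get up to 2 rework attempts


def evaluate_change(
    metric_improved: bool,
    guard_passes: Callable[[], bool],
    rework: Callable[[], None],
) -> str:
    """Return "keep" or "revert" for one candidate change."""
    if not metric_improved:
        return "revert"  # verify gate failed: discard immediately
    if guard_passes():
        return "keep"
    for _ in range(MAX_REWORK_ATTEMPTS):
        rework()  # hypothetical hook: adjust the change, then re-run the guard
        if guard_passes():
            return "keep"
    return "revert"  # guard still failing after the allowed attempts
```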
Example verify + guard pair for a Python coverage run:
Web search: Codex fetches external references to unstick itself.
You are never asked for permission during escalation. The loop continues.
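The thresholds from step 9 of the loop can be expressed as a simple lookup. This sketch assumes the discard count means consecutive discards since the last kept change, which the text does not state explicitly; `next_escalation` is a hypothetical name.

```python
def next_escalation(consecutive_discards: int, pivots_so_far: int) -> str:
    """Map loop counters to the next escalation step."""
    if pivots_so_far >= 2:
        return "web_search"  # two PIVOTs without progress
    if consecutive_discards >= 5:
        return "pivot"
    if consecutive_discards >= 3:
        return "refine"
    return "continue"
```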
Real Code Examples
Example 1: TypeScript any elimination (Python verify script)
If you want a custom verify script instead of a one-liner:
# scripts/count_any.py
import subprocess, sys

# Count lines in src/ that contain the word `any` in .ts files.
result = subprocess.run(
    ["grep", "-r", "--include=*.ts", r"\bany\b", "src/"],
    capture_output=True, text=True,
)
count = len(result.stdout.strip().splitlines())
print(count)
sys.exit(0)  # always exit 0; the number is what matters
#!/usr/bin/env bash
# scripts/bundle_size.sh
# Build quietly, then print the bundle size in KB.
npm run build --silent 2>/dev/null
du -k dist/bundle.js | awk '{print $1}'
$codex-autoresearch
Reduce our JS bundle size, currently ~2300 KB, target under 900 KB
Verify: bash scripts/bundle_size.sh
Guard: npm test
Direction: lower
Target: 900
Example 4: lint warning count (any language)
#!/usr/bin/env bash
# scripts/lint_count.sh
# Print the total number of ESLint messages across all files.
npx eslint src/ --format json 2>/dev/null \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print(sum(len(f['messages']) for f in d))"
$codex-autoresearch
Get our ESLint warning count to zero
Verify: bash scripts/lint_count.sh
Direction: lower
Target: 0
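The one-liner above counts every ESLint message, errors and warnings alike. If only warnings should count toward the metric, the JSON can be filtered by severity. A minimal sketch, assuming ESLint's JSON formatter output (a list of per-file results, each with a `messages` list whose entries carry `severity`, where 1 is a warning and 2 is an error); `count_lint_messages` is a hypothetical helper name.

```python
import json
from typing import Optional


def count_lint_messages(eslint_json: str, severity: Optional[int] = None) -> int:
    """Count ESLint messages, optionally restricted to one severity level."""
    results = json.loads(eslint_json)
    total = 0
    for file_result in results:
        for message in file_result.get("messages", []):
            if severity is None or message.get("severity") == severity:
                total += 1
    return total
```

Calling `count_lint_messages(text)` matches the one-liner; `count_lint_messages(text, severity=1)` counts warnings only.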
Unattended Runs
For overnight or long runs, ensure Codex CLI approval settings do not interrupt git commit or git revert commands. The simplest option is to run in a disposable or sandboxed repo clone:
git clone . /tmp/autoresearch-sandbox
cd /tmp/autoresearch-sandbox
# launch Codex here with full permissions
Results accumulate in git history. Pull the winning commits back to your main repo when done:
# in your main repo
git fetch /tmp/autoresearch-sandbox main
git cherry-pick <winning-commit-sha>
Session Artifacts
| File | Contents |
| .agents/skills/codex-autoresearch/lessons.md | Structured lessons from every iteration |
| .agents/skills/codex-autoresearch/results.log | Full per-iteration log (metric value, kept/reverted, elapsed) |
| .agents/skills/codex-autoresearch/session.json | Current session state for resume |
These files persist across Codex sessions. Delete them to start fresh.
Troubleshooting
Loop reverts every change:
Verify command may be returning a non-numeric value. Test it manually: bash -c "<your verify command>" should print a single number.
Metric direction may be wrong. Confirm Direction: lower or Direction: higher during setup.
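The manual check for the verify command can be automated with a small script. This is a sketch, not part of the skill: `parse_metric` and `check_verify_command` are hypothetical helpers, and it assumes the verify command prints its metric on standard output.

```python
import subprocess


def parse_metric(output: str) -> float:
    """Parse verify output, which must be exactly one number."""
    tokens = output.split()
    if len(tokens) != 1:
        raise ValueError(f"expected a single number, got {output!r}")
    return float(tokens[0])  # raises ValueError for non-numeric text


def check_verify_command(command: str) -> float:
    """Run the verify command through bash and validate what it prints."""
    completed = subprocess.run(
        ["bash", "-c", command], capture_output=True, text=True
    )
    return parse_metric(completed.stdout)
```

For example, `check_verify_command("bash scripts/lint_count.sh")` either returns the metric as a float or raises `ValueError` describing the unexpected output.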
Guard fires on unrelated files:
Narrow scope: Scope: src/specific-module/
Or tell Codex explicitly: Do not touch tests/ during confirmation.
Session resume picks up wrong baseline:
Delete session.json to force a fresh baseline: rm .agents/skills/codex-autoresearch/session.json
Parallel mode produces merge conflicts:
Codex handles this internally via the pivot protocol, but if it gets stuck, reduce parallelism: Parallel: 2
Codex asks questions mid-loop:
This means a guard crash produced ambiguous output. Pre-empt it by specifying Guard: <command> || true if guard failures should be non-fatal, or by giving Codex fuller sandbox permissions so it can run git commands freely.
Loop hits PIVOT but makes no progress:
Supply a seed hypothesis during confirmation: Hint: try tree-shaking unused imports first
Or run plan mode first to produce a richer hypothesis list before switching to loop.
Quick Reference
# Start a loop
$codex-autoresearch