codex-review
benedictking/codex-review · updated Apr 8, 2026
Automated code review with intelligent difficulty assessment and CHANGELOG synchronization.
- Automatically detects uncommitted changes or reviews the latest commit; stages untracked files and updates CHANGELOG before invoking the review
- Evaluates task complexity (file count, line changes, refactoring scope) and selects appropriate model and reasoning effort level
- Separates preparation (CHANGELOG, staging) from review execution in isolated context to reduce token waste
- Supports Go,
Codex Code Review Skill
Trigger Conditions
Triggered when user input contains:
- "代码审核", "代码审查", "审查代码", "审核代码"
- "review", "code review", "review code", "codex 审核"
- "帮我审核", "检查代码", "审一下", "看看代码"
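The trigger check above can be sketched as a small shell function. This is only an illustration: `is_review_trigger` is a hypothetical name, and the skill host may match triggers differently. It assumes plain substring matching, so "preview" would also match "review".

```shell
# Hypothetical helper: returns 0 when the input contains a trigger phrase.
is_review_trigger() {
  # Lowercase ASCII letters so "Code Review" and "REVIEW" also match.
  input=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$input" in
    *代码审核*|*代码审查*|*审查代码*|*审核代码*) return 0 ;;
    *review*|*"codex 审核"*) return 0 ;;
    *帮我审核*|*检查代码*|*审一下*|*看看代码*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_review_trigger "Please do a Code Review"` succeeds, while `is_review_trigger "run the tests"` fails.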
Core Concept: Intention vs Implementation
Running `codex review --uncommitted` alone only shows the AI what was done (the implementation).
Recording the intention first tells the AI what you wanted to do (the intention).
Feeding the code changes together with an intention description is the most effective way to improve AI code review quality.
Skill Architecture
This skill operates in two phases:
- Preparation Phase (current context): Check working directory, update CHANGELOG
- Review Phase (isolated context): Invoke Task tool to execute Lint + codex review (using context: fork to reduce context waste)
Execution Steps
0. [First] Check Working Directory Status
git diff --name-only && git status --short
Decide review mode based on output:
- Has uncommitted changes → Continue with steps 1-4 (normal flow)
- Clean working directory → Directly invoke codex-runner:
codex review --commit HEAD
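A minimal sketch of the mode decision, assuming the output of `git status --short` is passed in as text. `select_review_mode` is a hypothetical name, not part of codex or the skill.

```shell
# Hypothetical helper: prints "uncommitted" or "commit" for the
# `git status --short` output passed as the first argument.
select_review_mode() {
  if [ -n "$1" ]; then
    echo "uncommitted"   # dirty tree: continue with steps 1-4
  else
    echo "commit"        # clean tree: review HEAD directly
  fi
}

# Usage inside a repository:
#   mode=$(select_review_mode "$(git status --short)")
```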
1. [Mandatory] Check if CHANGELOG is Updated
Before any review, check whether CHANGELOG.md contains a description of the current changes.
# Check if CHANGELOG.md is in uncommitted changes (staged or unstaged, relative to HEAD)
git diff --name-only HEAD | grep -i "changelog"
If CHANGELOG is not updated, you must automatically perform the following (don't ask user to do it manually):
- Analyze changes: Run `git diff --stat` and `git diff` to get complete changes
- Auto-generate CHANGELOG entry: Generate compliant entry based on code changes
- Write to CHANGELOG.md: Use Edit tool to insert entry at top of `[Unreleased]` section
- Continue review flow: Immediately proceed to next steps after CHANGELOG update
Auto-generated CHANGELOG entry format:
## [Unreleased]
### Added / Changed / Fixed
- Feature description: what problem was solved or what functionality was implemented
- Affected files: main modified files/modules
Example - Auto-generation Flow:
1. Detected CHANGELOG not updated
2. Run git diff --stat, found handlers/responses.go modified (+88 lines)
3. Run git diff to analyze details: added CompactHandler function
4. Auto-generate entry:
### Added
- Added `/v1/responses/compact` endpoint for conversation context compression
- Supports multi-channel failover and request body size limits
5. Use Edit tool to write to CHANGELOG.md
6. Continue with lint and codex review
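The skill writes the entry with the Edit tool. For readers who want to see the insertion itself, here is a rough shell equivalent, offered as a sketch only. `insert_unreleased_entry` is a hypothetical helper, and it assumes the changelog already contains a line starting with `## [Unreleased]`; if it does not, the input is printed unchanged.

```shell
# Hypothetical helper: reads a changelog on stdin and prints it with the
# entry text ($1) inserted directly below the first "## [Unreleased]" line.
insert_unreleased_entry() {
  ENTRY="$1" awk '
    { print }
    !inserted && /^## \[Unreleased\]/ {
      print ENVIRON["ENTRY"]   # entry goes at the top of the section
      inserted = 1
    }
  '
}

# Usage (writes to a temporary file first, then replaces the original):
#   insert_unreleased_entry "$entry" < CHANGELOG.md > CHANGELOG.md.tmp \
#     && mv CHANGELOG.md.tmp CHANGELOG.md
```

Passing the entry through the environment rather than `awk -v` avoids awk's escape-sequence processing on multi-line text.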
2. [Critical] Stage All New Files
Before invoking codex review, add all new (untracked) files to the git staging area; otherwise codex will report a P1 error.
# Check for new files
git status --short | grep "^??"
If there are new files, automatically execute:
# Safely stage all new files (handles empty list and special filenames)
git ls-files --others --exclude-standard -z | while IFS= read -r -d '' f; do git add -- "$f"; done
Explanation:
- `-z` uses the null character to separate filenames, so filenames with spaces or newlines are handled correctly
- `while IFS= read -r -d ''` reads filenames one by one
- `git add -- "$f"` uses the `--` separator, so filenames starting with `-` are handled correctly
- When no new files exist, the loop body doesn't execute and the step is safely skipped
- This won't stage modified files; it only handles new files
- codex needs files to be tracked by git for proper review
3. Evaluate Task Difficulty and Invoke codex-runner
Count change scale:
# Get summary line for ALL changes (staged + unstaged)
# IMPORTANT: Must use 'HEAD' as base to include both staged and unstaged changes
git diff --stat HEAD | tail -1
Why use git diff --stat HEAD:
- `git diff --stat` only shows unstaged changes
- `git diff --cached --stat` only shows staged changes
- `git diff --stat HEAD` shows BOTH staged and unstaged changes combined
- The last line (`tail -1`) is the summary line with total file count and line changes
Difficulty Assessment Criteria:
Model + Reasoning Effort Combinations:
| Combination | Quality | Time | Timeout | Recommended For |
|---|---|---|---|---|
| `model=gpt-5.2 model_reasoning_effort=xhigh` | Best | ~15-20 min | 40 min | Critical code, architecture changes |
| `model=gpt-5.3-codex model_reasoning_effort=xhigh` | High | ~8-9 min | 15 min | Difficult tasks (default) |
| `model=gpt-5.2 model_reasoning_effort=high` | High | ~8-9 min | 15 min | Alternative for difficult tasks |
| `model=gpt-5.3-codex model_reasoning_effort=high` | Good | ~5-6 min | 10 min | Normal tasks (default) |
Critical Tasks (meets any condition, use best quality model):
- Modified files ≥ 30
- Total code changes (insertions + deletions) ≥ 2000 lines
- Involves core architecture/algorithm changes (user explicitly mentioned)
- Config: `--config model=gpt-5.2 --config model_reasoning_effort=xhigh`, timeout 40 minutes
Difficult Tasks (meets any condition):
- Modified files ≥ 10
- Total code changes (insertions + deletions) ≥ 500 lines
- Single metric: insertions ≥ 300 lines OR deletions ≥ 300 lines
- Cross-module refactoring
- Default config: `--config model=gpt-5.3-codex --config model_reasoning_effort=xhigh`, timeout 15 minutes
Normal Tasks (other cases):
- Default config: `--config model=gpt-5.3-codex --config model_reasoning_effort=high`, timeout 10 minutes
Evaluation Method:
You MUST parse the git diff --stat HEAD output correctly to determine difficulty:
# Get the summary line (last line of git diff --stat HEAD)
git diff --stat HEAD | tail -1
# Example outputs:
# "20 files changed, 342 insertions(+), 985 deletions(-)"
# "1 file changed, 50 insertions(+)" # No deletions
# "3 files changed, 120 deletions(-)" # No insertions
Critical: Why the summary line matters:
- Each file shows individual stats: `file.go | 171 ++++++++++++++++++++-`
- Only the LAST line has the total: `6 files changed, 1341 insertions(+), 18 deletions(-)`
- You must extract the last line with `tail -1` to get accurate totals
Parsing Rules:
- Extract file count from "X file(s) changed" (handle both "1 file" and "N files")
- Extract insertions from "Y insertion(s)(+)" if present (handle both "1 insertion" and "N insertions"), otherwise 0
- Extract deletions from "Z deletion(s)(-)" if present (handle both "1 deletion" and "N deletions"), otherwise 0
- Calculate total changes = insertions + deletions
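The parsing rules above can be sketched with `sed`. This is one possible approach under stated assumptions, not the skill's own implementation: `parse_diff_summary` is a hypothetical name, and it assumes the English summary format that git prints by default.

```shell
# Hypothetical helper: prints "<files> <insertions> <deletions>" for a
# `git diff --stat` summary line; any part git omits is reported as 0.
parse_diff_summary() {
  line=$1
  # "N file changed" or "N files changed"
  files=$(printf '%s\n' "$line" |
    sed -n 's/^ *\([0-9][0-9]*\) files\{0,1\} changed.*/\1/p')
  # "N insertion(+)" or "N insertions(+)"
  ins=$(printf '%s\n' "$line" |
    sed -n 's/.* \([0-9][0-9]*\) insertions\{0,1\}(+).*/\1/p')
  # "N deletion(-)" or "N deletions(-)"
  del=$(printf '%s\n' "$line" |
    sed -n 's/.* \([0-9][0-9]*\) deletions\{0,1\}(-).*/\1/p')
  echo "${files:-0} ${ins:-0} ${del:-0}"
}

# Usage inside a repository:
#   parse_diff_summary "$(git diff --stat HEAD | tail -1)"
```

The total is then the sum of the second and third fields.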
Important Edge Cases:
- Single file: `"1 file changed"` (singular form)
- No insertions: Git omits `"insertions(+)"` entirely → treat as 0
- No deletions: Git omits `"deletions(-)"` entirely → treat as 0
- Pure rename: May show `"0 insertions(+), 0 deletions(-)"` or omit both
Decision Logic (check in order, first match wins):
- IF file_count >= 30 OR total_changes >= 2000 → Critical (gpt-5.2 + xhigh)
- IF file_count >= 10 → Difficult (gpt-5.3-codex + xhigh)
- IF total_changes >= 500 → Difficult (gpt-5.3-codex + xhigh)
- IF insertions >= 300 OR deletions >= 300 → Difficult (gpt-5.3-codex + xhigh)
- ELSE → Normal (gpt-5.3-codex + high)
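The decision logic above translates almost directly into shell. The sketch below uses a hypothetical `classify_difficulty` function and covers only the numeric thresholds. The qualitative criteria (core architecture changes, cross-module refactoring) still need a human or agent judgment.

```shell
# Hypothetical helper: classify_difficulty <file_count> <insertions> <deletions>
# Prints "critical", "difficult", or "normal"; first match wins.
classify_difficulty() {
  files=$1
  ins=$2
  del=$3
  total=$((ins + del))
  if [ "$files" -ge 30 ] || [ "$total" -ge 2000 ]; then
    echo "critical"
  elif [ "$files" -ge 10 ] || [ "$total" -ge 500 ] ||
       [ "$ins" -ge 300 ] || [ "$del" -ge 300 ]; then
    echo "difficult"
  else
    echo "normal"
  fi
}
```

For instance, `classify_difficulty 5 600 50` prints `difficult` because the total of 650 changed lines crosses the 500-line threshold.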
Example Cases:
- ⭐ "50 files changed, 2000 insertions(+), 1500 deletions(-)" → Critical task, use `model=gpt-5.2 model_reasoning_effort=xhigh`, timeout 40 minutes (core architecture change)
- ✅ "20 files changed, 342 insertions(+), 985 deletions(-)" → Difficult task, use `model=gpt-5.3-codex model_reasoning_effort=xhigh`, timeout 15 minutes
- ✅ "5 files changed, 600 insertions(+), 50 deletions(-)" → Difficult task, use `model=gpt-5.3-codex model_reasoning_effort=xhigh`, timeout 15 minutes
- ❌ "3 files changed, 150 insertions(+), 80 deletions(-)" → Normal task, use `model=gpt-5.3-codex model_reasoning_effort=high`, timeout 10 minutes
- ❌ "1 file changed, 50 insertions(+)" → Normal task, use `model=gpt-5.3-codex model_reasoning_effort=high`, timeout 10 minutes
Invoke codex-runner Subtask:
Use Task tool to invoke codex-runner, passing complete command (including Lint + codex review):
Task parameters:
- subagent_type: Bash
- description: "Execute Lint and codex review"
- timeout: 2400000 (40 minutes for critical tasks), 900000 (15 minutes for difficult tasks), or 600000 (10 minutes for normal tasks)
- prompt: Choose corresponding command based on project type and difficulty
Go project - Difficult task:
go fmt ./... && go vet ./... && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=xhigh
(timeout: 900000)
Go project - Normal task:
go fmt ./... && go vet ./... && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=high
(timeout: 600000)
Node project - Difficult task:
npm run lint:fix && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=xhigh
(timeout: 900000)
Node project - Normal task:
npm run lint:fix && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=high
(timeout: 600000)
Python project - Difficult task:
black . && ruff check --fix . && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=xhigh
(timeout: 900000)
Python project - Normal task:
black . && ruff check --fix . && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=high
(timeout: 600000)
Clean working directory:
codex review --commit HEAD --config model=gpt-5.3-codex --config model_reasoning_effort=high
(timeout: 600000)
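The per-project commands above differ only in the lint prefix and the model settings, so they can be assembled from two lookups. The following is a sketch under that assumption. `build_review_command` and `review_timeout_ms` are hypothetical helpers, and the 2400000 ms value is simply the 40-minute critical timeout converted to milliseconds.

```shell
# Hypothetical helper: build_review_command <go|node|python> <critical|difficult|normal>
# Prints the lint + codex review command line for uncommitted changes.
build_review_command() {
  case "$2" in
    critical)  model="gpt-5.2";       effort="xhigh" ;;
    difficult) model="gpt-5.3-codex"; effort="xhigh" ;;
    *)         model="gpt-5.3-codex"; effort="high" ;;
  esac
  case "$1" in
    go)     lint="go fmt ./... && go vet ./... && " ;;
    node)   lint="npm run lint:fix && " ;;
    python) lint="black . && ruff check --fix . && " ;;
    *)      lint="" ;;   # unknown project type: skip the lint step
  esac
  echo "${lint}codex review --uncommitted --config model=${model} --config model_reasoning_effort=${effort}"
}

# Hypothetical helper: Task timeout in milliseconds for a difficulty level.
review_timeout_ms() {
  case "$1" in
    critical)  echo 2400000 ;;  # 40 minutes
    difficult) echo 900000 ;;   # 15 minutes
    *)         echo 600000 ;;   # 10 minutes
  esac
}
```

The printed string is meant to be passed as the Task prompt, not evaluated blindly in the current shell.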
4. Self-Correction
If Codex finds Changelog description inconsistent with code logic:
- Code error → Fix code
- Description inaccurate → Update Changelog
Complete Review Protocol
- [GATE] Check CHANGELOG - Auto-generate and write if not updated (leverage current context to understand change intention)
- [PREPARE] Stage Untracked Files - Add all new files to git staging area (avoid codex P1 error)
- [EXEC] Task → Lint + codex review - Invoke Task tool to execute Lint and codex (isolated context, reduce waste)
- [FIX] Self-Correction - Fix code or update description when intention ≠ implementation
Codex Review Command Reference
Basic Syntax
codex review [OPTIONS] [PROMPT]
Note: [PROMPT] parameter cannot be used with --uncommitted, --base, or --commit.
Common Options
| Option | Description | Example |
|---|---|---|
| `--uncommitted` | Review all uncommitted changes in working directory (staged + unstaged + untracked) | `codex review --uncommitted` |
| `--base <BRANCH>` | Review changes relative to specified base branch | `codex review --base main` |
| `--commit <SHA>` | Review changes introduced by specified commit | `codex review --commit HEAD` |
| `--title <TITLE>` | Optional commit title, displayed in review summary | `codex review --uncommitted --title "feat: add JSON parser"` |
| `-c, --config <key=value>` | Override configuration values | `codex review --uncommitted -c model="o3"` |
Usage Examples
# 1. Review all uncommitted changes (most common)
codex review --uncommitted
# 2. Review latest commit
codex review --commit HEAD
# 3. Review specific commit
codex review --commit abc1234
# 4. Review all changes in current branch relative to main
codex review --base main
# 5. Review changes in current branch relative to develop
codex review --base develop
# 6. Review with title (title shown in review summary)
codex review --uncommitted --title "fix: resolve JSON parsing errors"
# 7. Review using specific model
codex review --uncommitted -c model="o3"
Important Limitations
- `--uncommitted`, `--base`, `--commit` are mutually exclusive and cannot be used together
- The `[PROMPT]` parameter is mutually exclusive with the above three options
- Must be executed in a git repository directory
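A wrapper script could check the mutual-exclusion rule before calling codex. The sketch below is illustrative only. `check_review_scope` is a hypothetical name, and it assumes options are passed as separate words (`--base main`, not `--base=main`).

```shell
# Hypothetical pre-flight check: succeeds when at most one of
# --uncommitted, --base, --commit appears in the argument list.
check_review_scope() {
  count=0
  for arg in "$@"; do
    case "$arg" in
      --uncommitted|--base|--commit) count=$((count + 1)) ;;
    esac
  done
  [ "$count" -le 1 ]
}

# Usage:
#   check_review_scope "$@" && codex review "$@"
```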
Important Notes
- Ensure execution in git repository directory
- Timeout automatically adjusted based on task difficulty:
  - Difficult tasks: 15 minutes (`timeout: 900000`)
  - Normal tasks: 10 minutes (`timeout: 600000`)
- codex command must be properly configured and logged in
- codex automatically processes in batches for large changes
- CHANGELOG.md must be in uncommitted changes, otherwise Codex cannot see intention description
Design Rationale
Why separate contexts?
- CHANGELOG update needs current context: Understanding user's previous conversation and task intention to generate accurate change description
- Codex review doesn't need conversation history: Only needs code changes and CHANGELOG, more efficient to run independently
- Reduce token consumption: codex review as independent subtask, doesn't carry irrelevant conversation context