A coding agent loop designed to diagnose and resolve production incidents through iterative investigation, targeted fixes, and continuous health monitoring until system stability is restored.
Resolve production incident
Provide a description of the production incident to this loop. The agent iteratively investigates, applies fixes, and validates system health until the exit condition (monitoring reports healthy) is met or the maximum of 10 iterations is reached.
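The control flow just described can be sketched in Python. This is a minimal sketch under stated assumptions, not the agent's actual implementation: `run_incident_loop`, `investigate_and_fix`, and `monitoring_healthy` are hypothetical names, and the health check is modeled as a callable that returns a boolean.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    resolved: bool   # True if the exit condition ("monitoring healthy") passed
    iterations: int  # number of passes actually run
    updates: list[str] = field(default_factory=list)  # short status line per pass


def run_incident_loop(
    investigate_and_fix: Callable[[int], str],  # hypothetical: one investigate/fix pass
    monitoring_healthy: Callable[[], bool],     # hypothetical: the between-iterations check
    max_iterations: int = 10,
) -> LoopResult:
    """Repeat investigate/fix passes until monitoring is healthy or the cap is hit."""
    updates: list[str] = []
    for i in range(1, max_iterations + 1):
        action = investigate_and_fix(i)
        healthy = monitoring_healthy()  # evaluated after every pass
        updates.append(f"pass {i}: {action}; healthy={healthy}")
        if healthy:
            return LoopResult(resolved=True, iterations=i, updates=updates)
    return LoopResult(resolved=False, iterations=max_iterations, updates=updates)


# Usage: simulate monitoring that turns healthy on the third check.
checks = iter([False, False, True])
result = run_incident_loop(lambda i: f"applied fix #{i}", lambda: next(checks))
print(result.resolved, result.iterations)  # True 3
```

The check runs after every pass rather than only at the end, so the loop stops as soon as the exit condition holds instead of spending the remaining iterations.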
Initiate Incident Response
Run the kickoff prompt with a detailed description of the production issue, including any error messages, affected components, and initial observations.
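One way to make sure the description covers all three kinds of detail is to assemble it before pasting it into the `[DESCRIBE INCIDENT]` slot of the kickoff prompt. The sketch below is illustrative only: `describe_incident` and `build_kickoff` are hypothetical helpers, and the incident details in the usage are invented examples, not real data.

```python
# Kickoff text copied from the loop's prompt; [DESCRIBE INCIDENT] is the slot to fill.
KICKOFF_TEMPLATE = (
    "I'm experiencing a production incident. Here's what I know so far: "
    "[DESCRIBE INCIDENT]. Please guide me through resolving this "
    "systematically while keeping our services stable."
)


def describe_incident(
    errors: list[str], components: list[str], observations: list[str]
) -> str:
    """Join the details the kickoff step asks for into one line of text."""
    parts = []
    if errors:
        parts.append("error messages: " + "; ".join(errors))
    if components:
        parts.append("affected components: " + ", ".join(components))
    if observations:
        parts.append("initial observations: " + "; ".join(observations))
    return " | ".join(parts) or "no details yet"


def build_kickoff(
    errors: list[str], components: list[str], observations: list[str]
) -> str:
    """Substitute the assembled description into the kickoff template."""
    description = describe_incident(errors, components, observations)
    return KICKOFF_TEMPLATE.replace("[DESCRIBE INCIDENT]", description)


# Hypothetical incident details, for illustration only.
print(build_kickoff(
    errors=["HTTP 502 on POST /checkout"],
    components=["checkout-api", "payments-db"],
    observations=["started after the 14:05 deploy"],
))
```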
Monitor Progress
After each iteration, the agent automatically runs the `health check` command. Review the results and approve or deny proposed changes so the resolution stays safe.
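The loop names its check command only as `health check`, so the sketch below treats it as an arbitrary command line and assumes that exit status 0 means healthy. `run_check` and `approve` are hypothetical helpers, and the usage runs a tiny Python command as a stand-in for a real health check.

```python
import subprocess
import sys
from typing import Callable


def run_check(cmd: list[str], timeout: float = 60.0) -> tuple[bool, str]:
    """Run a check command and treat exit status 0 as a pass (an assumption)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError) as exc:
        # A hung or missing check is reported as a failure, never as healthy.
        return False, f"check could not run: {exc}"
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


def approve(change: str, ask: Callable[[str], str] = input) -> bool:
    """Ask a human to approve or deny a proposed change before it is applied."""
    answer = ask(f"Apply proposed change: {change}? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


# Stand-in check: a tiny Python command instead of the real `health check`.
ok, output = run_check([sys.executable, "-c", "print('all services healthy')"])
print(ok, output)  # True all services healthy
```

Defaulting the approval prompt to "No" means an accidental Enter denies the change, which matches the goal of keeping services stable.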
Verify Resolution
Once monitoring reports a healthy status, confirm that the fix addresses the root cause and does not introduce new issues.
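A single healthy reading can come from a metric that is still flapping, so one way to confirm stability is to require several consecutive healthy readings. This is an assumed technique, not something the loop prescribes: `stays_healthy`, the sample count, and the 30-second interval are hypothetical choices.

```python
import time
from typing import Callable


def stays_healthy(
    probe: Callable[[], bool],
    samples: int = 5,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """True only if `samples` consecutive probes all report healthy."""
    for n in range(samples):
        if not probe():
            return False  # one bad reading: not resolved yet, keep iterating
        if n < samples - 1:
            sleep(interval_s)  # space readings out; no wait after the last one
    return True


# Usage with a fake probe and no real waiting: the third reading fails.
readings = iter([True, True, False, True, True])
print(stays_healthy(lambda: next(readings), sleep=lambda s: None))  # False
```

Injecting `sleep` keeps the function testable without real delays; in production the default `time.sleep` applies.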
Understand incident context and scope from user-provided description
Validate understanding with clarifying questions
Triage by identifying most likely root causes and failure points
Prioritize potential issues based on impact and evidence
Investigate logs, metrics, and recent changes to confirm diagnosis
Analyze monitoring data and correlate with symptoms
Implement targeted fix for identified root cause
Apply change and run health check command
Validate fix doesn't break other functionality
Run comprehensive tests and monitor for regressions
Document resolution and update incident records
Confirm knowledge capture and prepare handoff notes
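The triage and prioritization steps above can be sketched as ranking candidate root causes. Scoring by impact multiplied by evidence is an assumed heuristic rather than part of the loop; `Hypothesis`, `prioritize`, the 1-5 scales, and the example causes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    cause: str
    impact: int    # assumed scale: 1 = minor degradation .. 5 = full outage
    evidence: int  # assumed scale: 1 = hunch .. 5 = confirmed in logs or metrics


def prioritize(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Highest impact x evidence first; sorted() is stable, so ties keep input order."""
    return sorted(hypotheses, key=lambda h: h.impact * h.evidence, reverse=True)


# Hypothetical candidates, for illustration only.
candidates = [
    Hypothesis("stale DNS cache", impact=3, evidence=2),
    Hypothesis("connection pool exhausted", impact=5, evidence=4),
    Hypothesis("bad config in latest deploy", impact=5, evidence=5),
]
for h in prioritize(candidates):
    print(h.impact * h.evidence, h.cause)
# 25 bad config in latest deploy
# 20 connection pool exhausted
# 6 stale DNS cache
```

Multiplying the two scores pushes a high-impact but unsupported guess below a well-evidenced one, which fits the instruction to confirm the diagnosis against logs, metrics, and recent changes before fixing.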
Start the "Production Incident Resolver" loop.
Goal: Resolve production incident
Max iterations: 10
Between iterations run: health check
Exit when: Monitoring healthy

I'm experiencing a production incident. Here's what I know so far: [DESCRIBE INCIDENT]. Please guide me through resolving this systematically while keeping our services stable.

Self-pace this loop. After each iteration, run `health check` and evaluate the output, and only continue if the exit condition is not met (Monitoring healthy). Stop when the exit condition passes or 10 iterations are reached. Give a short status update each pass.