soak-test▌
Donchitos/Claude-Code-Game-Studios · updated Apr 16, 2026
### Soak Test
- ›description: "Generate a soak test protocol for extended play sessions. Defines what to observe, measure, and log during long play sessions to surface slow leaks, fatigue effects, and edge cases that
- ›argument-hint: "[duration: 30m | 1h | 2h | 4h] [focus: memory | stability | balance | all]"
- ›allowed-tools: Read, Glob, Grep, Write
Soak Test
A soak test (also called an endurance test) is an extended play session run with specific observation goals. Unlike a smoke check (broad critical path, ~10 min) or a single-feature playtest (~30 min), a soak test runs for 30 minutes to several hours to surface:
- Memory leaks — gradual heap growth that only appears after scene transitions
- Performance drift — frame time degradation that worsens over time
- State accumulation bugs — issues that only appear after N repetitions of a mechanic (inventory full, score overflow, AI state corruption)
- Fun fatigue — mechanics that feel good in a first session but grow repetitive over extended play
- Content exhaustion — the point where players run out of novel content
This skill generates the observation protocol and analysis harness — the human does the actual playing.
Output: production/qa/soak-test-[date]-[duration].md
When to run:
- Polish phase — before
/gate-check release - After fixing a memory or stability issue (regression soak)
- When extended play has not been formally tracked
1. Parse Arguments
Duration (default: 1h):
30m— short soak; suitable for testing a single mechanic or scene1h— standard soak; covers most common leak categories2h— extended soak; recommended for first full Polish soak4h— deep soak; required for games with long session design (RPGs, sims)
Focus (default: all):
memory— focus on heap size, object count, leak patternsstability— focus on crash/freeze/hang detectionbalance— focus on fun fatigue, content exhaustion, difficulty perceptionall— all of the above
2. Load Context
Read:
.claude/docs/technical-preferences.md— engine (for engine-specific memory monitoring guidance), performance budgets (memory ceiling, target FPS)design/gdd/game-concept.md— intended session length (for comparison against soak duration), core loop description- Most recent file in
production/playtests/— prior playtest findings (to avoid re-documenting known issues) - Most recent file in
production/qa/qa-plan-*.md— current sprint test coverage (to understand what has been formally tested vs. what the soak covers)
Note any performance budget targets from technical-preferences.md:
- Memory ceiling: [N MB, or "not set"]
- Target FPS: [N, or "not set"]
- Frame budget: [N ms, or "not set"]
3. Define Observation Checkpoints
Based on duration, generate timed checkpoints:
30m soak: T+0, T+10, T+20, T+30 1h soak: T+0, T+15, T+30, T+45, T+60 2h soak: T+0, T+20, T+40, T+60, T+80, T+100, T+120 4h soak: T+0, T+30, T+60, T+90, T+120, T+180, T+240
At each checkpoint, the observer records the observation items defined in Phase 4.
4. Generate the Soak Test Protocol
Memory / Stability observation items (if focus = memory or all)
Engine-specific monitoring guidance:
Godot 4:
- Open Debugger → Monitors tab; track
Memory → Static MemoryandObject Count → Objectsacross checkpoints - Record: Static Memory (KB), Object Count, Orphan Nodes count
- Alert threshold: Memory growth > 20% from T+0 after the first 15 minutes (some growth on load is expected; sustained growth indicates a leak)
- Note:
Performance.get_monitor(Performance.MEMORY_STATIC)returns bytes in Godot 4.6
Unity:
- Open Memory Profiler (Window → Analysis → Memory Profiler)
- Record: Total Reserved Memory (MB), GC Allocated (MB), Object Count at each checkpoint
- Alert threshold: GC Allocated growing monotonically across 3+ checkpoints
Unreal Engine:
- Use
stat memoryconsole command at each checkpoint - Record: Physical Memory Used (MB), Physical Memory Available
- Alert threshold: Physical Memory Used growth > 50MB over the full soak
Stability observation items (if focus = stability or all)
At each checkpoint, note:
- No crash, hang, or freeze occurred since last checkpoint
- Frame rate still within target budget ([target FPS] fps)
- Audio still playing correctly (no desync or silence)
- All HUD elements still rendering correctly
- Input responding as expected (no input loss or lag spike)
Balance / fatigue observation items (if focus = balance or all)
Collect subjective observations at each checkpoint:
- Core mechanic still feels rewarding (Y/N)
- Perceived difficulty level: [too easy / appropriate / too hard]
- Any "I've seen this before" moments since last checkpoint? (novel content exhaustion)
- Any moment of frustration since last checkpoint? Note cause.
- Any moment of peak engagement since last checkpoint? Note cause.
5. Generate the Protocol Document
# Soak Test Protocol
> **Date**: [date]
> **Duration**: [duration]
> **Focus**: [memory | stability | balance | all]
> **Engine**: [engine]
> **Generated by**: /soak-test
---
## Pre-Session Setup
Before starting the soak:
- [ ] Game is running from a **fresh launch** (not resumed from a prior session)
- [ ] All background applications closed (minimise OS memory interference)
- [ ] Performance monitoring tool open and recording:
- **Godot**: Debugger → Monitors tab → Memory section visible
- **Unity**: Memory Profiler window open
- **Unreal**: `stat memory` ready in console
- [ ] Soak target confirmed: [session design intent from game concept]
- [ ] Prior known issues to watch for: [from most recent playtest / qa-plan]
---
## Baseline (T+0) — Record Before Playing
| Metric | Baseline Value |
|--------|---------------|
| Memory / Heap | [record before first frame of gameplay] |
| Object Count | [record] |
| FPS (first 30 seconds) | [record] |
| [Engine-specific metric] | [record] |
---
## Checkpoint Log
### T+[N] minutes
**Memory / Stability** *(if applicable)*:
| Metric | Value | Δ from Baseline | Alert? |
|--------|-------|-----------------|--------|
| Memory / Heap | | | |
| Object Count | | | |
| FPS | | | |
| Crashes / Hangs | | | |
**Stability checks**:
- [ ] No crash or hang since last checkpoint
- [ ] Frame rate within budget ([N] fps target)
- [ ] Audio correct
- [ ] HUD rendering correctly
- [ ] Input responding correctly
**Balance / Fatigue** *(if applicable)*:
- Core mechanic still rewarding: Y / N
- Difficulty perception: too easy / appropriate / too hard
- Notable moments: [note any peak engagement or frustration]
- Content exhaustion signs: Y / N — [describe]
**Free observations**:
*(Note anything unexpected observed since the last checkpoint)*
---
[Repeat Checkpoint Log section for each timed checkpoint]
---
## Post-Session Analysis
### Memory Trend
| Checkpoint | Memory | Δ/hr extrapolated |
|------------|--------|-------------------|
| T+0 | | |
| [T+N] | | |
**Leak detected?** Y / N
**Estimated time to OOM at current rate**: [N hours / not applicable]
### Stability Summary
Total crashes: [N]
Total hangs: [N]
Worst FPS observed: [N] fps at [checkpoint]
Performance degradation: stable / mild / severe
### Balance / Fatigue Summary
Fun curve: [engaged throughout / fatigue onset at T+N / repetitive from start]
Content exhaustion point: [never / at T+N / early]
Difficulty arc: [appropriate / too easy throughout / difficulty spike at T+N]
### Issues Found
| ID | Severity | Checkpoint | Description |
|----|----------|------------|-------------|
| SOAK-001 | S[1-4] | T+[N] | [description] |
---
## Verdict: PASS / PASS WITH CONCERNS / FAIL
**PASS**: No leaks detected, stability maintained, fun factor consistent
**PASS WITH CONCERNS**: Minor drift or fatigue noted; addressable in Polish
**FAIL**: Memory leak confirmed, stability breach, or severe fun fatigue
---
## Sign-Off
- **Tester**: [name] — [date]
- **QA Lead review**: [name] — [date]
6. Write Output
Present the protocol summary in conversation, then ask:
"May I write this soak test protocol to
production/qa/soak-test-[date]-[duration].md?"
Write only after approval.
After writing:
"Protocol written. To run the soak:
- Open the file and follow the Pre-Session Setup checklist
- Record each checkpoint as you play
- Complete the Post-Session Analysis section when done
- File bugs from 'Issues Found' to
production/qa/bugs/ - Run
/bug-triage sprintafter the session to integrate any S1/S2 issues
If the verdict is FAIL, run /smoke-check again after fixing the issues."
Collaborative Protocol
- This skill generates a protocol — humans run it — never attempt to run a soak test automatically. The observations require a human observer.
- Duration should match the game's session design — a 5-minute game doesn't need a 4h soak; a city-builder might. Use judgment and ask if unclear.
- First soak should be
allfocus — narrow focus (memory-only) is for regression soaks after a specific fix, not the first pass - Ask before writing — always confirm before creating the protocol file
Ratings
4.5★★★★★33 reviews- ★★★★★Zaid Kapoor· Dec 8, 2024
soak-test fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- ★★★★★Charlotte Garcia· Nov 23, 2024
soak-test is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
- ★★★★★Zaid Farah· Nov 19, 2024
soak-test reduced setup friction for our internal harness; good balance of opinion and flexibility.
- ★★★★★Kiara Torres· Oct 14, 2024
Keeps context tight: soak-test is the kind of skill you can hand to a new teammate without a long onboarding doc.
- ★★★★★Kiara Rahman· Oct 6, 2024
Registry listing for soak-test matched our evaluation — installs cleanly and behaves as described in the markdown.
- ★★★★★Henry Perez· Sep 25, 2024
Useful defaults in soak-test — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
- ★★★★★Yash Thakker· Sep 13, 2024
We added soak-test from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
- ★★★★★Sophia Malhotra· Sep 5, 2024
I recommend soak-test for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
- ★★★★★Sofia Bhatia· Aug 24, 2024
Useful defaults in soak-test — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
- ★★★★★Aisha Srinivasan· Aug 16, 2024
I recommend soak-test for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
showing 1-10 of 33