UX Audit
Dogfood web apps by browsing them as a real user would, with their goals, their patience, and their context. Goes beyond "does it work?" to "is it good?" by tracking emotional friction (trust, anxiety, confusion), counting click efficiency, testing resilience, and asking the ultimate question: "would I come back?" Uses Chrome MCP (for authenticated apps with your session) or Playwright for browser automation. Produces structured audit reports with findings ranked by impact.
Browser Tool Detection
Before starting any mode, detect available browser tools:
- Chrome MCP (mcp__claude-in-chrome__*): preferred for authenticated apps. Uses the user's logged-in Chrome session, so OAuth/cookies just work.
- Playwright MCP (mcp__plugin_playwright_playwright__*): for public apps or parallel sessions.
- playwright-cli: for scripted flows and sub-agent browser tasks.
If none are available, inform the user and suggest installing Chrome MCP or Playwright.
See references/browser-tools.md for tool-specific commands.
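The detection step boils down to checking tool-name prefixes and picking one. Below is a minimal sketch, assuming the agent can see a flat list of tool names and already knows whether the app needs the user's login. The helper pickBrowserTool, its options, and the preference order for public apps are hypothetical. Only the two MCP prefixes come from the list above.

```javascript
// Hypothetical helper: choose a browser tool from the tool names the agent can see.
// The prefixes come from the list above; the preference order is an assumption.
const CHROME_MCP_PREFIX = "mcp__claude-in-chrome__";
const PLAYWRIGHT_MCP_PREFIX = "mcp__plugin_playwright_playwright__";

function pickBrowserTool(toolNames, { needsAuth = false, cliAvailable = false } = {}) {
  const hasChrome = toolNames.some((name) => name.startsWith(CHROME_MCP_PREFIX));
  const hasPlaywright = toolNames.some((name) => name.startsWith(PLAYWRIGHT_MCP_PREFIX));

  // Authenticated apps: the user's own Chrome session already carries cookies/OAuth state.
  if (needsAuth && hasChrome) return "chrome-mcp";
  // Otherwise (assumed order): Playwright MCP, then Chrome MCP, then the CLI.
  if (hasPlaywright) return "playwright-mcp";
  if (hasChrome) return "chrome-mcp";
  if (cliAvailable) return "playwright-cli";
  return null; // nothing available: inform the user and suggest installing a tool
}
```

A null result maps directly to the "inform the user" step. Every other result names the tool family whose commands to look up in references/browser-tools.md.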
URL Resolution
If the user didn't provide a URL, find one automatically. Prefer the deployed/live version, since that's what real users see.
- Check wrangler.jsonc for custom domains or routes:
  grep -E '"pattern"|"custom_domain"' wrangler.jsonc 2>/dev/null
  If found, use the production URL (e.g. https://app.example.com).
- Check for a deployed URL in CLAUDE.md, the README, or the homepage field of package.json.
- Fall back to a local dev server. Check whether one is already running, and on which port (plain lsof -t prints only PIDs, not the port):
  lsof -nP -iTCP:5173 -iTCP:3000 -iTCP:8787 -sTCP:LISTEN 2>/dev/null
  If one is running, use http://localhost:{port}.
- Ask the user as a last resort.
Why live over local: The live site has real data, real auth, real network latency, real CDN behaviour, and real CORS/CSP policies. Testing locally misses deployment-specific issues (missing env vars, broken asset paths, CORS errors, slow API responses). The UX audit should test what the user actually experiences.
When local is better: The user explicitly says "test localhost", or the feature isn't deployed yet.
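The resolution order above can be written as one fallback chain. This is a sketch under several assumptions:

- The agent has already read the wrangler.jsonc text, any URL found in the docs, and the ports reported by lsof.
- resolveAuditUrl, productionUrlFromWrangler and their inputs are hypothetical names.
- Checking ports in the order 5173, 3000, 8787 is an assumption.

```javascript
// Hypothetical helpers: pick the audit URL following the resolution order above.
// Inputs are plain values the agent has already gathered (file text, listening ports).
function productionUrlFromWrangler(jsoncText) {
  if (!jsoncText) return null;
  // Matches "pattern": "app.example.com" and "pattern": "app.example.com/*"
  const re = /"pattern"\s*:\s*"([^"]+)"/g;
  for (const match of jsoncText.matchAll(re)) {
    const host = match[1].split("/")[0];
    // Skip wildcard hosts such as "*.example.com": they don't name a single site.
    if (!host.includes("*")) return `https://${host}`;
  }
  return null;
}

function resolveAuditUrl({ userUrl, wranglerJsonc, documentedUrl, listeningPorts = [] } = {}) {
  if (userUrl) return { url: userUrl, source: "user" };
  const prod = productionUrlFromWrangler(wranglerJsonc);
  if (prod) return { url: prod, source: "wrangler" };
  if (documentedUrl) return { url: documentedUrl, source: "docs" };
  for (const port of [5173, 3000, 8787]) {
    if (listeningPorts.includes(port)) return { url: `http://localhost:${port}`, source: "local" };
  }
  return { url: null, source: "ask-user" }; // last resort: ask the user
}
```

Because the regex does not understand JSONC comments, a commented-out route would still match. Treat the wrangler result as a first guess to confirm in the browser, not as ground truth.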
Depth Levels
Control how thorough the audit is. Pass the depth as an argument (/ux-audit quick, /ux-audit thorough, /ux-audit exhaustive). With no argument, the audit defaults to standard.
| Depth | Duration | Autonomy | What it covers |
|---|---|---|---|
| quick | 5-10 min | Interactive | One user flow, happy path only. Spot check after a change. |
| standard | 20-40 min | Semi-autonomous | Full walkthrough + QA sweep of main pages. Default. |
| thorough | 1-3 hours | Fully autonomous | Multiple personas, all pages, all modes combined. Overnight mode. |
| exhaustive | 4-8+ hours | Fully autonomous | Every interactive element on every page. Every button clicked, every dialog opened, every form filled, every state triggered. Leave nothing untested. |
Exhaustive Mode
The exhaustive mode goes beyond thorough. Thorough tests workflows and pages. Exhaustive tests every single interactive element in the application.
For each page discovered:
- Inventory all interactive elements: buttons, links, inputs, selects, checkboxes, toggles, tabs, accordions, modal triggers, dropdowns, context menus, drag handles, sliders
- Click/interact with every one: open every dialog, expand every accordion, select every tab, toggle every switch, trigger every dropdown
- Screenshot each state: default, hover, active, open, closed, expanded, collapsed, selected, error
- Test every form: fill with valid data and submit; fill with invalid data and submit; leave it empty and submit. Test every field individually.
- Test every combination: if there are filters, test each filter value. If there are tabs, test each tab. If there are sort options, test each sort.
- Dark mode + light mode: every page, every dialog, every state in both modes
- Three viewport widths: 1280px, 768px, 375px for every page and dialog
- Keyboard navigation: tab through every page, verify focus order, test Enter/Space/Escape on every interactive element
- Right-click/context menus: if the app has custom context menus, test every option in every context
- Edge states: what happens with 0 items, 1 item, 100 items, 1000 items? What happens with very long text in every field?
- Concurrent tabs: open the same page in two tabs, interact in both, check for conflicts
- Every error path: trigger every validation error, every 404, every permission denied, every timeout
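The theme and viewport bullets multiply out: each page or dialog needs 2 themes × 3 widths = 6 screenshots. The planner below is a hypothetical sketch that lists those captures up front so progress can be ticked off. Treating dialogs as separate targets is an assumption.

```javascript
// Hypothetical capture planner for the theme and viewport bullets above:
// one screenshot per target per theme per width (2 x 3 = 6 per target).
const THEMES = ["light", "dark"];
const WIDTHS = [1280, 768, 375];

function captureMatrix(targets) {
  const plan = [];
  for (const target of targets) {
    for (const theme of THEMES) {
      for (const width of WIDTHS) {
        plan.push({ target, theme, width });
      }
    }
  }
  return plan;
}
```

For example, a page plus one of its dialogs, such as captureMatrix(["/clients", "/clients (Add Client dialog)"]), yields 12 planned screenshots.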
Progress tracking: This mode generates a LOT of findings. Write findings to the report incrementally rather than holding everything in memory. Update docs/ux-audit-exhaustive-YYYY-MM-DD.md after each page is complete.
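Incremental writing means appending each page's section as soon as that page is done, so an interrupted run still leaves a usable report. Here is a minimal Node.js sketch under some assumptions:

- The report lives under a project root.
- The date is taken in UTC.
- Each page gets a "## <route>" section followed by bullet findings.

reportPath and appendPageSection are hypothetical names.

```javascript
// Minimal Node.js sketch of incremental report writing.
// reportPath/appendPageSection and the section layout are assumptions.
const fs = require("fs");
const path = require("path");

function reportPath(rootDir, date = new Date()) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return path.join(rootDir, "docs", `ux-audit-exhaustive-${day}.md`);
}

function appendPageSection(file, route, findings) {
  fs.mkdirSync(path.dirname(file), { recursive: true }); // create docs/ if missing
  const lines = [`## ${route}`, "", ...findings.map((f) => `- ${f}`), "", ""];
  fs.appendFileSync(file, lines.join("\n")); // append, never rewrite earlier pages
}
```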
Element inventory format (per page):
/clients: 47 interactive elements
[x] "Add Client" button: opens modal ✓, form submits ✓, validation ✓
[x] Search input: filters correctly ✓, clear button works ✓, empty search ✓
[x] Sort dropdown: all 4 options work ✓, persists on navigation ✗ (BUG)
[x] Client row click: navigates to detail ✓
[x] Star button: toggles ✓, persists on refresh ✓
[ ] Pagination: next, prev, page numbers, items per page (not tested, no data)
...
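Keeping results as structured data and rendering the inventory text from it makes the per-page blocks consistent. In this illustrative formatter:

- Each check is { label, status }, where status is "pass", "fail", or "untested".
- An element counts as tested ([x]) if at least one check ran.
- These field names and rules are hypothetical.

```javascript
// Illustrative formatter for the per-page inventory format above.
// Field names (name, checks, note) and status values are hypothetical.
const MARKS = { pass: "✓", fail: "✗" };

function formatInventory(route, elements) {
  const lines = [`${route}: ${elements.length} interactive elements`];
  for (const el of elements) {
    const tested = el.checks.some((c) => c.status !== "untested");
    const checks = el.checks
      .map((c) => (c.status === "untested" ? c.label : `${c.label} ${MARKS[c.status]}`))
      .join(", ");
    const note = el.note ? ` (${el.note})` : "";
    lines.push(`[${tested ? "x" : " "}] ${el.name}: ${checks}${note}`);
  }
  return lines.join("\n");
}
```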
Thorough Mode: Overnight Workflow
The thorough mode is designed to run unattended. Kick it off at end of day, review the report in the morning. The user should NOT need to find issues themselves: this mode catches everything.
Mindset: Don't run through a checklist. Think about the real person who will use this app every day. What are the threads of their workday? How will they move through the system? Will they understand what they're looking at? Will the app teach them how to use it through its design, or will they be guessing? Read references/workflow-comprehension.md before starting.
- Discover all routes: read the router config, crawl the navigation, build a complete page inventory
- Identify workflow threads: what are the 3-5 real tasks a user does in a day? Map them before testing individual pages. See references/workflow-comprehension.md.
- Create a task list to track progress across the audit
- Visual & layout sweep (every page):
  - Screenshot at 1280px, 1024px, 768px, 375px widths
  - Screenshot in light mode and dark mode
  - Run JS overflow detection on each page (see below)
  - Check for clipped text, overlapping elements, broken grids
  - Compare sidebar + content alignment across all pages
- Workflow thread testing: follow each identified thread end to end, asking:
  - Does the next step suggest itself at every point?
  - Can the user leave and come back without losing their place?
  - Do transitions between pages preserve context (filters, selections)?
  - Do nav labels match how a user would describe their work?
  - After creating/saving/deleting, does the app take them somewhere logical?
- UX walkthrough with 3 personas:
  - First-time user (non-technical, time-poor, first visit)
  - Power user (daily user, knows the app, looking for efficiency)
  - Mobile user (phone, touch targets, small viewport)
- Full QA sweep: every page, all CRUD, all states (empty, error, loading, populated)
- Resilience testing: on every form, try bad data, navigating away mid-entry, the back button, refresh, and double-submit
- Accessibility basics: heading hierarchy, alt text, focus order, colour contrast
- Console error sweep: check the browser console on every page for JS errors, failed network requests, and deprecation warnings
- Wayfinding & comprehension check: on each page, do I know where I am? Can I get back? Does the heading tell me what I can do here? Do visual cues guide me to the right action?
- Scenario tests: run all six from references/scenario-tests.md:
  - New hire onboarding (can you figure out the app with zero guidance?)
  - Interrupted workflow (start a task, close the tab, come back: what survived?)
  - Wrong turn recovery (go to the wrong page: how many clicks to get back on track?)
  - Day two (repeat the same tasks: is it faster? are there shortcuts?)
  - Explain it to a colleague (write a 2-min guide for each workflow; gaps = UX failures)
  - What changed? (log in after creating data: can you tell what needs attention?)
- Screenshot everything: save to .jez/screenshots/ux-audit/ (numbered chronologically)
- Comprehensive report: docs/ux-audit-thorough-YYYY-MM-DD.md with issue counts by severity
- Summary: top 5 critical issues, workflow gaps, scenario test results, "one thing to fix first"
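The report's headline numbers (issue counts by severity, the top 5 critical issues, and the one thing to fix first) can all be derived from a flat findings list. This sketch rests on several assumptions:

- There are four severity levels.
- Each finding has a numeric impact score (higher is worse).
- The "one thing to fix first" is simply the highest-ranked finding.

summarizeFindings and its fields are hypothetical.

```javascript
// Sketch: derive the summary numbers from a flat list of findings.
// Severity names, the numeric "impact" field, and summarizeFindings() are assumptions.
const SEVERITIES = ["critical", "high", "medium", "low"];

function summarizeFindings(findings) {
  const counts = Object.fromEntries(SEVERITIES.map((s) => [s, 0]));
  for (const f of findings) {
    if (!SEVERITIES.includes(f.severity)) throw new Error(`Unknown severity: ${f.severity}`);
    counts[f.severity] += 1;
  }
  // Rank by severity first, then by impact (higher impact first).
  const ranked = [...findings].sort(
    (a, b) => SEVERITIES.indexOf(a.severity) - SEVERITIES.indexOf(b.severity) || b.impact - a.impact
  );
  return {
    counts,
    topCritical: ranked.filter((f) => f.severity === "critical").slice(0, 5),
    fixFirst: ranked[0] ?? null, // null when there are no findings at all
  };
}
```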
Automated Layout Detection (JS Injection)
On each page, inject JavaScript via the browser tool to programmatically detect layout issues:
// 1. Elements that extend horizontally beyond their parent (1px tolerance for rounding)
document.querySelectorAll('*').forEach(el => {
  const r = el.getBoundingClientRect();
  const p = el.parentElement?.getBoundingClientRect();
  if (p && (r.left < p.left - 1 || r.right > p.right + 1)) {
    // getAttribute('class') also works for SVG elements, where className is not a string
    console.warn('OVERFLOW:', el.tagName, el.getAttribute('class'), 'extends beyond parent');
  }
});
// 2. Text elements whose content is larger than their box (clipped or scrolling)
document.querySelectorAll('h1,h2,h3,h4,p,span,a,button,label').forEach(el => {
  if (el.scrollWidth > el.clientWidth + 2 || el.scrollHeight > el.clientHeight + 2) {
    console.warn('CLIPPED:', el.tagName, el.textContent?.slice(0, 50));
  }
});
// 3. Non-empty elements positioned entirely off the left edge of the viewport
document.querySelectorAll('*').forEach(el => {
  const r = el.getBoundingClientRect();
  if (r.width > 0 && r.height > 0 && r.right < 0) {
    console.warn('OFF-SCREEN LEFT:', el.tagName, el.getAttribute('class'));
  }
});
// 4. Text that is invisible: same colour as its own background, or fully transparent
document.querySelectorAll('h1,h2,h3,p,span,a,li,td,th,label,button').forEach(el => {
  const s = getComputedStyle(el);
  if (s.color === s.backgroundColor || s.opacity === '0') {
    console.warn('INVISIBLE TEXT:', el.tagName, el.textContent?.slice(0,30));