Audits test suite for coverage gaps, test quality, flaky tests, and testing strategy alignment.

Kit Context

Before starting this skill, ensure you have completed session boot:

Read CODEBASE_MAP.md for project understanding
Read CLAUDE.project.md if it exists for project-specific rules
Read tasks/lessons.md for accumulated corrections

If any of these haven't been read in this session, read them now before proceeding.

When to Use

Invoke with /testing-audit when:

Test suite is unreliable or has frequent flaky tests
Coverage numbers look good but bugs still slip through
Planning a testing strategy improvement
Reviewing test quality after rapid feature development
Assessing confidence level before a major release

Scope Rules

Analyze ONLY the files and directories relevant to this skill's purpose
Do not refactor, fix, or modify code — this is a read-only analysis unless explicitly stated otherwise
Log unrelated issues found during analysis under tasks/todo.md > ## Not Now
State every assumption explicitly before acting on it
If the user specified a scope (files, directories, modules), respect it strictly

Context Gathering

Before analysis, map the project:

Read project config (package.json, pyproject.toml, go.mod, Cargo.toml, etc.)
Identify the tech stack, frameworks, and key dependencies
Map source directories — skip node_modules, vendor, build, .next, dist, __pycache__
Check for existing configurations relevant to this analysis (linters, formatters, CI configs)
If the user specified a scope, narrow to those files/directories only

Process

Phase 1: Test Inventory

Map the current test landscape:

Find test files — scan for test directories, *.test.*, *.spec.*, *_test.*, test_*.*
Identify test framework — Jest, Vitest, pytest, Go testing, JUnit, etc.
Categorize tests — unit, integration, e2e, snapshot, contract
Check test config — coverage thresholds, timeout settings, parallel execution

Phase 2: Coverage Analysis

Assess what is and isn't tested:

Structural Coverage

Which modules/directories have no tests at all?
Which critical paths (auth, payment, data mutation) lack tests?
Are edge cases covered? (empty input, boundary values, error paths)
Are error/failure paths tested, not just happy paths?

Meaningful Coverage

Do tests assert the right things? (behavior, not implementation)
Are there tests that always pass? (no real assertions, tautological checks)
Are there tests that test the framework instead of the application?
Do integration tests actually test integration? (or are they unit tests with extra setup)

Phase 3: Test Quality

Evaluate test code quality:

Readability

Are test names descriptive? (describe what, when, and expected outcome)
Is the Arrange-Act-Assert / Given-When-Then pattern followed?
Are test utilities and helpers well-organized?

Reliability

Flaky tests: tests dependent on timing, external services, or execution order
Test interdependence: tests that fail when run individually but pass in suite (or vice versa)
Non-deterministic: tests using random data, current time, or system state
Shared mutable state: global variables modified across tests without reset

Maintainability

Over-mocking: mocking so much that the test doesn't test real behavior
Under-mocking: integration tests that hit real external services without sandboxing
Brittle assertions: asserting exact strings, snapshots of entire objects, or implementation details
Test duplication: same scenario tested multiple times in different files
Setup overhead: tests requiring 50+ lines of setup for simple assertions

Phase 4: Testing Strategy

Assess the overall testing strategy:

Test pyramid balance: ratio of unit : integration : e2e tests
- Too many e2e tests = slow, brittle feedback loop
- Too few integration tests = false confidence from unit tests
Missing test types: no contract tests for APIs, no smoke tests for deployment
CI integration: are tests run on every PR? Is the feedback loop fast enough?
Test data management: how is test data created and cleaned up?

Output Format

# Testing Audit Report

## Test Inventory
| Category | Count | Framework | Status |
|----------|-------|-----------|--------|
| Unit     | N     | Jest      | ...    |
| Integration | N  | ...       | ...    |
| E2E      | N     | ...       | ...    |

## Coverage Gaps
### Untested Critical Paths
1. [module/feature] — [risk if untested]

### Weak Test Areas
1. [file:line] — [what's missing]

## Quality Issues
### Must Fix
- [issue + location + recommendation]

### Should Fix
- [issue + location + recommendation]

## Testing Strategy Assessment
- Pyramid balance: [top-heavy / balanced / bottom-heavy]
- Confidence level: [low / medium / high]
- Key risk: [single biggest testing risk]

## Recommendations
1. [Highest impact improvement]
2. ...

Report Guidelines

Use tables for structured findings — they're scannable and diffable
Include file paths with line numbers (file.ts:42) for every finding
Separate findings by severity: Critical > Major > Minor
End with actionable recommendations, not just observations
If no issues found in a category, state it explicitly — don't omit the section

Common Rationalizations

Rationalization	Reality
"100% coverage is overkill"	Nobody said 100%. But 0% on critical paths is negligent. Focus on risk, not percentages.
"Mocking is good enough"	Mocks test your assumptions, not reality. Integration tests catch what mocks hide.
"The code is too simple to test"	Simple code becomes complex code. Tests written now prevent regressions later.
"E2E tests cover this"	E2E tests are slow and flaky. Unit tests give fast, precise feedback. You need both.
"We'll add tests when we stabilize"	Code without tests never stabilizes. Tests are how you stabilize.

Notes

If no tests exist, the output should focus on a recommended testing strategy rather than auditing
Don't recommend 100% coverage — focus on critical path coverage
Consider the project's risk tolerance and deployment frequency when making recommendations

Edit this page on GitHub

Testing Audit

On this page