Review Pipeline
Run multiple audits in parallel over a PR-scope diff, dedupe across auditors, and produce a confidence-gated report. Use to review a change or PR before merge.
Core Rule
Parallelize independent auditors over a PR-scope diff. Dedupe across reports. Never widen scope beyond the diff.
Kit Context
Before starting this skill, ensure you have completed session boot:
- Read `CODEBASE_MAP.md` for project understanding
- Read `CLAUDE.project.md` if it exists for project-specific rules
- Read `tasks/lessons/_index.md` for accumulated corrections (Top Rules + index)
If any of these haven't been read in this session, read them now before proceeding.
When to Use
Invoke with /review-pipeline when:
- A PR is ready and you want a multi-lens check before requesting human review
- A multi-session feature has landed on a branch and you want to catch regressions
- You're investigating a flaky area and want several auditors to weigh in
- You suspect issues but don't know which audit applies — let the pipeline pick
Not for:
- Trivial single-file edits — invoke the relevant audit directly
- Project-wide periodic health checks — use `/project-health-report` instead
- Full architectural assessments — use `/architecture-review` or `/deepening-review`
Scope Rules
- Read-only — produces a report, never modifies code
- Operates within the detected scope (recent changes by default)
- Logs unrelated issues found during analysis under `tasks/todo.md > ## Not Now`
- Does not duplicate findings already in `CLAUDE.md` or `tasks/lessons/`
Process
Phase 1: Resolve Scope
Pick the smallest scope that captures the change:
- If the user passed paths or globs — use those literally
- Else if the working tree has uncommitted changes — use `git diff --name-only HEAD`
- Else if on a feature branch — use `git diff --name-only $(git merge-base HEAD origin/main)...HEAD` (fall back to `main` if `origin/main` is absent)
- Else — ask the user which path or commit range to review; do not silently default to the whole repo
Filter the file list:
- Drop generated paths (`node_modules/`, `dist/`, `build/`, `.next/`, `__pycache__/`, `vendor/`)
- Drop lockfiles unless `dependency-audit` is selected
- If the resulting list is empty, stop and report "no in-scope changes detected"
Save the resolved scope as a bulleted list — every parallel auditor gets the same list.
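The filtering step could be sketched roughly as below. This is an illustration only, not part of the skill: the function name `filter_scope` is hypothetical, the `LOCKFILES` set is an assumption (the skill does not enumerate lockfile names), and matching generated directories at any depth is also an assumption.

```python
from pathlib import PurePosixPath

# Generated directories listed in Phase 1.
GENERATED_DIRS = {"node_modules", "dist", "build", ".next", "__pycache__", "vendor"}

# Assumption: common lockfile names; the skill itself does not list them.
LOCKFILES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml",
             "poetry.lock", "Cargo.lock", "go.sum"}


def filter_scope(paths, dependency_audit_selected=False):
    """Return the in-scope files, preserving input order."""
    kept = []
    for raw in paths:
        path = PurePosixPath(raw)  # also collapses a leading "./"
        # Assumption: a generated directory at any depth excludes the file.
        if GENERATED_DIRS.intersection(path.parts[:-1]):
            continue
        if path.name in LOCKFILES and not dependency_audit_selected:
            continue
        kept.append(str(path))
    return kept
```

An empty return value corresponds to the "no in-scope changes detected" stop condition.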
Phase 2: Select Audits
Always select these core audits:
- `code-quality-audit` — smells, error handling, maintainability
- `testing-audit` — coverage gaps, test quality
Conditionally add based on file patterns in the resolved scope:
| Pattern in scope | Add audit |
|---|---|
| `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.html` | `accessibility-audit` |
| Service / handler / query / loop heavy code (≥1 file) | `performance-audit` |
| `package.json`, `pyproject.toml`, `requirements*.txt`, `Cargo.toml`, `go.mod` | `dependency-audit` |
| Module-level or full-folder change (≥5 files in one dir) | `dead-code-audit` |
| `README*`, `docs/`, public API surface changes | `documentation-audit` |
| Auth, session, token, crypto, request validation paths | `security-reviewer` (agent, not skill) |
| Multi-module structural change | `architecture-review` |
Cap at 5 parallel audits by default to keep token spend bounded. If more are eligible, pick the 5 with the strongest fit to the scope and list the skipped ones in the report's "Skipped audits" section with a one-line rationale.
If the user passed an explicit list (e.g., /review-pipeline testing,security), honor it literally and skip selection logic.
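A rough sketch of the selection and cap logic follows. It models only the table rows that reduce to filename globs; the judgement-based rows (performance, security, architecture, dead code) are left out. The name `select_audits` is hypothetical, and approximating "strongest fit" by the number of matching files is an assumption, since the skill does not define a fit score.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

CORE_AUDITS = ["code-quality-audit", "testing-audit"]

# Only the glob-expressible rows of the table are modelled here.
PATTERN_AUDITS = {
    "accessibility-audit": ["*.tsx", "*.jsx", "*.vue", "*.svelte", "*.html"],
    "dependency-audit": ["package.json", "pyproject.toml", "requirements*.txt",
                         "Cargo.toml", "go.mod"],
    "documentation-audit": ["README*"],
}


def select_audits(scope, cap=5):
    """Return (selected, skipped) audit names for a list of file paths."""
    fit = {}
    for audit, globs in PATTERN_AUDITS.items():
        hits = sum(
            1 for path in scope
            if any(fnmatch(PurePosixPath(path).name, g) for g in globs)
        )
        if hits:
            fit[audit] = hits
    # Assumption: "strongest fit" is approximated by the matching-file count.
    ranked = sorted(fit, key=lambda a: (-fit[a], a))
    room = max(cap - len(CORE_AUDITS), 0)
    return CORE_AUDITS + ranked[:room], ranked[room:]
```

The second element of the result feeds the report's "Skipped audits" section; each entry still needs its one-line rationale.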
Phase 3: Run Audits in Parallel
For each selected audit, dispatch one Task in a single message containing all tool calls so they run concurrently. Pattern per Task:
Task(
description: "<audit-name> on <short scope label>",
subagent_type: "general-purpose",
prompt: """
You are running the <audit-name> audit on a scoped set of files.
Step 1 — Read the skill file at `.claude/skills/<audit-name>/SKILL.md` and follow
its Process section. Do not invoke other audits; do not analyze files outside
the scope below.
Step 1b — Comparative pass: sample how the codebase already handles this concern
(neighbouring files, shared helpers, established conventions). Flag *deviations*
from the existing pattern — not the mere absence of a pattern you would prefer.
A change that matches the surrounding code is not a finding.
Step 2 — Scope (files to analyze):
<one bulleted file path per line>
Step 3 — Return findings as a JSON array with this exact shape, nothing else:
[
{
"file": "src/x.ts",
"line": 42,
"category": "smell|error-handling|test-gap|...",
"severity": "critical|major|minor",
"confidence": "high|medium|low",
"message": "one-line summary",
"suggested_fix": "one-line fix or null"
},
...
]
Do not include prose around the JSON. If you find nothing, return [].
"""
)
Security auditor exception. security-reviewer is an agent, not a skill — there is no .claude/skills/security-reviewer/SKILL.md. Dispatch it like the others, but in Step 1 point it at .claude/agents/security-reviewer.md, and have it additionally apply the false-positive filter in .claude/skills/_shared/blocks/security-fp-precedents.md before emitting findings. It still returns the same JSON array (use category values like injection/auth/exposure/config); the agent's own markdown output format does not apply inside the pipeline.
Optional adversarial lens. Pass adversarial (or adversarial:true) in the arguments to add the devils-advocate agent (.claude/agents/devils-advocate.md) as an extra lens that tries to falsify the change rather than confirm it (unstated assumptions, breaking inputs, quiet reinterpretations). Off by default — every other lens checks for goodness; this one pushes back. It returns a ranked objection list; fold its would-ship-broken/risky items into the report as correctness/major findings. Distinct from verify (which knocks down existing findings); adversarial generates new ones.
Wait for all Tasks to complete before continuing. If any single Task errors out, record the failure under "Skipped audits" and continue with the rest — never abort the whole pipeline because one auditor failed.
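One way to handle each auditor's reply is sketched below: parse the JSON array, check it against the shape requested in Step 3, and tag each finding with its auditor. The function name `parse_auditor_output` is hypothetical, and treating malformed output the same as a failed Task (recorded under "Skipped audits") is an assumption rather than something the skill states.

```python
import json

REQUIRED_KEYS = {"file", "line", "category", "severity",
                 "confidence", "message", "suggested_fix"}
SEVERITIES = {"critical", "major", "minor"}
CONFIDENCES = {"high", "medium", "low"}


def parse_auditor_output(audit_name, raw_text):
    """Return (findings, error); error is None when parsing succeeded."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [], f"{audit_name}: invalid JSON ({exc.msg})"
    if not isinstance(data, list):
        return [], f"{audit_name}: expected a JSON array"
    findings = []
    for item in data:
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            return [], f"{audit_name}: finding is missing required keys"
        if item["severity"] not in SEVERITIES or item["confidence"] not in CONFIDENCES:
            return [], f"{audit_name}: unknown severity or confidence"
        # Tag each finding with its auditor, ready for the merge phase.
        findings.append({**item, "audit": audit_name})
    return findings, None
```

A non-`None` error string can be used as the reason in the "Skipped audits" table while the remaining auditors proceed.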
Phase 4: Dedupe + Confidence Gating
Merge findings using this algorithm:
1. Parse each auditor's JSON into a flat array, tagging each finding with `audit: <auditor-name>`.
2. Group findings by the tuple `(normalized_file, line_bucket, category)` where:
   - `normalized_file` = repo-relative path with leading `./` stripped
   - `line_bucket` = `floor(line / 5) * 5` — groups findings that fall in the same 5-line window
   - `category` = the auditor's category string, lowercased
3. Within each group:
   - Merge messages: keep the longest, most specific one as primary; append distinct shorter ones as bullets under "Also noted"
   - Compute `provenance` = sorted unique list of `audit` tags in the group
   - Bump confidence: if `|provenance| ≥ 2`, set group confidence to `high` regardless of individual confidences
   - Max severity wins: group severity = highest of (critical > major > minor)
4. Confidence gate (drop low-signal noise):
   - Keep group if `|provenance| ≥ 2` — multiple auditors agreed
   - Else keep group if `confidence = high` and `severity ∈ {critical, major}`
   - Else keep group if `confidence = high` and the message names a concrete file:line (not a category-only observation)
   - Else drop — single low/medium-confidence finding is noise
5. Sort by (severity desc, |provenance| desc, file asc).
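The merge algorithm above can be sketched as a single function. This is a minimal illustration under stated assumptions, not a reference implementation: `dedupe_and_gate` is a hypothetical name, message length stands in for "most specific", a single-auditor group takes its highest reported confidence, and a positive line number is treated as "names a concrete file:line".

```python
from collections import defaultdict

SEVERITY_RANK = {"critical": 3, "major": 2, "minor": 1}
CONFIDENCE_RANK = {"high": 3, "medium": 2, "low": 1}


def dedupe_and_gate(findings, bucket=5):
    """Group, merge, gate, and sort findings that already carry an 'audit' tag."""
    groups = defaultdict(list)
    for f in findings:
        path = f["file"][2:] if f["file"].startswith("./") else f["file"]
        key = (path, f["line"] // bucket * bucket, f["category"].lower())
        groups[key].append(f)

    report = []
    for (path, _line_bucket, category), members in groups.items():
        provenance = sorted({m["audit"] for m in members})
        severity = max((m["severity"] for m in members), key=SEVERITY_RANK.get)
        if len(provenance) >= 2:
            confidence = "high"  # agreement promotes, regardless of inputs
        else:
            # Assumption: a single-auditor group takes its highest confidence.
            confidence = max((m["confidence"] for m in members),
                             key=CONFIDENCE_RANK.get)
        # Assumption: the longest message stands in for "most specific".
        messages = sorted({m["message"] for m in members},
                          key=lambda s: (-len(s), s))
        line = min(m["line"] for m in members)
        # Assumption: a positive line number counts as a concrete file:line.
        concrete = line > 0
        keep = len(provenance) >= 2 or (
            confidence == "high"
            and (severity in ("critical", "major") or concrete)
        )
        if keep:
            report.append({
                "file": path, "line": line, "category": category,
                "severity": severity, "confidence": confidence,
                "provenance": provenance,
                "message": messages[0], "also_noted": messages[1:],
            })

    report.sort(key=lambda g: (-SEVERITY_RANK[g["severity"]],
                               -len(g["provenance"]), g["file"]))
    return report
```

Note that fixed buckets split at multiples of the bucket size: with the default of 5, lines 42 and 44 share bucket 40, while lines 44 and 45 land in buckets 40 and 45 and are not merged.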
Phase 4.5: Adversarial Verification (opt-in)
By default the pipeline trusts each auditor's self-reported confidence (Phase 4 gates structurally). Pass verify (or verify:true) in the arguments to add an independent second pass that tries to knock down each surviving finding before it reaches the report. This is the highest-precision mode and the most token-expensive — it spends one extra subagent per verified finding, so it is off by default.
When verify is set, after Phase 4:
1. Select what to verify. Take surviving findings of severity `critical` or `major` (skip `minor` — low stakes, not worth the spend). Cap at the top 15 by severity; if more survive, verify those 15 and carry the rest through with their Phase 4 status, marked `unverified` in the report.
2. Verify each one independently. Dispatch one Task per finding in a single message (parallel), capped at 8 concurrent — drain and refill until done. Each Task gets fresh context with no knowledge of the other findings, so it cannot be anchored by them:
   Task(
   description: "verify <category> finding at <file>:<line>",
   subagent_type: "general-purpose",
   prompt: """
   Independently assess ONE candidate finding. Reproduce the judgement from the code
   itself — do not trust the claim.
   Finding: <file>:<line> · <category> · <severity>
   Claim: <message>
   Standard to judge against:
   - For security categories (injection|auth|exposure|config), read and apply
   `.claude/skills/_shared/blocks/security-fp-precedents.md`.
   - Otherwise, read `.claude/skills/<originating-audit>/SKILL.md` and judge against
   its criteria.
   Read the cited file and enough surrounding code to trace the real path. Is this a
   concrete, exploitable/actionable issue — not theoretical, not a style preference,
   not excluded by the standard above?
   Return this JSON and nothing else:
   { "confidence": 1-10, "verdict": "keep" | "drop", "reason": "one line" }
   """
   )
3. Gate on the verifier. Drop any finding the verifier scored below 8. Attach the surviving verifier confidence + reason to each kept finding.
4. Verification is read-only and never modifies code. If a verification Task errors, keep the finding with its Phase 4 confidence and mark it `unverified` — never drop a finding because its verifier crashed.
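The selection and gating rules of this phase could look roughly like the sketch below. The names `select_for_verification` and `apply_verdict` and the `verification`, `verifier_confidence`, and `verifier_reason` keys are hypothetical. The skill gates on the numeric score only, so this sketch does not consult the `verdict` field; how the two interact is not specified in the text.

```python
def select_for_verification(findings, cap=15):
    """Split critical/major findings into (to_verify, overflow).

    Minor findings are not returned; the caller passes them through untouched.
    """
    eligible = [f for f in findings if f["severity"] in ("critical", "major")]
    # Stable sort: critical first, original order preserved within a severity.
    eligible.sort(key=lambda f: 0 if f["severity"] == "critical" else 1)
    return eligible[:cap], eligible[cap:]


def apply_verdict(finding, result, threshold=8):
    """Return the annotated finding, or None if the verifier knocked it down.

    `result` is the verifier's parsed JSON, or None when its Task errored.
    """
    if result is None:
        # A crashed verifier never drops a finding.
        return {**finding, "verification": "unverified"}
    if result["confidence"] < threshold:
        return None
    return {
        **finding,
        "verification": "verified",
        "verifier_confidence": result["confidence"],
        "verifier_reason": result["reason"],
    }
```

Findings in `overflow` keep their Phase 4 status and are marked `unverified` in the report, the same as findings whose verifier errored.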
Phase 5: Assemble the Report
Produce one markdown report and offer to save it. Do not fix issues; only report.
Optional save target: tasks/reviews/<YYYY-MM-DD>-<scope-slug>.md. Ask the user before writing if any review file already exists for the same scope today; otherwise just create it and tell them the path.
Output Format
# Review Pipeline Report
**Scope:** <N files | git range | path glob>
**Audits run:** <comma-separated names> · <X> in parallel
**Audits skipped:** <comma-separated, or "none">
**Findings:** <C critical, M major, N minor> after dedupe (<R> raw)
**Verification:** <V verified · D dropped below threshold · U unverified> — include only when `verify` ran
## Cross-Audit Findings (≥2 auditors agreed)
> Highest signal — these are flagged by multiple lenses, treat as high confidence.
| # | Severity | File:Line | Category | Auditors | Issue |
|---|----------|-----------|----------|----------|-------|
| 1 | critical | src/auth.ts:42 | error-handling | quality, testing | ... |
## Single-Audit Findings — Critical
| # | File:Line | Audit | Issue | Suggested Fix |
|---|-----------|-------|-------|---------------|
## Single-Audit Findings — Major
| # | File:Line | Audit | Issue | Suggested Fix |
|---|-----------|-------|-------|---------------|
## Single-Audit Findings — Minor
> Folded by default. Expand only if reviewing thoroughly.
<details>
<summary>N minor findings</summary>
| # | File:Line | Audit | Issue |
|---|-----------|-------|-------|
</details>
## Skipped Audits
| Audit | Reason |
|-------|--------|
| dependency-audit | no dep file in scope |
| performance-audit | parallel cap reached; lower fit than selected 5 |
## Suggested Next Action
1. Address all Cross-Audit Findings first — they have the strongest signal
2. Triage Critical single-audit findings
3. Consider whether any Major findings warrant a follow-up task in `tasks/todo.md`
4. If patterns recur across PRs, promote them to `tasks/lessons/` or a hook
Report Guidelines
- File paths with line numbers (`file.ts:42`) for every finding — never bare filenames
- "Also noted" bullets under a primary finding when auditors phrased the same issue differently
- The Cross-Audit table is the headline — readers scan it first
- Skipped audits get one row each — never silently omit
- Minor findings collapsed into a `<details>` block to keep the report scannable
- When `verify` ran, append the verifier score to each verified finding (e.g., `✓ 9/10`) and list what was dropped below threshold under a short "Dropped in verification" note — surfacing what was filtered is part of the signal
Run Mode
This skill supports interactive (default) and headless modes — see the canonical contract in .claude/skills/_shared/blocks/mode-detection.md.
Headless detection: presence of mode:headless in arguments. Other tokens after the flag are treated as an explicit audit list (e.g., mode:headless audits:testing,security).
verify / verify:true is a reserved flag token (enables Phase 4.5) — not an audit name. Strip it before parsing the audit list, the same way mode: tokens are stripped.
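Token handling along these lines might be sketched as follows. The function name `parse_args` and the returned dictionary shape are hypothetical. Treating `adversarial` as a reserved flag alongside `verify` is an assumption based on the Phase 3 description, and path or glob arguments are not modelled here.

```python
# Assumption: `adversarial` is stripped the same way `verify` is.
RESERVED_FLAGS = {"verify", "adversarial"}


def parse_args(argument_string):
    """Split skill arguments into mode, reserved flags, and an audit list."""
    opts = {"headless": False, "verify": False, "adversarial": False, "audits": []}
    for token in argument_string.split():
        name, _, value = token.partition(":")
        if name == "mode":
            opts["headless"] = value == "headless"
        elif name in RESERVED_FLAGS:
            # A bare flag or `flag:true` enables it; anything else disables it.
            opts[name] = value in ("", "true")
        elif name == "audits":
            opts["audits"] += [a for a in value.split(",") if a]
        else:
            # Bare comma-separated list, e.g. "testing,security".
            opts["audits"] += [a for a in token.split(",") if a]
    return opts
```

An empty `audits` list means no explicit selection was passed, so the Phase 2 heuristics apply.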
| Decision point | Interactive default | Headless default |
|---|---|---|
| Empty scope (no changes detected and no path given) | Ask the user which path or range | Fail with "no in-scope changes detected; pass a path explicitly". Never silently scan the whole repo. |
| Adversarial verification (Phase 4.5) | Off unless verify passed | Off unless verify passed — honor it if present |
| Scope >100 files | Ask the user to narrow | Auto-narrow to the top-50 files by churn (git log --name-only count in the diff range). Note the truncation in the report. |
| Audit selection | Apply Phase 2 heuristics and inform user of selection | Apply the same heuristics; honor explicit audits:<list> arg if present |
| Save report path collision (a review for same scope/date already exists) | Ask before overwriting | Append -2, -3, etc. — never overwrite |
| Save vs print only | Offer to save | Always save to tasks/reviews/<YYYY-MM-DD>-<scope-slug>.md and print the path |
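Two of the headless defaults in the table lend themselves to a short sketch: narrowing an oversized scope by churn and choosing a collision-free report name. Both function names are hypothetical. `narrow_by_churn` assumes its second argument is the text output of a `git log --name-only` run over the diff range, and `next_report_name` assumes the caller supplies the set of file names already present in `tasks/reviews/`.

```python
from collections import Counter


def narrow_by_churn(scope, git_log_name_only, limit=50):
    """Keep the `limit` in-scope files that appear most often in the log output."""
    in_scope = set(scope)
    counts = Counter(
        line.strip() for line in git_log_name_only.splitlines()
        if line.strip() in in_scope
    )
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scope, key=lambda p: (-counts[p], p))
    return ranked[:limit]


def next_report_name(date, scope_slug, existing_names):
    """Return `<date>-<slug>.md`, or the first free `-2`, `-3`, ... variant."""
    base = f"{date}-{scope_slug}"
    if f"{base}.md" not in existing_names:
        return f"{base}.md"
    n = 2
    while f"{base}-{n}.md" in existing_names:
        n += 1
    return f"{base}-{n}.md"
```

When narrowing applies, the files cut from the scope should still be noted in the report as a truncation.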
Headless review-pipeline is suitable for: scheduled PR sweeps (/loop /review-pipeline mode:headless), CI-side pre-merge gates, and skill-to-skill orchestration where the parent skill chose the scope.
Notes
- This skill orchestrates; it does not analyze. Quality of the report depends on the underlying audit skills.
- Default cap is 5 parallel audits to bound token spend. Override with explicit selection if you need more.
- Confidence bump is asymmetric: agreement promotes to `high`, but disagreement does not demote. Two auditors disagreeing on severity → take the higher severity.
- The line bucket of 5 is a heuristic for "same issue, different line counted". If you need stricter matching (exact line), say so when invoking; if you need looser (same function), increase the bucket.
- Don't run on the whole repo by default. If scope resolution returns >100 files, ask the user to narrow.
- Save the report only when useful. Quick checks during development don't need a saved file — interactive feedback is the point.
- Adversarial verification (`verify`) is opt-in and costs one extra subagent per verified finding. It's bounded (critical/major only, top 15, 8 concurrent) but still the most expensive mode — reach for it on high-stakes diffs (auth, releases) or when someone disputes the findings, not on every run.
- Comparative pass (Step 1b) is always on and free — it sharpens every auditor by anchoring findings to deviations from the codebase's own patterns rather than abstract ideals.