Claude Code Kit

Review Pipeline

Run multiple audits in parallel over a PR-scope diff, dedupe across auditors, and produce a confidence-gated report. Use to review a change or PR before merge.

Core Rule

Parallelize independent auditors over a PR-scope diff. Dedupe across reports. Never widen scope beyond the diff.

Kit Context

Before starting this skill, ensure you have completed session boot:

  1. Read CODEBASE_MAP.md for project understanding
  2. Read CLAUDE.project.md if it exists for project-specific rules
  3. Read tasks/lessons/_index.md for accumulated corrections (Top Rules + index)

If any of these haven't been read in this session, read them now before proceeding.

When to Use

Invoke with /review-pipeline when:

  • A PR is ready and you want a multi-lens check before requesting human review
  • A multi-session feature has landed on a branch and you want to catch regressions
  • You're investigating a flaky area and want several auditors to weigh in
  • You suspect issues but don't know which audit applies — let the pipeline pick

Not for:

  • Trivial single-file edits — invoke the relevant audit directly
  • Project-wide periodic health checks — use /project-health-report instead
  • Full architectural assessments — use /architecture-review or /deepening-review

Scope Rules

  • Read-only — produces a report, never modifies code
  • Operates within the detected scope (recent changes by default)
  • Logs unrelated issues found during analysis under tasks/todo.md > ## Not Now
  • Does not duplicate findings already in CLAUDE.md or tasks/lessons/

Process

Phase 1: Resolve Scope

Pick the smallest scope that captures the change:

  1. If the user passed paths or globs — use those literally
  2. Else if the working tree has uncommitted changes — use git diff --name-only HEAD
  3. Else if on a feature branch — use git diff --name-only $(git merge-base HEAD origin/main)...HEAD (fall back to main if origin/main is absent)
  4. Else — ask the user which path or commit range to review; do not silently default to the whole repo

Filter the file list:

  • Drop generated paths (node_modules/, dist/, build/, .next/, __pycache__/, vendor/)
  • Drop lockfiles unless dependency-audit is selected
  • If the resulting list is empty, stop and report "no in-scope changes detected"

Save the resolved scope as a bulleted list — every parallel auditor gets the same list.
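
The filter step above could be sketched roughly as follows. This is a minimal illustration, not the kit's implementation: `filter_scope`, `GENERATED_DIRS`, and `LOCKFILES` are hypothetical names, and the lockfile list is an assumption, since the text does not name specific lockfiles.

```python
from pathlib import PurePosixPath

# Directory names the text lists as generated output.
GENERATED_DIRS = {"node_modules", "dist", "build", ".next", "__pycache__", "vendor"}

# Assumption: the text only says "lockfiles"; these are common examples.
LOCKFILES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml",
             "poetry.lock", "Cargo.lock", "go.sum"}


def filter_scope(paths, dependency_audit_selected=False):
    """Return the in-scope subset of `paths`, preserving input order."""
    kept = []
    for raw in paths:
        path = PurePosixPath(raw)
        # Drop anything that lives under a generated directory.
        if GENERATED_DIRS.intersection(path.parts[:-1]):
            continue
        # Drop lockfiles unless dependency-audit will consume them.
        if path.name in LOCKFILES and not dependency_audit_selected:
            continue
        kept.append(str(path))
    return kept
```

An empty return value corresponds to the "no in-scope changes detected" stop condition.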

Phase 2: Select Audits

Always select these core audits:

  • code-quality-audit — smells, error handling, maintainability
  • testing-audit — coverage gaps, test quality

Conditionally add based on file patterns in the resolved scope:

| Pattern in scope | Add audit |
|------------------|-----------|
| *.tsx, *.jsx, *.vue, *.svelte, *.html | accessibility-audit |
| Service / handler / query / loop heavy code (≥1 file) | performance-audit |
| package.json, pyproject.toml, requirements*.txt, Cargo.toml, go.mod | dependency-audit |
| Module-level or full-folder change (≥5 files in one dir) | dead-code-audit |
| README*, docs/, public API surface changes | documentation-audit |
| Auth, session, token, crypto, request validation paths | security-reviewer (agent, not skill) |
| Multi-module structural change | architecture-review |

Cap at 5 parallel audits by default to keep token spend bounded. If more are eligible, pick the 5 with the strongest fit to the scope and list the skipped ones in the report's "Skipped audits" section with a one-line rationale.

If the user passed an explicit list (e.g., /review-pipeline testing,security), honor it literally and skip selection logic.
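
As a rough illustration of the selection step, the sketch below covers only the rows that can be decided from file names and counts. The judgement-based rows (performance, security, architecture) are left out on purpose. `select_audits` and `FILENAME_RULES` are hypothetical names, and using list order as a stand-in for "strongest fit" is an assumption, because the text leaves that ranking to judgement.

```python
from collections import Counter
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

CORE_AUDITS = ["code-quality-audit", "testing-audit"]

# Only the rows that can be decided from file names alone.
FILENAME_RULES = {
    "accessibility-audit": ["*.tsx", "*.jsx", "*.vue", "*.svelte", "*.html"],
    "dependency-audit": ["package.json", "pyproject.toml", "requirements*.txt",
                         "Cargo.toml", "go.mod"],
    "documentation-audit": ["README*"],
}


def select_audits(scope, cap=5):
    """Return (selected, skipped) audit names for repo-relative paths."""
    paths = [PurePosixPath(p) for p in scope]
    eligible = list(CORE_AUDITS)
    for audit, patterns in FILENAME_RULES.items():
        if any(fnmatchcase(p.name, pat) for p in paths for pat in patterns):
            eligible.append(audit)
    # docs/ is a directory rule rather than a file-name rule.
    if "documentation-audit" not in eligible and any(
        "docs" in p.parts[:-1] for p in paths
    ):
        eligible.append("documentation-audit")
    # ">=5 files in one dir" row.
    per_dir = Counter(str(p.parent) for p in paths)
    if any(count >= 5 for count in per_dir.values()):
        eligible.append("dead-code-audit")
    # Assumption: list order stands in for "strongest fit".
    return eligible[:cap], eligible[cap:]
```

The second return value feeds the report's "Skipped audits" section; each skipped name still needs its one-line rationale.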

Phase 3: Run Audits in Parallel

For each selected audit, dispatch one Task in a single message containing all tool calls so they run concurrently. Pattern per Task:

Task(
  description: "<audit-name> on <short scope label>",
  subagent_type: "general-purpose",
  prompt: """
    You are running the <audit-name> audit on a scoped set of files.

    Step 1 — Read the skill file at `.claude/skills/<audit-name>/SKILL.md` and follow
    its Process section. Do not invoke other audits; do not analyze files outside
    the scope below.

    Step 1b — Comparative pass: sample how the codebase already handles this concern
    (neighbouring files, shared helpers, established conventions). Flag *deviations*
    from the existing pattern — not the mere absence of a pattern you would prefer.
    A change that matches the surrounding code is not a finding.

    Step 2 — Scope (files to analyze):
    <one bulleted file path per line>

    Step 3 — Return findings as a JSON array with this exact shape, nothing else:
    [
      {
        "file": "src/x.ts",
        "line": 42,
        "category": "smell|error-handling|test-gap|...",
        "severity": "critical|major|minor",
        "confidence": "high|medium|low",
        "message": "one-line summary",
        "suggested_fix": "one-line fix or null"
      },
      ...
    ]

    Do not include prose around the JSON. If you find nothing, return [].
  """
)

Security auditor exception. security-reviewer is an agent, not a skill — there is no .claude/skills/security-reviewer/SKILL.md. Dispatch it like the others, but in Step 1 point it at .claude/agents/security-reviewer.md, and have it additionally apply the false-positive filter in .claude/skills/_shared/blocks/security-fp-precedents.md before emitting findings. It still returns the same JSON array (use category values like injection/auth/exposure/config); the agent's own markdown output format does not apply inside the pipeline.

Optional adversarial lens. Pass adversarial (or adversarial:true) in the arguments to add the devils-advocate agent (.claude/agents/devils-advocate.md) as an extra lens that tries to falsify the change rather than confirm it (unstated assumptions, breaking inputs, quiet reinterpretations). Off by default — every other lens checks for goodness; this one pushes back. It returns a ranked objection list; fold its would-ship-broken/risky items into the report as correctness/major findings. Distinct from verify (which knocks down existing findings); adversarial generates new ones.

Wait for all Tasks to complete before continuing. If any single Task errors out, record the failure under "Skipped audits" and continue with the rest — never abort the whole pipeline because one auditor failed.

Phase 4: Dedupe + Confidence Gating

Merge findings using this algorithm:

  1. Parse each auditor's JSON into a flat array, tagging each finding with audit: <auditor-name>.

  2. Group findings by the tuple (normalized_file, line_bucket, category) where:

    • normalized_file = repo-relative path with leading ./ stripped
    • line_bucket = floor(line / 5) * 5, which collapses findings that fall in the same 5-line window (lines 40 to 44 all map to 40; lines 44 and 45 land in different buckets)
    • category = the auditor's category string, lowercased
  3. Within each group:

    • Merge messages: keep the longest, most specific one as primary; append distinct shorter ones as bullets under "Also noted"
    • Compute provenance = sorted unique list of audit tags in the group
    • Bump confidence: if |provenance| ≥ 2, set group confidence to high regardless of individual confidences
    • Max severity wins: group severity = highest of (critical > major > minor)
  4. Confidence gate (drop low-signal noise):

    • Keep group if |provenance| ≥ 2 — multiple auditors agreed
    • Else keep group if confidence = high and severity ∈ {critical, major}
    • Else keep group if confidence = high and the message names a concrete file:line (not a category-only observation)
    • Else drop — single low/medium-confidence finding is noise
  5. Sort by (severity desc, |provenance| desc, file asc).
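
The merge algorithm above can be sketched as a single pass over the tagged findings. This is one possible reading, not a reference implementation. `dedupe_and_gate` is a hypothetical name. Two points are assumptions: a group without cross-auditor agreement takes the confidence of its primary (longest) message, and "names a concrete file:line" is approximated with a regular expression on the message text.

```python
import re
from collections import defaultdict

SEVERITY_RANK = {"critical": 3, "major": 2, "minor": 1}
# Assumption: "names a concrete file:line" means text such as foo.ts:12.
FILE_LINE = re.compile(r"\S+\.\w+:\d+")


def dedupe_and_gate(findings, bucket=5):
    """Group by (file, line bucket, category), merge, gate, and sort."""
    groups = defaultdict(list)
    for f in findings:
        path = f["file"][2:] if f["file"].startswith("./") else f["file"]
        key = (path, (f["line"] // bucket) * bucket, f["category"].lower())
        groups[key].append(f)

    merged = []
    for (path, _bucket, category), members in groups.items():
        provenance = sorted({m["audit"] for m in members})
        primary = max(members, key=lambda m: len(m["message"]))
        severity = max((m["severity"] for m in members), key=SEVERITY_RANK.get)
        agreed = len(provenance) >= 2
        # Assumption: without agreement, use the primary finding's confidence.
        confidence = "high" if agreed else primary["confidence"]
        concrete = FILE_LINE.search(primary["message"]) is not None
        keep = agreed or (
            confidence == "high" and (severity in ("critical", "major") or concrete)
        )
        if not keep:
            continue  # single low/medium-confidence finding
        merged.append({
            "file": path,
            "line": primary["line"],
            "category": category,
            "severity": severity,
            "confidence": confidence,
            "provenance": provenance,
            "message": primary["message"],
            "also_noted": sorted({m["message"] for m in members} - {primary["message"]}),
        })
    merged.sort(key=lambda g: (-SEVERITY_RANK[g["severity"]], -len(g["provenance"]), g["file"]))
    return merged
```

With this bucketing, findings at lines 42 and 44 of the same file and category merge (both map to 40), while a finding at line 46 starts a new group.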

Phase 4.5: Adversarial Verification (opt-in)

By default the pipeline trusts each auditor's self-reported confidence (Phase 4 gates structurally). Pass verify (or verify:true) in the arguments to add an independent second pass that tries to knock down each surviving finding before it reaches the report. This is the highest-precision mode and the most token-expensive — it spends one extra subagent per verified finding, so it is off by default.

When verify is set, after Phase 4:

  1. Select what to verify. Take surviving findings of severity critical or major (skip minor — low stakes, not worth the spend). Cap at the top 15 by severity; if more survive, verify those 15 and carry the rest through with their Phase 4 status, marked unverified in the report.

  2. Verify each one independently. Dispatch one Task per finding in a single message (parallel), capped at 8 concurrent — drain and refill until done. Each Task gets fresh context with no knowledge of the other findings, so it cannot be anchored by them:

    Task(
      description: "verify <category> finding at <file>:<line>",
      subagent_type: "general-purpose",
      prompt: """
        Independently assess ONE candidate finding. Reproduce the judgement from the
        code itself — do not trust the claim.
    
        Finding: <file>:<line> · <category> · <severity>
        Claim: <message>
    
        Standard to judge against:
        - For security categories (injection|auth|exposure|config), read and apply
          `.claude/skills/_shared/blocks/security-fp-precedents.md`.
        - Otherwise, read `.claude/skills/<originating-audit>/SKILL.md` and judge
          against its criteria.
    
        Read the cited file and enough surrounding code to trace the real path. Is
        this a concrete, exploitable/actionable issue — not theoretical, not a style
        preference, not excluded by the standard above?
    
        Return this JSON and nothing else:
        { "confidence": 1-10, "verdict": "keep" | "drop", "reason": "one line" }
      """
    )
  3. Gate on the verifier. Drop any finding the verifier scored below 8. Attach the surviving verifier confidence + reason to each kept finding.

  4. Verification is read-only and never modifies code. If a verification Task errors, keep the finding with its Phase 4 confidence and mark it unverified — never drop a finding because its verifier crashed.
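
A minimal sketch of the selection and gating steps above, under stated assumptions. `pick_for_verification` and `apply_verdict` are hypothetical names, and a verifier Task that errored is represented here as `None`. The gate follows step 3 and looks only at the numeric score, not at the verifier's `verdict` field.

```python
VERIFY_RANK = {"critical": 2, "major": 1}  # minor findings are never verified


def pick_for_verification(findings, cap=15):
    """Split Phase 4 survivors into (to_verify, carried_through)."""
    candidates = [f for f in findings if f["severity"] in VERIFY_RANK]
    # Stable sort: ties keep their Phase 4 order.
    candidates.sort(key=lambda f: -VERIFY_RANK[f["severity"]])
    to_verify = candidates[:cap]
    chosen = {id(f) for f in to_verify}
    # Everything else keeps its Phase 4 status; the caller marks
    # over-cap critical/major findings as unverified in the report.
    carried = [f for f in findings if id(f) not in chosen]
    return to_verify, carried


def apply_verdict(finding, verdict, threshold=8):
    """Return the finding to keep, or None if the verifier knocked it down."""
    if verdict is None:
        # Verifier crashed: keep the finding, flagged as unverified.
        return {**finding, "verified": False}
    if verdict["confidence"] < threshold:
        return None
    return {
        **finding,
        "verified": True,
        "verifier_confidence": verdict["confidence"],
        "verifier_reason": verdict["reason"],
    }
```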

Phase 5: Assemble the Report

Produce one markdown report and offer to save it. Do not fix issues; only report.

Optional save target: tasks/reviews/<YYYY-MM-DD>-<scope-slug>.md. Ask the user before writing if any review file already exists for the same scope today; otherwise just create it and tell them the path.

Output Format

# Review Pipeline Report

**Scope:** <N files | git range | path glob>
**Audits run:** <comma-separated names> · <X> in parallel
**Audits skipped:** <comma-separated, or "none">
**Findings:** <C critical, M major, N minor> after dedupe (<R> raw)
**Verification:** <V verified · D dropped below threshold · U unverified> — include only when `verify` ran


## Cross-Audit Findings (≥2 auditors agreed)

> Highest signal — these are flagged by multiple lenses, treat as high confidence.

| # | Severity | File:Line | Category | Auditors | Issue |
|---|----------|-----------|----------|----------|-------|
| 1 | critical | src/auth.ts:42 | error-handling | quality, testing | ... |

## Single-Audit Findings — Critical

| # | File:Line | Audit | Issue | Suggested Fix |
|---|-----------|-------|-------|---------------|

## Single-Audit Findings — Major

| # | File:Line | Audit | Issue | Suggested Fix |
|---|-----------|-------|-------|---------------|

## Single-Audit Findings — Minor

> Folded by default. Expand only if reviewing thoroughly.

<details>
<summary>N minor findings</summary>

| # | File:Line | Audit | Issue |
|---|-----------|-------|-------|

</details>

## Skipped Audits

| Audit | Reason |
|-------|--------|
| dependency-audit | no dep file in scope |
| performance-audit | parallel cap reached; lower fit than selected 5 |

## Suggested Next Action

1. Address all Cross-Audit Findings first — they have the strongest signal
2. Triage Critical single-audit findings
3. Consider whether any Major findings warrant a follow-up task in `tasks/todo.md`
4. If patterns recur across PRs, promote them to `tasks/lessons/` or a hook

Report Guidelines

  • File paths with line numbers (file.ts:42) for every finding — never bare filenames
  • "Also noted" bullets under a primary finding when auditors phrased the same issue differently
  • The Cross-Audit table is the headline — readers scan it first
  • Skipped audits get one row each — never silently omit
  • Minor findings collapsed into a <details> block to keep the report scannable
  • When verify ran, append the verifier score to each verified finding (e.g., ✓ 9/10) and list what was dropped below threshold under a short "Dropped in verification" note — surfacing what was filtered is part of the signal

Run Mode

This skill supports interactive (default) and headless modes — see the canonical contract in .claude/skills/_shared/blocks/mode-detection.md.

Headless detection: presence of mode:headless in arguments. Other tokens after the flag are treated as an explicit audit list (e.g., mode:headless audits:testing,security).

verify / verify:true is a reserved flag token (enables Phase 4.5) — not an audit name. Strip it before parsing the audit list, the same way mode: tokens are stripped.
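
The token handling described above might look like the sketch below. `parse_args` is a hypothetical helper, not part of the kit. Leaving unrecognised tokens (paths, globs, a bare audit list) in a `rest` list for the caller to interpret is an assumption; the text does not say how those are told apart.

```python
def parse_args(raw):
    """Split an argument string into pipeline flags and leftover tokens."""
    parsed = {
        "headless": False,
        "verify": False,
        "adversarial": False,
        "audits": None,  # None means "apply the Phase 2 heuristics"
        "rest": [],
    }
    for token in raw.split():
        if token == "mode:headless":
            parsed["headless"] = True
        elif token in ("verify", "verify:true"):
            # Reserved flag: strip it so it is never read as an audit name.
            parsed["verify"] = True
        elif token in ("adversarial", "adversarial:true"):
            parsed["adversarial"] = True
        elif token.startswith("audits:"):
            names = token[len("audits:"):].split(",")
            parsed["audits"] = [name for name in names if name]
        else:
            # Assumption: the caller decides what other tokens mean.
            parsed["rest"].append(token)
    return parsed
```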

| Decision point | Interactive default | Headless default |
|----------------|---------------------|------------------|
| Empty scope (no changes detected and no path given) | Ask the user which path or range | Fail with "no in-scope changes detected; pass a path explicitly". Never silently scan the whole repo. |
| Adversarial verification (Phase 4.5) | Off unless verify passed | Off unless verify passed; honor it if present |
| Scope >100 files | Ask the user to narrow | Auto-narrow to the top-50 files by churn (git log --name-only count in the diff range). Note the truncation in the report. |
| Audit selection | Apply Phase 2 heuristics and inform user of selection | Apply the same heuristics; honor explicit audits:<list> arg if present |
| Save report path collision (a review for same scope/date already exists) | Ask before overwriting | Append -2, -3, etc.; never overwrite |
| Save vs print only | Offer to save | Always save to tasks/reviews/<YYYY-MM-DD>-<scope-slug>.md and print the path |
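
The headless collision rule (append -2, -3, and so on; never overwrite) could be sketched as below. `headless_report_path` is a hypothetical helper, and the `today` parameter exists only to make the example deterministic.

```python
from datetime import date
from pathlib import Path


def headless_report_path(scope_slug, reviews_dir="tasks/reviews", today=None):
    """Return a report path under `reviews_dir` that does not exist yet."""
    stamp = (today or date.today()).isoformat()  # YYYY-MM-DD
    base = Path(reviews_dir) / f"{stamp}-{scope_slug}.md"
    candidate, suffix = base, 2
    # Walk -2, -3, ... until a free name is found; never reuse an existing file.
    while candidate.exists():
        candidate = base.with_name(f"{base.stem}-{suffix}.md")
        suffix += 1
    return candidate
```

The function only picks a name. Creating the directory and writing the report remain with the caller.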

Headless review-pipeline is suitable for: scheduled PR sweeps (/loop /review-pipeline mode:headless), CI-side pre-merge gates, and skill-to-skill orchestration where the parent skill chose the scope.

Notes

  • This skill orchestrates; it does not analyze. Quality of the report depends on the underlying audit skills.
  • Default cap is 5 parallel audits to bound token spend. Override with explicit selection if you need more.
  • Confidence bump is asymmetric: agreement promotes to high, but disagreement does not demote. Two auditors disagreeing on severity → take the higher severity.
  • The line bucket of 5 is a heuristic for "same issue, reported at slightly different lines". If you need stricter matching (exact line), say so when invoking; if you need looser (same function), increase the bucket.
  • Don't run on the whole repo by default. If scope resolution returns >100 files, ask the user to narrow.
  • Save the report only when useful. Quick checks during development don't need a saved file — interactive feedback is the point.
  • Adversarial verification (verify) is opt-in and costs one extra subagent per verified finding. It's bounded (critical/major only, top 15, 8 concurrent) but still the most expensive mode — reach for it on high-stakes diffs (auth, releases) or when someone disputes the findings, not on every run.
  • Comparative pass (Step 1b) is always on and free — it sharpens every auditor by anchoring findings to deviations from the codebase's own patterns rather than abstract ideals.