Claude Code Kit

Prompting

The Sycophancy Problem

Agents are designed to follow instructions and please the user. This means:

  • If you say "find a bug", it will find a bug — even if it has to invent one
  • If you say "this code is broken", it will agree and "fix" working code
  • If you say "is this good?", it will say yes more often than it should

This isn't a flaw — it's a design characteristic. Understanding it lets you work with it instead of against it.


Neutral Prompting

Frame tasks so you don't bias the agent toward a predetermined outcome.

| Biased (avoid) | Neutral (prefer) |
|---|---|
| "Find the bug in the database module" | "Walk through the database module logic and report your findings" |
| "This function is too slow, optimize it" | "Profile this function's performance and report what you find" |
| "Is there a security issue here?" | "Review this code and report anything noteworthy" |
| "Fix the authentication problem" | "Trace the authentication flow end-to-end and document how it works" |

Why Neutral Works Better

A neutral prompt lets the agent report what it actually finds, which might be:

  • No bugs (good news!)
  • A different bug than you expected
  • A performance characteristic, not a problem
  • A design question, not a defect

A biased prompt pressures the agent to produce the expected result, even when the code does not support it.
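One way to catch biased phrasing before a prompt is sent is a small lint pass. The sketch below is only an illustration: the function name `leading_phrases` and the pattern list are hypothetical, built from the biased examples in the table above, and a real list would need tuning for your own prompts.

```python
import re

# Hypothetical patterns for phrasing that presupposes an outcome.
# Illustrative only; derived from the biased examples above, not exhaustive.
LEADING_PATTERNS = [
    (re.compile(r"\bfind (the|a) (bug|issue|problem)\b", re.I),
     "presupposes that a defect exists"),
    (re.compile(r"\b(is|are) (broken|too slow|wrong|insecure)\b", re.I),
     "states a verdict as fact"),
    (re.compile(r"\bfix the\b", re.I),
     "presupposes that something needs fixing"),
    (re.compile(r"\bis there (a|an|any) \w+ (issue|problem|bug)\b", re.I),
     "names the expected finding"),
]


def leading_phrases(prompt: str) -> list[str]:
    """Return one reason per pattern that matches the prompt."""
    return [reason for pattern, reason in LEADING_PATTERNS if pattern.search(prompt)]


# A biased prompt is flagged; its neutral counterpart is not.
print(leading_phrases("Find the bug in the database module"))
print(leading_phrases("Walk through the database module logic and report your findings"))
```

An empty list does not prove a prompt is neutral. It only means none of the known leading patterns matched.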


Adversarial Verification Pattern

For high-stakes reviews (security, data integrity, production bugs), use multiple agents with opposing incentives:

The Three-Agent Pattern

┌────────────────┐   ┌────────────────┐   ┌────────────────┐
│    Finder      │   │   Adversary    │   │    Referee     │
│                │   │                │   │                │
│  Goal: Find    │──>│ Goal: Disprove │──>│  Goal: Judge   │
│  all issues    │   │ false positives│   │  accurately    │
└────────────────┘   └────────────────┘   └────────────────┘

Agent 1 — Finder: Incentivized to find issues aggressively.

  • "Review this code for bugs. Score: +1 low impact, +5 medium, +10 critical."
  • Will over-report (including false positives). This is intentional — it casts a wide net.

Agent 2 — Adversary: Incentivized to disprove false positives.

  • "For each reported bug, try to prove it's NOT a bug. Score: you gain the bug's points for each correct disproval and lose 2x the bug's points if you wrongly dismiss a real bug."
  • Will challenge every finding, but carefully, because a wrong dismissal costs twice what a correct one earns.

Agent 3 — Referee: Incentivized to be accurate.

  • "Judge each finding. +1 for correct judgment, -1 for incorrect."
  • Has no incentive to lean either way.
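The flow between the three agents can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not a reference implementation: the `Finding` type, the three callable interfaces, and the score helpers are hypothetical, and each callable stands in for a separate agent session. The adversary score reads "-2x points" as twice the disputed finding's severity points, and it can only be computed where ground truth is known (for example, on deliberately seeded bugs).

```python
from dataclasses import dataclass
from typing import Callable

# Points per severity, mirroring the Finder prompt above.
SEVERITY_POINTS = {"low": 1, "medium": 5, "critical": 10}


@dataclass
class Finding:
    description: str
    severity: str           # "low", "medium", or "critical"
    disputed: bool = False  # Adversary claims this is not a bug
    upheld: bool = False    # Referee rules this is a real bug


# Hypothetical agent interfaces; each would wrap its own agent session.
FinderFn = Callable[[str], list[Finding]]
AdversaryFn = Callable[[str, Finding], bool]  # True means "not a bug"
RefereeFn = Callable[[str, Finding], bool]    # True means "real bug"


def review(code: str, finder: FinderFn, adversary: AdversaryFn,
           referee: RefereeFn) -> list[Finding]:
    """Run Finder, then Adversary, then Referee; return every finding with both verdicts."""
    findings = finder(code)
    for finding in findings:
        finding.disputed = adversary(code, finding)
        # The Referee sees the finding together with the Adversary's verdict.
        finding.upheld = referee(code, finding)
    return findings


def finder_score(findings: list[Finding]) -> int:
    """Sum of severity points over everything the Finder reported."""
    return sum(SEVERITY_POINTS[f.severity] for f in findings)


def adversary_score(findings: list[Finding], is_real: list[bool]) -> int:
    """Score disputes against known ground truth (assumed available)."""
    score = 0
    for finding, real in zip(findings, is_real):
        if not finding.disputed:
            continue
        points = SEVERITY_POINTS[finding.severity]
        # Wrongly dismissing a real bug costs twice what a correct disproval earns.
        score += -2 * points if real else points
    return score
```

In this sketch the scores are mostly a prompting device: they appear in each agent's instructions to shape its behavior, and the final output is simply the findings where `upheld` is true.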

When to Use This Pattern

  • Security audits
  • Pre-production code reviews
  • Data migration validation
  • Critical bug hunts

When NOT to Use This Pattern

  • Simple feature implementation
  • Routine code reviews
  • Tasks where a single agent's judgment is sufficient

Working WITH Sycophancy

Sycophancy isn't always a problem — you can leverage it deliberately:

Enthusiastic Exploration

Tell the agent it will be rewarded for thoroughness:

  • "Search exhaustively. Every edge case you find is valuable."
  • The agent will be hyper-thorough, which is exactly what you want for exploration.

Strict Compliance

Tell the agent there are consequences for deviation:

  • "Follow this specification exactly. Any deviation from the spec is a failure."
  • The agent will be rigidly compliant, which is exactly what you want for implementation.

Quality Gates

Set explicit standards it will try to meet:

  • "This code must pass a staff engineer's review. No shortcuts, no stubs, no TODOs."
  • The agent will aim higher because you set the bar explicitly.

Assumption Detection

Agents fill in gaps silently when they don't have enough context. Many bad outputs trace back to these unflagged guesses.

Signs the Agent Is Making Assumptions

  • It introduces dependencies or patterns you didn't mention
  • It implements features you didn't ask for
  • It makes architectural choices without flagging alternatives
  • It confidently states something that feels like a guess

How to Prevent It

  1. Be specific: More detail in the prompt = fewer gaps to fill
  2. Require flagging: "If you need to make any assumptions, list them before implementing"
  3. Use contracts: Explicit completion criteria leave no room for interpretation
  4. Check after compaction: Context loss forces the agent to assume, so have it re-read the rules after each compaction
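The "require flagging" and "use contracts" steps above can be built into every task prompt mechanically. A minimal sketch, assuming prompts are assembled as plain strings; `build_task_prompt` and the "Done when:" layout are hypothetical names chosen for illustration, not part of any tool.

```python
def build_task_prompt(task: str, completion_criteria: list[str]) -> str:
    """Wrap a task with explicit completion criteria and an assumption-flagging rule."""
    if not completion_criteria:
        raise ValueError("provide at least one completion criterion")
    lines = [task.strip(), "", "Done when:"]
    # Number the criteria so the agent can report against each one.
    lines += [f"  {i}. {criterion}" for i, criterion in enumerate(completion_criteria, 1)]
    lines += ["", "If you need to make any assumptions, list them before implementing."]
    return "\n".join(lines)


print(build_task_prompt(
    "Add pagination to the /orders endpoint.",
    ["Existing tests pass", "Page size defaults to 50", "No new dependencies"],
))
```

Rejecting an empty criteria list is a deliberate choice here: a prompt with no contract leaves the agent to decide for itself what "done" means.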

Red Flags in Agent Output

| Symptom | Likely Cause |
|---|---|
| Agent "finds" exactly what you asked about | Sycophancy: rephrase neutrally |
| Agent agrees with your incorrect assessment | Sycophancy: state facts, not opinions |
| Agent implements something you didn't request | Gap-filling: be more specific |
| Agent's output contradicts earlier output | Context bloat: start fresh session |
| Agent confidently describes non-existent code | Hallucination: ask it to show the file:line |
| Agent says "I've verified" but you see no test runs | Compliance theater: require actual commands |
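Once the agent has cited `file:line` locations, the hallucination check can be partly automated. The sketch below is an illustration under assumptions: `check_citations` is a hypothetical helper, citations are assumed to look like `path/to/file.ext:123`, and the check only confirms that the cited line exists, not that it says what the agent claims.

```python
import re
from pathlib import Path

# Assumed citation shape: a path ending in an extension, a colon, a line number.
CITATION = re.compile(r"([\w./-]+\.\w+):(\d+)")


def check_citations(agent_output: str, repo_root: str) -> list[tuple[str, int, bool]]:
    """For each file:line citation, report whether that line exists under repo_root."""
    results = []
    for path, line in CITATION.findall(agent_output):
        line_no = int(line)
        target = Path(repo_root) / path
        exists = False
        if target.is_file():
            with target.open(encoding="utf-8", errors="replace") as handle:
                line_count = sum(1 for _ in handle)
            exists = 1 <= line_no <= line_count
        results.append((path, line_no, exists))
    return results
```

Any entry whose third field is false is worth challenging. The pattern is deliberately loose, so strings such as `host.com:8080` are also picked up and reported as missing; treat the result as a prompt for a follow-up question, not as a verdict.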