Workflow & Planning
Task Lifecycle
Receive Task → Understand → Research → Plan → Confirm → Implement → Verify → DoneEvery task follows this flow. Never skip steps.
Canonical happy-path: /feature-cycle <spec> runs this lifecycle end-to-end as one command — it chains shape-spec → planner → implement → quality gate → /review-pipeline → /ship, carrying context between phases through .hook-state/agent-handoff.md and halting on any gate failure. Use it when you have a shaped spec and want the orchestration in the command rather than in your head; drive the phases by hand when you need finer control.
Goal-Driven Task Reframing
Imperative tasks ("fix the bug", "add validation", "refactor X") are ambiguous in two directions: it's unclear what counts as done, and it's unclear what to do first. Before researching or planning, restate the request as a verifiable goal — something whose completion can be checked deterministically.
This complements ## Verification (Mandatory Order) in CLAUDE.md: verification answers "is the change correct after we built it?"; goal-driven reframing answers "what would a correct change even look like?" before any code is written. Together they bracket the lifecycle.
Inspired by karpathy-skills — transform tasks into verifiable goals.
The transformation table
When the user gives you an imperative, restate it as a verifiable goal before doing anything else:
| User says | Restate as |
|---|---|
| "Add validation to the signup form" | "Write failing tests for the invalid-input cases (empty email, short password, malformed phone, duplicate username). Make the form behavior cause them to pass." |
| "Fix the bug where users can't reset their password" | "Write a regression test that reproduces the failure on main. Make the fix turn it green. Keep the test in the suite." |
"Refactor the OrderService to be cleaner" | "Confirm the current test suite passes. Apply the refactor. Confirm the same suite still passes — same inputs, same outputs, no new assertions added." |
| "Make the dashboard faster" | "Write a measurement: page render to interactive on the slowest dashboard route, on the same hardware, same dataset. Beat that number by X% in the same script." |
| "Improve the error messages" | "Pick the top N error sites by frequency (from logs / sentry). For each, write a test asserting the new message shape. Make them pass." |
| "Document the API" | "List every public endpoint. Generate an OpenAPI doc / Markdown page that covers them. Verify by curl'ing each documented example and matching the response shape." |
| "Clean up the codebase" | Push back. This is not a goal; it's a sentiment. Ask the user to point at the specific smell, file, or pattern they want addressed, and then restate that. |
How to apply
- After reading the request and before loading additional context, write the restated goal in your reply — one short paragraph.
- If a verifiable form genuinely does not exist (pure exploration, design review, advisory), say so explicitly and continue without inventing one.
- If the restated goal is ambiguous or seems wrong, ask one clarifying question before proceeding. Don't ladder up clarifications — pick the highest-impact one.
- The restated goal becomes the acceptance line in
tasks/todo.mdif you go on to write a plan (see "Plan Template" below).
Why this matters
- It surfaces the test/measurement up front, so the rest of the work has a target.
- It deflects sentiment-shaped requests ("make it cleaner") before they balloon into scope drift.
- It makes the end of the task agree with the start of the task —
stop-gate.shchecks verification; this section checks intent.
This is a prompt-side rule (no hook enforces it). The reminder is in CLAUDE.md → Plan First; for a deterministic regression scenario covering the reframing behavior, see bench/scenarios/ (KitBench).
Separate Research from Implementation
This is one of the most impactful practices. When research and implementation happen in the same context, the agent accumulates irrelevant details from alternatives it explored but didn't choose.
The Problem
Bad: "Build an auth system"
→ Agent researches JWT vs sessions vs OAuth vs passkeys
→ Context is now full of implementation details for ALL options
→ By the time it implements JWT, it's confused by OAuth detailsThe Solution
Split into two clean phases:
Research Phase (separate context)
- Explore options, read docs, analyze tradeoffs
- Output: a concise summary of the chosen approach with specific details
- Use a subagent (Explore or Plan) so research doesn't pollute main context
Implementation Phase (clean context)
- Start with the research summary as input, not the full research
- Agent has only the information it needs to build the chosen solution
- No leftover context from rejected alternatives
In Practice
Good: 1. Subagent researches auth options → returns "Use JWT with bcrypt-12,
refresh token rotation, 7-day expiry, httpOnly cookies"
2. Main agent implements ONLY that specificationWhen You Already Know the Approach
If you know what you want, be maximally specific upfront:
Best: "Implement JWT authentication with bcrypt-12 password hashing,
refresh token rotation with 7-day expiry, stored in httpOnly cookies,
using jose library for token operations"The more specific the instruction, the less the agent needs to research, and the better the implementation will be.
When to Write a Plan
Write a plan to tasks/todo.md when:
- Task touches 3+ files
- Architectural decision is needed
- New dependency or tool is introduced
- Workflow or build system changes
- You're unsure about the approach
For small, isolated changes (typo fix, single-function edit): just do it.
Plan Template
Copy this into tasks/todo.md:
## Task: [Short title]
**Goal**: [1 sentence — what does "done" look like?]
**Context**: [Why is this needed? Link to issue/discussion if relevant]
### Approach
1. [Step 1]
2. [Step 2]
3. [Step 3]
### Files to Touch
- `path/to/file.ts` — [what changes]
- `path/to/file.ts` — [what changes]
### Open Questions
- [Anything unclear that needs confirmation?]
### Risks
- [What could go wrong?]
### Not Now
- [Things noticed but out of scope]Feature Spec Folders (Optional)
For multi-file tasks or features that require significant planning, create a timestamped spec folder instead of (or alongside) tasks/todo.md:
tasks/specs/
2026-04-05-user-auth/
plan.md # Implementation plan (what we build)
shape.md # Decisions and context from planning
references.md # Pointers to similar code, docs, examplesWhen to use spec folders
- Multi-session features that need persistent context
- Tasks where planning decisions should be preserved for future reference
- Features with significant research or tradeoff analysis
When NOT to use spec folders
- Simple tasks that fit in
tasks/todo.md - Bug fixes or single-file changes
- Tasks that will be completed in one session
Spec folders survive sessions and serve as handoff context. A new session can read the spec folder to understand what was planned and why.
Mid-Task Recovery
If something goes sideways:
- STOP — don't push through
- Re-read the original goal
- Check if scope has drifted
- Update the plan in
tasks/todo.md - Get confirmation before continuing
The most common failure mode is scope creep disguised as "while I'm here."
Session Strategy
One Session Per Contract (recommended)
Long-running sessions (24+ hours, many tasks) degrade over time because unrelated context from earlier tasks pollutes later ones. The agent starts making assumptions based on code it saw 3 tasks ago.
Instead: Start a new session for each task contract.
Session 1: CONTRACT_auth.md → implements auth → ends
Session 2: CONTRACT_uploads.md → implements uploads → ends
Session 3: CONTRACT_search.md → implements search → endsEach session reads only:
CODEBASE_MAP.md+CLAUDE.project.md(project awareness — Tier 1)tasks/lessons/_index.md→## Top Rulessection only (Tier 3, on-demand). Individual lesson files intasks/lessons/loaded only when relevant- Its own contract file (task scope)
- The specific source files it needs to edit
Result: Clean context, no cross-contamination, deterministic scope.
When Long Sessions Are OK
- Exploratory work where you're interactively guiding the agent
- Debugging a single complex issue with many layers
- Sessions where YOU are the orchestrator (you manage the context)
Orchestration Layer
For automated multi-contract workflows:
Orchestrator (you or a script)
├── Generates CONTRACT_1.md
├── Starts Session 1 → completes → checks contract
├── Generates CONTRACT_2.md (may depend on 1's output)
├── Starts Session 2 → completes → checks contract
└── ... continues until all contracts fulfilledThis can be as simple as a shell script that:
- Creates a contract file
- Runs
claude --print "Complete the contract in tasks/CONTRACT_X.md" - Verifies the contract criteria
- Repeats for the next contract
Context Hygiene
The ## After Compaction rule in CLAUDE.md is reactive — what to do once compaction has happened. This section is proactive — when to evict context yourself before quality degrades.
Claude Code exposes two commands:
/compact— summarises the conversation so far and replaces it with the summary. Keeps the thread alive; trades fidelity for headroom./clear— wipes the conversation entirely.CLAUDE.mdand project files are re-read on next turn; mid-task scratch state is gone.
When to use which
| Trigger | Use |
|---|---|
| Just finished a contract / feature, starting a new one | /clear |
| Switching branches mid-session | /clear |
| Conversation is long but you're still on the same task and need the history | /compact |
| Responses are getting slower, more drift, more "I forgot what we decided" | /compact (or /clear if the task has natural boundaries) |
| Context window indicator past ~70-80% | /compact now, don't wait for the auto-trigger |
| Switching from Plan phase to Implement phase on a non-trivial task | /clear, then start Implement with the plan file as the only input |
Suggested budgets
Rough orientation numbers, not caps the agent must enforce:
| Scope | Target |
|---|---|
| Single task | ~4k tokens of conversation |
| Single session | ~30k tokens before /compact |
Past ~30k tokens you start seeing the classic symptom: the agent re-suggests a fix you rejected 20 turns earlier. That's the actual cue, not the number. The numbers exist so "should I /compact now?" has a deterministic answer instead of vibes.
Rules
- Before
/clear, make sure the things you need next session are written down —tasks/todo.md,tasks/handoff-*.md, an opentasks/specs/<slug>/folder, or a fresh ADR intasks/decisions.md. The clear is destructive for in-memory state, not for files. - After
/compact, re-read the same Tier 1 files as## After CompactioninCLAUDE.md— a summary is not the same as the original context. - Don't
/compactto "save tokens" if the task is small. The summary itself costs context; only useful when the conversation has grown. - If you find yourself wanting to compact every 20 turns, the real fix is
## Separate Research from Implementationabove — too much exploration is bleeding into the active task.
This complements ## Session Strategy → One Session Per Contract — that section says "start new sessions per task"; this section says "and inside a session, prune as you go."
Visual Feedback
For UI work, a screenshot is faster and less ambiguous than describing a problem in words. Claude Code accepts pasted images directly into the prompt.
The pattern
- Capture the offending region —
Cmd+Shift+4on macOS,Win+Shift+Son Windows,Shift+PrintScreenon most Linux DEs. Capture the smallest area that contains the problem, not the whole window. - Paste into the Claude Code prompt with
Ctrl+V(orCmd+V). The image attaches to the next message. - Pair the image with a specific instruction. The image shows what; you still need to say what to change.
What to say with the image
| Bad (ambiguous) | Good (specific) |
|---|---|
| "This looks wrong, fix it." | "The spacing between the avatar and the username is too tight — match the 12px gap used elsewhere in this list." |
| "Make this better." | "The button's border-radius doesn't match the card it sits inside. Card is 8px, button is 4px. Align them." |
| "Why is this broken?" | "This dropdown overflows the viewport on mobile. Reproduce the layout in CSS and propose a fix that keeps the menu inside the safe area." |
When to use it
- Spacing, alignment, contrast, or visual hierarchy issues — words struggle, images don't
- Cross-browser / cross-viewport differences — capture both, paste both, compare
- "AI slop" detection — paste the rendered output alongside
DESIGN.mdand ask "where does this diverge from the design system?" - Error states in browser devtools — paste the console / network panel screenshot instead of typing the stack trace
When not to use it
- Code problems — text is canonical, screenshots of code lose structure
- Logic bugs — the visual output is downstream; show the data or the failing test instead
- Anything reproducible by
npm test— let verification do the work
Pairs with the /design-review skill: that skill produces a structured report against DESIGN.md; screenshot feedback is the lightweight, in-loop counterpart for during implementation.
PR / Commit Workflow
Commit Messages
<type>: <short description>
<optional body — why, not what>Types: feat, fix, refactor, test, docs, chore, perf, ci, build, style
Good:
feat: add rate limiting to /api/upload
Prevents abuse from unauthenticated clients.
Limit: 10 req/min per IP.Bad:
update stuff
fix things
WIPPR Checklist
Before marking a task done:
- All changes match the original plan
- No unrelated changes snuck in
- Typecheck passes
- Lint passes
- Tests pass
- Smoke tested (manually verified real behavior)
-
tasks/todo.mdupdated (task marked done or removed)
Scope Control
What "scope discipline" means in practice
| Situation | Do | Don't |
|---|---|---|
| See a typo in an unrelated file | Log it in Not Now | Fix it now |
| Function could be cleaner | Log it in Not Now | Refactor it |
| Test is flaky but unrelated | Log it in Not Now | Debug it |
| Dependency could be updated | Log it in Not Now | Update it |
| Better pattern exists for old code | Log it in Not Now | Rewrite it |
The "Not Now" List
In tasks/todo.md, keep a ## Not Now section:
## Not Now
- [ ] `src/utils/format.ts` has a potential timezone bug
- [ ] `package.json` — lodash could be replaced with native methods
- [ ] Auth middleware doesn't handle token refresh edge caseThese get addressed in dedicated cleanup sessions, not mid-feature.