Why most prompts fail
Vague prompts produce vague output. That isn't a flaw in the model — it's a predictable consequence of ambiguity. When you tell a model to "write a summary" without specifying the audience, length, format, or what counts as important, the model fills in every blank. It fills them in differently each time, which is why the same request gives you different results on Monday versus Friday.
The issue isn't the model's intelligence — modern models are remarkably capable. The issue is that the model can only work with what you give it. Ambiguity in, ambiguity out. The more precisely you specify what you want, who it's for, what shape the output should take, and what to do when things go sideways, the more predictable — and more useful — the response becomes.
Here's the shift that changes everything: stop thinking of prompts as one-off requests and start treating them as reusable assets. A well-engineered prompt for your weekly status summary, your client email drafts, or your data extraction pipeline runs the same way every time. Same input, same output. That's when prompts start compounding — you build a library, not a habit of re-explaining yourself.
"A prompt is a contract with the model. The clearer the contract, the more reliable the result."
The CRAFT Framework
RUDI's CRAFT framework gives you a checklist for every prompt. Five letters, five decisions — make them deliberately and your output sharpens immediately.
C — Context. What background does the model need? What's been tried? Who's the audience? Without context, the model defaults to generic.
R — Role. "Senior security engineer reviewing a PR" produces different output than "code reviewer." Roles prime vocabulary and judgment.
A — Action. What exactly should the model do? Verb + object + for-whom. "Summarize" is vague; "extract decisions and action items for a Slack post" is actionable.
F — Format. How should the output be structured? Bullets / JSON / sections / length / fields. If you don't specify, the model picks — and it picks differently every time.
T — Tone. Voice, register, formality. "For a CFO in 30 seconds" vs "for a teammate over coffee" — same content, different prompt.
CRAFT covers the five decisions you have to make. The next section shows where they sit inside the full anatomy of a prompt.
The anatomy of a prompt
Every effective prompt has seven slots. CRAFT covers five of them; the remaining two — Input and Examples — are what separate good prompts from production-grade ones.
| Slot | What it does | CRAFT |
|---|---|---|
| Role | Who the model is playing | R |
| Task | The explicit directive | A |
| Context | Why this matters, constraints, audience | C |
| Input | The actual material being operated on — the document, data, or text the model works with | + add |
| Examples | Input → output demonstrations that lock the format and style | + add |
| Output Format | Shape, length, schema | F |
| Tone | Voice and register | T |
Input (delimited content) is the piece most people forget to make explicit. When your instructions and your data look the same — both just plain text run together — the model can lose track of which is which. Wrapping your input in XML tags, triple backticks, or a --- block creates a clear boundary. This matters most when the input is long, when it contains instructions of its own (like a pasted email), or when you're chaining multiple steps.
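If you assemble prompts in code, the boundary is one small helper. A minimal sketch, where the function name and default tag are illustrative, not from any library:

```python
def build_prompt(instructions: str, data: str, tag: str = "input") -> str:
    """Wrap pasted or untrusted data in a delimiter block so the model
    can tell instructions apart from content."""
    return f"{instructions}\n\n<{tag}>\n{data.strip()}\n</{tag}>"

prompt = build_prompt(
    "Summarize the email below in 3 bullets.",
    "Hi team, just wanted to follow up on the Q3 deck...",
    tag="email",
)
print(prompt)
```

Because the helper always emits the same boundary, every prompt in your library gets the same structure for free.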
Examples are the single most reliable way to lock format. Five paragraphs describing your desired output format are still description. One concrete input-to-output pair is a demonstration. Models learn from examples at inference time the same way people learn from seeing a template. If your desired output is anything other than plain prose — structured data, a specific writing style, an unusual schema — include at least one example and watch consistency improve immediately.
The 7 levers that actually improve output
These are the techniques that move the needle most, in order of impact. Each one closes a specific gap that leaves the model guessing.
1 — Replace adjectives with measurements
"Concise" means nothing — "≤80 words" means everything. "Professional" is subjective; "the voice of a McKinsey deck" is calibrated. "A good summary" is a wish; "3 bullets, decisions-only, for a CFO with 30 seconds" is a contract.
Before: "Write a concise summary of this meeting."
After: "Write a 3-bullet summary of this meeting transcript. Focus only on decisions made. Each bullet ≤20 words. For a CFO who has 30 seconds."
2 — Separate instructions from data
The model gets confused when instructions and data look the same. Use XML tags, triple backticks, or --- blocks. Pick one style and use it consistently across all your prompts, so the boundary is always recognizable.
Before: "Summarize this email: Hi team, just wanted to follow up on..."
After: "Summarize the email below. <email> Hi team, just wanted to follow up on... </email>"
3 — Show one concrete example
One concrete example beats five paragraphs of description. Show the shape you want; don't describe it. This is critical for unusual formats, specific styles, or structured extraction — any time description alone isn't locking the output to the shape you need.
Before: "Extract the action items from this transcript in a useful format."
After: "Extract action items in this exact format: - [ ] [action] — @[owner] — [due date or 'TBD'] Example: - [ ] Send Q3 deck — @sarah — Friday"
4 — Specify the output format, with a fallback
Specify the exact shape: JSON schema, field names, types, length, allowed values. Then add a fallback for when the model can't comply. Without a fallback, the model invents an answer rather than admitting uncertainty — and the invention looks indistinguishable from the real thing.
Before: "Tell me what's in this contract."
After: "Extract these fields from the contract as JSON:
{parties: string[], effective_date: string, term_months: number, termination_clause: string}
If a field is missing, use null.
If the document isn't a contract, respond with:
{error: 'not a contract'}"
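The fallback only helps if the code consuming the reply checks for it. A minimal Python sketch of parsing such a reply; the field names mirror the contract example above, and nothing here comes from a specific SDK:

```python
import json

# Fields the prompt asked for (mirrors the contract example).
REQUIRED = {"parties", "effective_date", "term_months", "termination_clause"}

def parse_contract_json(raw: str) -> dict:
    """Parse the model's reply; return an error dict instead of
    trusting malformed or off-schema output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "model returned invalid JSON"}
    if isinstance(data, dict) and data.get("error"):
        return data  # the model used its own fallback
    missing = REQUIRED - set(data)
    if missing:
        return {"error": f"missing fields: {sorted(missing)}"}
    return data
```

The point is symmetry: the prompt defines the contract, and the parser enforces it, so a hallucinated or partial reply surfaces as an explicit error instead of bad data downstream.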
5 — Assign a role
A role primes the model's vocabulary, judgment, and frame of reference. It's not decoration — it's instruction. "You are a senior infosec engineer reviewing this code" produces fundamentally different output than "review this code," because the role carries implicit knowledge about what matters.
Before: "Review this Python function."
After: "You are a senior backend engineer reviewing a junior teammate's pull request. Be direct but constructive. Focus on: correctness, readability, error handling. Skip style issues a linter would catch."
6 — Ask for step-by-step reasoning
For multi-step problems, ask the model to think before answering. "Think step by step before responding" works on most models. On Claude's extended thinking mode or OpenAI's o-series reasoning models, skip this — they already reason internally, and explicit chain-of-thought instructions can actually hurt performance.
Before: "What's the best pricing strategy for our SaaS product?"
After: "Think step by step before answering: 1. What pricing models are common in SaaS? 2. Which fits a tool with our usage profile [see below]? 3. What are the trade-offs? Then recommend ONE model with reasoning."
7 — Ground the model in reference text
For anything factual, paste the reference text inside the prompt and constrain the model to it. This is the single highest-impact move for reducing hallucination. Without grounding, the model draws on training data that may be wrong, outdated, or simply absent for your specific context.
Before: "What does our refund policy say about international orders?"
After: "Refund policy: <policy> [paste full policy here] </policy> Question: What does the policy say about international orders? Answer using ONLY the text above. If it isn't covered, respond: 'Not specified in current policy.'"
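The same grounding pattern as a tiny helper, if you build prompts programmatically. A sketch under the assumption that you supply the reference text yourself, with no retrieval step; the function name is illustrative:

```python
def grounded_prompt(reference: str, question: str) -> str:
    """Constrain the model to the pasted reference text, and give it
    an explicit out so it doesn't invent an answer."""
    return (
        f"<policy>\n{reference.strip()}\n</policy>\n\n"
        f"Question: {question}\n"
        "Answer using ONLY the text above. "
        "If it isn't covered, respond: 'Not specified in current policy.'"
    )

print(grounded_prompt("All sales are final after 30 days.",
                      "What about international orders?"))
```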
Ready-to-use templates
Six skeletons for the tasks people prompt for most. Paste, fill in the brackets, and you're 80% of the way there.
1 — Summarize (meeting / doc / email)
You are a [audience-appropriate role, e.g., "executive assistant"].
Summarize the content below for [specific audience: "the engineering team in a Slack post"].
<content>
{{paste here}}
</content>
Produce exactly:
**Key points** — 3 bullets, ≤20 words each
**Action items** — bulleted, each with owner and due date (or "TBD")
**Decisions** — bulleted, or "_None_"
Skip pleasantries. Only include items explicitly stated.
2 — Extract structured data
Extract the following fields from the text below as JSON.
Fields:
- [field_name]: [type] — [description]
- [field_name]: [type] — [description]
<text>
{{paste here}}
</text>
Rules:
- Return ONLY valid JSON, no commentary
- Use null for missing fields
- If the text doesn't match the expected type, respond with:
{"error": "input does not match expected format"}
3 — Draft (email / message / copy)
You are writing on behalf of [sender]. You are [relationship to recipient].
Write [type, e.g., "an email"] to [recipient] about [topic].
Tone: [specific: "warm but direct", "formal", "casual peer-to-peer"]
Length: [bounded: "≤150 words"]
Goal: [what should happen after they read it]
Constraints:
- Do NOT [common pitfalls: "apologize", "use corporate jargon", "include emojis"]
- DO [must-haves: "include a clear next step", "reference our last conversation"]
Context they need:
{{any background}}
4 — Decide (compare options & recommend)
You are advising me as [role: "a strategic consultant", "a senior engineer"].
I'm deciding between these options:
A) [option A]
B) [option B]
C) [option C]
My situation:
{{relevant context, constraints, goals}}
Walk through:
1. The decisive trade-offs (not all of them — just the ones that matter here)
2. Your recommendation with reasoning
3. The conditions under which you'd change your recommendation
Be direct. Skip the diplomatic hedging.
5 — Critique (feedback on work)
You are a [domain expert, e.g., "senior product designer"] giving honest feedback to a colleague.
Review the [artifact, e.g., "wireframe", "draft", "code"] below.
<artifact>
{{paste here}}
</artifact>
Structure your feedback as:
**What's working** — 2-3 specific strengths
**What's not** — 2-3 specific issues, with WHY each matters
**Highest-leverage change** — the single edit that would most improve it
Be specific. "This is unclear" is useless — say what's unclear and why.
6 — Teach / Explain
You are explaining [topic] to [audience: "a smart non-expert", "a new hire on the team"].
Their starting knowledge: [what they already know]
What they need to be able to do after: [concrete outcome]
Structure:
1. The one-sentence version
2. The mental model (analogy or core concept)
3. The detail they need (only what serves the outcome — skip everything else)
4. A worked example
5. What's NOT covered (so they know the limits)
Keep it under [length]. Use plain language. No jargon without immediate definition.
Tips by AI model
The universal anatomy works everywhere, but each major model has quirks worth knowing.
Claude (Anthropic)
- Loves XML tags for structure (<instructions>, <context>, <document>) more than other models
- For long documents, put the document FIRST and your question LAST — Anthropic cites ~30% quality improvement with this ordering
- Use the extended thinking / effort budget for hard reasoning instead of writing "think step by step"
- You can prefill the assistant's response to lock format — useful for guaranteed JSON or structured output starts
GPT (OpenAI)
- Strong with structured outputs — use the JSON schema feature when available rather than describing the shape in prose
- For o-series reasoning models (o1, o3): skip chain-of-thought prompts — they already reason internally and explicit CoT can hurt
- Repeat critical instructions at the END of long prompts — recency bias is real and well-documented
- Build a small eval set early — OpenAI's own docs treat systematic testing as core to prompt engineering, not optional
Gemini (Google)
- Pay attention to sampling parameters (temperature, top-K) — Google treats them as part of prompt design, not an afterthought
- Try "step-back prompting": ask a more abstract version of your question first, then apply the answer to the specific case
- For high-stakes reasoning, run the same prompt multiple times and take the majority answer — self-consistency works well here
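Self-consistency is simple to wire up once you have the model's answers in hand. A minimal sketch with simulated responses standing in for real API calls; the helper is hypothetical, not a library function:

```python
from collections import Counter

def majority_answer(responses: list[str]) -> str:
    """Self-consistency: run the same prompt N times and keep the
    most common final answer (normalized for casing/whitespace)."""
    counts = Counter(r.strip().lower() for r in responses)
    answer, _ = counts.most_common(1)[0]
    return answer

# Simulated: five runs of the same prompt (a real version would call the API).
runs = ["Usage-based", "usage-based", "Flat rate", "usage-based ", "Usage-based"]
print(majority_answer(runs))  # → usage-based
```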
7 mistakes that quietly ruin prompts
Anti-patterns we see in nearly every prompt that isn't working.
- Vague adjectives instead of measurements. "Concise," "professional," "good" — these tell the model nothing. Replace with numbers, examples, or named comparisons.
- No examples for novel formats. If the format is anything other than plain prose, show one example. Description is not enough — demonstration is.
- No fallback for "I don't know." Without an out, the model invents an answer. Always include: "If you can't, respond with exactly: [phrase]."
- Asking for five things in one prompt. Models get worse with each additional task in a single shot. Split into chained prompts or accept reduced quality on all five.
- Burying instructions in long context. Recency bias is real — the model weights the end of the prompt more heavily. Restate critical instructions at the bottom.
- No negative space. "Do NOT explain your reasoning." "Do NOT apologize." "Do NOT include caveats." Models over-help by default — explicitly say what you don't want.
- Iterating on vibes, not evals. "Better on the one I tried" is how prompts silently regress. Build a 10-example test set and run all 10 every time you tweak.
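A 10-example test set needs no tooling; a few lines will do. A minimal sketch, where `stub` stands in for a real model call and everything is illustrative rather than a specific eval library:

```python
def pass_rate(prompt_fn, cases) -> float:
    """Run every (input, check) pair through prompt_fn and return the
    fraction of checks that pass. prompt_fn stands in for a model call."""
    passed = sum(1 for inp, check in cases if check(prompt_fn(inp)))
    return passed / len(cases)

# Stub "model" for illustration: echoes a fixed-format one-bullet summary.
stub = lambda text: f"- {text[:20]}"

cases = [
    ("Q3 revenue is up 12%", lambda out: out.startswith("- ")),
    ("Ship date moved to May", lambda out: len(out) <= 25),
]
print(pass_rate(stub, cases))  # → 1.0
```

Run the whole set after every tweak; a drop in the rate is a regression you would never catch by eyeballing one example.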
Frequently asked questions
What is the best prompt framework?
What's the difference between a system prompt and a user prompt?
How is prompting Claude different from prompting GPT?
Claude responds especially well to XML tags (<instructions>, <document>). For long context, Claude prefers the document at the top and the question at the bottom. GPT's o-series reasoning models actually do worse with explicit "think step by step" instructions — they already reason internally. Gemini benefits from "step-back prompting" (asking an abstract version of the question first). Universal principles work everywhere; these are optimizations on top of that foundation.
Do I need examples in every prompt?
How long should a prompt be?
Why isn't my prompt working?
What's the difference between zero-shot and few-shot prompting?
How do I prevent the AI from making things up?
Take it further
Reading is a start. Practice is the multiplier.
Prompt Engineering Workshop
Hands-on training where your team builds a prompt library for their actual work. Half-day or full-day format, tailored to your use cases and tools.
See training programs →
Camp Claude Cohort
An immersive cohort program where you walk away with a production-grade prompt kit, an AI workflow library, and a measurable productivity baseline.
Learn about Camp Claude →
Or read more in our guide for people leaders.