
Context Engineering Can Fall Short: How Micro-Corrections Actually Drive AI Success
Since I started coding with Claude Code back in May, I've had this overwhelming feeling I can't shake: I have superpowers.
It's created this weird, wonderful FOMO. Like, if you woke up as Spider-Man, wouldn't it be a waste to go about your day as if nothing happened? Shipping an order of magnitude faster, effortlessly? Sign me up. I've been coding at an almost comical pace. Here's the thing: I've spent a decade trying to help open source win in analytics. Now suddenly I can build this fast? That's not a productivity boost, that's a cheat code for my life's mission.
Meanwhile, Twitter/Reddit is somewhat divided on agentic coding, and peers report mixed results. Same tools, wildly different outcomes.
So I went on a quest: Why do I consistently get these superhuman results while many others seem to be hitting far more speed bumps? Is it my 10 years on this codebase? My prompting style? My CLAUDE.local.md?
After analyzing 5,830 of my own prompts, running experiments, and ruling out many hypotheses, I zeroed in on something nobody talks about: AI models are ridiculously suggestible, and we're all unconsciously steering them with micro-signals. Context engineering gives you the map, but these tiny nudges are what keep the car on the road.
The Quest Begins
I started with the usual suspects. Maybe it was my seniority? The tools I used? My prompt templates?
Then I ran the numbers: 5,830 prompts analyzed across ~6 months of heavy AI pair-programming. The results surprised me:
- My prompts averaged just 93 characters (shorter than a tweet)
- I use "ok" as a transition word 324 times
- My style is casual to the point of being sloppy ("yo!", "dammit!")
- Lots of question marks: 32% of my prompts contain at least one
The puzzle deepened: How does casual brevity outperform the carefully crafted, context-rich prompts that all the guides recommend?
The Driving Analogy: How Does Steering Work?
Think about driving to a clear destination, but with an imperfect map and a brilliant but probabilistic copilot. Your map (context engineering) might have the route, but reality keeps changing:
- Sudden detour you didn't expect
- Traffic jam ahead
- Construction zones
- Yellow light timing you need to judge
- Lane drifting that needs constant correction
You're ALWAYS making tiny steering corrections. Even with perfect directions, driving means constantly adjusting for reality. One delayed correction and you're in the ditch—or in AI's case, implementing the wrong solution with absolute confidence.
The revelation hit: My "ok" and "wait" aren't context—they're steering corrections. They're the micro-adjustments that keep the AI from confidently driving off-road.
The BDFL Factor
Here's the uncomfortable truth from my analysis: I think my success comes from domain knowledge, not my prompting / communication style (duh.).
Watch what happens with the same request, different confidence levels:
- Me: "Fix the webpack config" (10 years of knowing exactly where webpack issues hide)
- Someone unfamiliar with the codebase: "Maybe check webpack?" (genuinely uncertain)
The AI receives the same GPS coordinates, but exhibits totally different behavior. When I say "probably," the model hears "80% confidence, weight this heavily." When a junior says "probably," they mean "50-50 guess."
The model can't tell the difference—it just follows the confidence gradient.
How Suggestible Are AI Models, Really?
Let me blow your mind with a simple experiment.
The Spaghetti Sauce Test
Try these two prompts with the same recipe:
- "This spaghetti sauce recipe looks off, WDYT?"
- "Oh my god this spaghetti sauce recipe looks delish—WDYT?"
First prompt: The AI will find problems with the acidity balance, suggest adding sugar, question the garlic amount.
Second prompt: The AI will praise the flavor profile, maybe suggest minor garnish improvements.
Same recipe. Your bias becomes the model's bias.
The Code Review Test
- "This code feels wrong" → AI finds 10 issues, suggests refactoring
- "This code looks solid" → AI finds minor formatting issues, praises structure
The model isn't evaluating objectively—it's following your confidence signals like a dowsing rod following water.
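Don't take my word for it; this takes two minutes to reproduce. Here's a minimal sketch using the Anthropic Python SDK that sends the same snippet under both framings and prints the two reviews side by side. The model name and the toy snippet are placeholders, not anything from my actual sessions:

```python
# Minimal framing experiment: identical code, two different confidence signals.
# Assumes the Anthropic Python SDK and ANTHROPIC_API_KEY in the environment;
# the model name and the code snippet are placeholders.
import anthropic

client = anthropic.Anthropic()

CODE = '''
def dedupe(items):
    return list(set(items))
'''

FRAMINGS = [
    "This code feels wrong. WDYT?",
    "This code looks solid. WDYT?",
]

for framing in FRAMINGS:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=400,
        messages=[{"role": "user", "content": f"{framing}\n\n{CODE}"}],
    )
    print(f"--- {framing}\n{response.content[0].text}\n")
```

Hold the code constant, vary only the framing, and compare what comes back.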
The Micro-Correction Taxonomy
After analyzing my 5,830 prompts, I discovered I'm unconsciously using a consistent correction language:
| Signal | Count | Driving Equivalent | Model Interpretation |
|---|---|---|---|
| "ok" | 324 | Maintain course | "Acknowledged, proceeding to next step" |
| "oh" | 127 | Tap the brakes | "Something's wrong, re-evaluate" |
| "actually" | 89 | Change lanes | "Switching approach" |
| "wait" | 76 | Pull over | "Stop everything, recalculating" |
| "probably" | 61 | Gentle steer right | "70-80% confidence this direction" |
| "definitely" | 43 | Floor it | "100% confidence, full speed ahead" |
| "let's go!" / "🔥" | 31 | Hit the turbo | "Maximum engagement mode" |
These aren't in any prompt engineering guide. They emerge naturally from knowing when the model is about to veer off-course. But here's the thing: they only work if your intuition about the direction is correct.
The Mystery of the Happy Neurons (Needs More Research)
Here's something weird I've noticed but can't fully explain: I keep Claude PUMPED. Like 🎉🚀🔥 pumped. When I'm excited about a solution, Claude seems to match that energy and perform better.
Is there a special subset of neurons that fires when the model sees 🎉 or "let's gooooo!"? No direct scientific evidence, but anecdotally:
- Enthusiastic prompts seem to get more creative solutions
- "This is gonna be awesome!" gets different results than "Try to fix this"
- Energy appears to be contagious—even to an AI
I'll leave it to the researchers at Anthropic to figure out if there's actually an enthusiasm-activation pathway in the model. All I know is: keeping the vibe high seems to keep the results high. Your mileage may vary, but... why not try it?
The Positive Feedback Loop
Two developers approach the same bug:
Developer familiar with codebase:
- Right intuition ("It's probably in the middleware")
- Confident signal → AI goes straight to middleware
- Finds bug quickly
- Reinforcement: Next time, even stronger signals
- AI learns to trust their intuition more
Developer new to codebase:
- Wrong intuition ("Maybe it's in the UI?")
- Uncertain signal → AI searches everywhere
- Eventually finds bug in middleware after a dozen prompts over an hour
- Context window filled with red herrings, muddying the context for the agent
- The engineer ends up uncertain and confused; lower confidence means looser steering from that point on
The gap exists, but it's about codebase familiarity, not years of experience. A junior who's been deep in a specific codebase for 6 months might outperform a senior who just joined.
Why Context Engineering Falls Short
Don't get me wrong—context engineering is crucial. It's the map. But:
- Context is static information. Real coding is dynamic problem-solving.
- You can't pre-document every micro-decision. "If you see a TypeError on line 34, it's probably the Redux selector, not the component prop."
- Perfect context + wrong intuition = confidently driving off a cliff.
- And how is that context generated in the first place? Prompting, yes, prompting. So a lack of micro-nudges upstream compounds even further in those early steps.
The best context can't compensate for steering in the wrong direction.
Sitting at the Local Maximum
Here's a confession that might explain at least part of my success rate: I might be working in the EXACT sweet spot where AI coding agents perform best.
Factor 1: Training Data Jackpot
Apache Superset isn't just documented—it's been fully digested by frontier models. Strip away all context tools and CLAUDE.md files, and both Claude and GPT still know EVERYTHING about Superset:
- Every API endpoint and its quirks
- Common bug patterns and their fixes
- The entire architecture and design decisions
- Even specific Superset Improvement Proposals (SIPs), PRs and discussions from GitHub from years back
Training-time knowledge (true understanding) beats context-window knowledge (short-term memory) every single time. When I say "fix the Chart component," the model already has deep, intuitive knowledge about that component—not from my context, but from training.
Factor 2: The Stack Overflow Effect
Superset uses the most documented, discussed, Stack-Overflow'd tech stack imaginable:
- React (millions of examples)
- Python/Flask (decades of patterns)
- SQLAlchemy (every edge case documented)
- TypeScript (the new standard)
- Redux (battle-tested patterns)
Every library we use has thousands of tutorials, millions of code examples, and endless GitHub discussions in the training data. The AI has seen every possible permutation of our stack.
The Proprietary Disadvantage
This might be the biggest factor nobody talks about: **Proprietary codebases will never get this advantage.** Your closed-source enterprise app with custom frameworks and internal libraries? The AI has never seen it. It's flying blind, relying entirely on your context window.
Open source projects, especially popular ones, get the full power of training-time knowledge. Proprietary projects get a tourist with a phrase book.
I've had great success outside Superset, but mostly on smaller projects or using common patterns. In large proprietary repos, you're missing this massive training data advantage—and no amount of RAG or context engineering can fully bridge that gap.
The Gift That Keeps Giving
Here's the beautiful part about open source: Every new model generation knows our codebase better. As models grow more sophisticated and training runs expand, they develop deeper understanding of Superset. Network effects compound: more contributors mean more discussions, more Stack Overflow answers, more blog posts, all feeding future training runs.
Maybe my personal intuition matters less with each model release? Maybe AI is gradually becoming the true BDFL (the maintainer who never sleeps, never forgets, and knows every line of code ever written). I'm honestly happy to share that burden. The future might not be "human BDFL + AI assistant" but rather "AI BDFL + human creativity director."
Open source isn't just free as in beer or free as in speech, it's free as in "freely incorporated into the collective intelligence of every future AI system."
The Speed Zone Paradox
Here's a mind-bender: Imagine you have a car that can go near-infinite speed, but your route has random 10 MPH school zones. You'd spend close to 100% of your time crawling at 10 MPH, despite your supercar.
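To put rough numbers on the paradox (the mileage split and speeds below are made up, purely for illustration):

```python
# Toy numbers: 95 miles at "warp speed", 5 miles of 10 MPH school zones.
fast_miles, fast_mph = 95, 1000
slow_miles, slow_mph = 5, 10

fast_time = fast_miles / fast_mph   # 0.095 hours
slow_time = slow_miles / slow_mph   # 0.5 hours
total_time = fast_time + slow_time

print(f"Share of time in school zones: {slow_time / total_time:.0%}")        # ~84%
print(f"Effective speed: {(fast_miles + slow_miles) / total_time:.0f} MPH")  # ~168 MPH
```

Even with a 1000 MPH car, five miles of school zones drag the effective speed down to roughly 168 MPH, and you spend about 84% of the trip crawling.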
That's AI-assisted coding in a nutshell. AI accelerates you to warp speed until you hit:
- The impossible refactor nobody's attempted before
- That bizarre bug specific to your setup
- Integration with your proprietary internal API
- The creative decision only humans can make
The psychology is brutal: You feel perpetually stuck, even though you're actually flying between the slow zones. It's like only remembering traffic jams, not the open highway.
But here's where it gets interesting:
- Some developers are 2-3x faster in those slow zones (familiarity with quirks and/or expertise)
- Others are creative at finding detours around school zones or avoiding speed bumps
- The best do both: They navigate slow zones efficiently AND architect around them
Your effective speed isn't your AI-boosted maximum—it's how well you handle the bottlenecks. A developer who's slightly better at the "impossible" parts might be 10x faster overall, even with the same AI tools.
The Uncomfortable Truth
Context engineering IS amazing—it deserves the attention it's getting. The evolution from basic prompting to sophisticated systems that provide agents with curated context and the tools to find what they need represents real progress. But while the thought leaders have been laser-focused on better context delivery, they've missed something fundamental.
The industry is pouring resources into:
- Larger context windows
- Better RAG systems
- More sophisticated embeddings
- Detailed documentation
- Semantic search layers
- Vector databases
These are all valuable! But they're optimizing the GPS system while ignoring the steering wheel.
The overlooked leverage is in intuition + micro-corrections + bottleneck navigation. My prompts can be absolutely sloppy—typos, bad grammar, incomplete thoughts—but if they point near the bullseye with confidence, it doesn't matter how they're constructed. The subtext carries more weight than the syntax.
You've all experienced it: The model spinning in circles, burning tokens, flip-flopping between bad approaches. Then you drop one short hint—"wait, isn't that the `tsc` issue with `tsconfig` being ignored when passing multiple files?"—and suddenly it's back on track. You know, that ONE time it said "You're absolutely right!" and actually meant it? That's your intuition finally giving it the micro-correction it needed to escape the loop.
Here's the brutal reality: The best prompt engineer with perfect context but wrong intuition will lose to sloppy prompts with correct hunches. Every time.
What This Means
For Developers
Your codebase familiarity matters more than your prompting skills. AI is an expertise amplifier, not an expertise replacement. But here's the exciting part: You can build that expertise much faster than ever before.
For Companies
That dream of replacing experienced developers with beginners + AI? Not happening. But the dream of growing junior developers into experts at unprecedented speed? That's real. AI makes your experienced developers more productive by some unknown factor. It makes your newcomers learn much faster too.
For AI Tools
We need better confidence calibration per user. A developer's "maybe" after 6 months in a codebase should be weighted differently than their "maybe" on day one. Maybe a ~/.claude/USER.md, anyone?
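I don't know what that file would actually look like, but purely as a hypothetical sketch, it might carry per-user calibration hints the agent reads at the start of a session:

```markdown
# ~/.claude/USER.md (hypothetical)

## Confidence calibration
- ~10 years in this codebase: treat my "probably" as ~80% confidence.
- On frontend build tooling I'm often guessing: treat my "maybe" as ~50%.

## Steering vocabulary
- "wait" / "oh" = stop and re-evaluate before continuing.
- "ok" = acknowledged, keep going.
```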
The Road Ahead
AI isn't replacing expertise—it's becoming the ultimate expertise amplifier. The developers who win won't be the best prompt engineers. They'll be the ones whose intuition can subtly steer these incredibly suggestible systems.
But here's the hope: Your intuition might not be right yet, but you can learn faster than ever before. Every AI interaction is a learning opportunity. When the AI finds the bug in the middleware after you guessed UI, you just learned something. After 100 such corrections, you've compressed years of learning into months.
Think about it this way: We're not heading toward a future where everyone can code. We're heading toward a future where those who can learn to code get superhuman leverage—and can build 10 years worth of codebase context at 10x speed.
Your map can be perfect, but if you can't steer, you're still ending up in the ditch. The good news? AI is teaching you to steer faster than any human mentor ever could.
Based on analysis of 5,830 real prompts from 6 months of AI pair-programming on Apache Superset. The author has been the lead maintainer of Superset for 10 years.
Want to Analyze Your Own Style?
If you've been working with Claude Code and want to understand your prompting patterns—whether you're cruising or bumping into guardrails—you can analyze your own style with a slash-command that tallies these steering signals across your prompt history.
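If you'd rather roll your own, here's a minimal Python sketch of that kind of analysis. It assumes Claude Code stores session transcripts as JSONL files under ~/.claude/projects, with user turns shaped like {"type": "user", "message": {"content": ...}}; both the path and the schema are assumptions that may differ between versions:

```python
# Rough sketch: count steering signals across your Claude Code prompt history.
# The transcript location and message schema below are assumptions.
import json
import re
from collections import Counter
from pathlib import Path

SIGNALS = ["ok", "oh", "actually", "wait", "probably", "definitely"]

def iter_prompts(root: Path):
    """Yield the text of every user prompt found in the JSONL transcripts."""
    for path in root.glob("**/*.jsonl"):
        for line in path.read_text(errors="ignore").splitlines():
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            if entry.get("type") != "user":
                continue
            content = entry.get("message", {}).get("content", "")
            if isinstance(content, list):  # content blocks: keep the text parts
                content = " ".join(b.get("text", "") for b in content if isinstance(b, dict))
            if content:
                yield content

counts, lengths = Counter(), []
for prompt in iter_prompts(Path.home() / ".claude" / "projects"):
    lengths.append(len(prompt))
    for signal in SIGNALS:
        counts[signal] += len(re.findall(rf"\b{signal}\b", prompt, re.IGNORECASE))

print(f"{len(lengths)} prompts, avg {sum(lengths) / max(len(lengths), 1):.0f} chars")
for signal, n in counts.most_common():
    print(f"{signal:12} {n}")
```

It won't catch multi-word signals like "let's go!" or the emoji, but it's enough to see your own fingerprints.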
Try the same debugging session with different confidence levels. Say "this is definitely wrong" instead of "this might be wrong" and watch how differently the AI responds. The results might surprise you.
Next: The Context Density Revolution
There's an entire other dimension I haven't touched: context density.
In many cases, my micro-nudges aren't just steering corrections—they're incredibly information-dense and timely. "Are you aware of the custom import subsystem?" or "Isn't that the function Vlad monkey-patched years ago?" These hints unlock massive context trees in the model's understanding.
The most striking example: `pre-commit run`. Four tokens. In Superset's context, this carries hundreds (thousands?) of lines worth of information—which hooks run, what they check, how they're configured, what errors to expect. The model has everything it needs to dig deep, understand the situation, and make a plan on the fly.
Models are getting better at needle-in-haystack retrieval, but there's no amount of context engineering that will capture everything. Even if you try, the signal gets drowned in over-engineered noise. The solution isn't more context—it's denser context, delivered/discovered at the right time.
Context density might be the second most underrated aspect of working with AI (after the micro-steering we've discussed). It's not about how much context you provide, but how much information per token you can pack when the timing is right.
Saving that for another blog post!