Engineering

Vibe Coding

How to Actually Do This Without Shooting Yourself in the Foot

15 min read · Max
#vibe-coding #ai #software-development #coding #llm #cursor #architecture #engineering #code-review #developer-tools

This article is part of a series


The Machinery Behind the Magic

3/3

Everyone’s talking about AI, but most conversations oscillate between breathless hype and dismissive skepticism. This series takes a different approach: treat these tools as what they are—sophisticated machinery that rewards understanding. Like any powerful tool, they have strengths, limitations, quirks, and best practices. You don’t need to be an engineer to benefit from knowing how your tools work. These articles will take you from “what even is this?” to “I know exactly when and how to use this.”

Andrej Karpathy called it “vibe coding”—the practice of describing what you want in natural language, letting AI generate the code, and iterating through conversation rather than typing syntax. The term stuck, partly because it captures something real about how the workflow feels, and partly because it annoys people who think coding should be Serious and Rigorous.

I vibe code everything now. I haven’t written a for-loop by hand in months. I code in English, review in diffs, and iterate at a speed that would have seemed absurd two years ago.

But here’s the thing: vibe coding done poorly is worse than not using AI at all. The failure mode isn’t “AI writes bad code.” The failure mode is “engineers ship code they don’t understand.” And that’s a disaster that doesn’t surface until 3am when production is on fire and nobody knows why.

This article is about how to do it right.


What Vibe Coding Actually Is

The shift isn’t from “writing code” to “not writing code.” It’s from “I implement solutions” to “I specify intent, review output, and iterate until it’s right.”

You’re still the engineer. You still need to understand the code. You still need to know what good architecture looks like, what edge cases exist, what will break under load. The AI is executing your vision faster—it’s not having the vision for you.

Think about how I’m writing this article. If you could see my conversation with the AI, you’d find every single talking point, every idea, every opinion—in my prompts. The AI isn’t coming up with the insights. I am. What the AI does is turn my messy, rambling notes into coherent prose. Instead of spending days thinking about phrasing, I get to focus on what I want to say and let the AI handle how to say it.

It’s autocomplete on steroids. Not autocomplete for words—autocomplete for intent.

Vibe coding works the same way. You bring the intent, the architecture, the judgment. The AI brings speed. If someone reviewed your prompts and your output, they should find the same ideas in both—just expressed more cleanly in the output.

The principle: Use AI to enhance your work, not to do your work.

The engineers I see struggling with vibe coding fall into two camps. The first camp doesn’t give enough context—they type vague requests, get vague code, and conclude AI isn’t ready. The second camp gives up too much control—they let the AI generate mountains of code, glance at it, ship it, and then can’t debug it when things go wrong.

The sweet spot is aggressive collaboration. Precise instructions. Constant review. Willingness to regenerate and rewrite until it’s actually right.


The Tool Landscape (Late 2025)

This section will date itself. Tools evolve fast. But the categories and principles should outlast the specific products.

There are three categories: IDE-integrated tools, CLI agents, and agentic platforms. My strong opinion is that IDE-integrated tools are where serious work happens—and the others, while interesting, aren’t ready for production code.

IDE tools (Cursor, GitHub Copilot, Augment Code, Windsurf) put AI in your editor. You see every change, review every diff, stay in the loop. The AI is a collaborator sitting next to you. Try several and find the flow that clicks—they’re more similar than different. What matters is that you’re in an environment where you can intervene constantly.

CLI agents (Claude Code, Codex CLI, Gemini CLI) run from your terminal. They can read your codebase, make edits, run commands. Powerful for codebase-wide refactoring or complex multi-step operations. But when an agent is autonomously editing files while you’re getting coffee, you’re not in the loop. That gap is where bugs creep in.

Agentic platforms (Google’s Antigravity, OpenAI Codex cloud) go further—they autonomously plan, code, test, and validate. You describe what you want, they go build it, you review the PR.

For prototyping? Interesting. For production code you’ll debug at 3am? Not there yet.

The fundamental issue: if you didn’t watch the code being written, if you didn’t make the architectural decisions along the way, if you don’t deeply understand what got generated—you’re not the engineer anymore. You’re a manager approving PRs from a contractor you can’t talk to.


Models: Different Tools for Different Jobs

This will also date itself, but the principle won’t: no single model is best at everything. Serious vibe coders switch between models based on the task.

For Raw Editing Speed: Composer-1 (Cursor’s Model)

This is my workhorse. It’s not the smartest model. Sometimes it infuriates me. But the speed is unmatched, and it has a beautiful quality: it shuts up and executes.

Where other models apologize, hedge, offer opinions, ask clarifying questions—Composer-1 just does the thing. For rapid iteration, for “change this function to do X instead,” for grinding through implementation when you know exactly what you want? Nothing else comes close.

For Thinking and Planning: Claude Opus 4.5, GPT-5.1

When I need to reason about architecture, analyze a complex codebase, plan a migration, or review code thoughtfully—these are the models I reach for.

They’re slower. They’re more expensive. For editing, that cost adds up fast and the speed penalty is painful. But for “think deeply about this problem,” they’re worth it.

I’ll often plan with Opus or GPT-5.1, then switch to Composer for implementation.

For One-Shots: Gemini 3 Pro

I have never met a model that can one-shot problems like Gemini 3. It’s almost unsettling. You describe what you want, it produces a complete working implementation, and it’s… correct. First try.

But—and this is a big but—if you try to iterate with it, give it follow-up instructions, ask it to make a plan and then execute that plan? It has a mind of its own. It goes off-rails. It ignores constraints. It decides it knows better.

For pure vibe coding—“I have an idea, build me a prototype, I don’t care how”—Gemini 3 is incredible. For production work where you need the model to follow your architecture and your rules? Nightmare.

The Meta-Advice

Learn which model to reach for. Fast editing: Composer. Deep thinking: Opus/GPT-5.1. One-shot prototyping: Gemini 3.

This is a skill in itself—and it’s one of the things that separates effective vibe coders from people who just use whatever default their tool gives them.


Staying in Control

The tools make it easy to generate code. Dangerously easy. The hard part isn’t generating—it’s maintaining quality when generation is cheap.

Version control is non-negotiable. Commit early, commit often. You need to be able to roll back, see exactly what changed, return to a clean state when the AI generates something broken.

But the real discipline is reviewing with the right lens.

“It compiles and the tests pass” is not the bar. Most AI-generated code will compile. Most will pass your existing tests. That tells you almost nothing about whether the code is right.

The actual bar: “I understand this code, and it fits the architecture.”

This is where I see engineers get into trouble. The AI generates something, it seems to work, they ship it. Three weeks later there’s a weird bug and they have no idea where to look—because they never actually understood what got generated. They couldn’t debug it because they didn’t write it, and they didn’t review it closely enough to have written it.

You wouldn’t merge a PR from a coworker if you didn’t understand the change. The AI doesn’t get a pass just because it types fast.

Watch for the fingerprints of unreviewed AI code: random packages in your dependencies, migrations that don’t make sense, configuration changes you didn’t ask for, comments referencing things that don’t exist. When I see these in our codebase, I know exactly what happened—someone let the AI write something, glanced at it, and committed.

One underutilized technique: use AI to review AI. After generation, ask a different model (or fresh context) to critique the output. “What bugs do you see? What edge cases am I missing? Does this fit with the patterns in this codebase?” AI is often better at critique than generation. Use that.
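A minimal sketch of what that can look like, using the OpenAI Node SDK—the model name and prompt wording are placeholders for whatever reviewer model and house style you prefer:

```typescript
import OpenAI from "openai";

// Assumes OPENAI_API_KEY is set in the environment.
const client = new OpenAI();

// Ask a fresh model (not the one that wrote the code) to critique a diff.
async function critiqueDiff(diff: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-5.1", // placeholder: use whatever reviewer model you have access to
    messages: [
      {
        role: "system",
        content: "You are reviewing a code change you did not write. Be blunt.",
      },
      {
        role: "user",
        content:
          "What bugs do you see? What edge cases am I missing? " +
          "Does this fit the patterns in this codebase?\n\n" + diff,
      },
    ],
  });
  return response.choices[0].message.content ?? "";
}
```

Feed it the diff you’re about to commit and treat the answer as a second reviewer’s comments, not a verdict.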


The Workflow in Practice

The actual day-to-day isn’t one tool—it’s switching between modes based on what the task needs.

Planning happens in chat. Before I write any code, I’m talking through the approach with a thinking model. “Here’s what I’m trying to build, here’s our current architecture, what are my options?” I’m using AI to pressure-test ideas, catch things I haven’t considered, think through edge cases. No code generation yet—just structured thinking out loud.

Implementation happens in the IDE. Once I have an approach, I switch to Cursor. Now the prompts are specific: “Add a handler in this file that follows the pattern we use for the other handlers. Use existing auth middleware. Events should be typed—here are the types.” Composer rips through it. I’m reviewing every file as it changes. When something’s wrong, I don’t add corrections—I edit my prompt, regenerate clean.
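To make that concrete, here’s the shape of output I’d expect from a prompt like that. Every name here—the route, the middleware, the event type—is a hypothetical stand-in for whatever already exists in your codebase:

```typescript
import { Router, Request, Response } from "express";
// Assumed existing middleware; the point is reusing what's already there, not inventing new auth.
import { requireAuth } from "../middleware/auth";

// Typed event, matching the "events should be typed" instruction in the prompt.
interface OrderCreatedEvent {
  type: "order.created";
  orderId: string;
  createdAt: string; // ISO timestamp
}

const router = Router();

// Follows the same shape as the other handlers: auth middleware first, thin handler body.
router.post("/orders", requireAuth, async (req: Request, res: Response) => {
  const event: OrderCreatedEvent = {
    type: "order.created",
    orderId: req.body.orderId,
    createdAt: new Date().toISOString(),
  };
  // ...hand off to the service layer / event bus here...
  res.status(202).json(event);
});

export default router;
```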

Debugging goes back to chat. Paste the error, paste the relevant code, ask the thinking model to reason through it. This isn’t vibe coding—it’s using AI as a rubber duck that talks back.

The through-line: I know what I want at every stage. The AI accelerates execution, but the architectural decisions are mine. By end of day, I’ve read every line of the diff. I understand every change. I could explain any part to a colleague.

The AI helped me get here 5x faster. But the code is mine.


The Real Skill: Architecture

Here’s what most “learn to vibe code” advice gets wrong: it tells you to review harder, specify better, prompt more precisely. That’s not wrong, but it misses the point.

The real skill is architecture. It always was. AI just makes this brutally clear.

LLMs Replicate What They See

Remember from the first article in this series: the model generates based on what’s in the context window. When you’re vibe coding in an IDE, the context isn’t just your prompt—it’s your codebase. The model reads your existing files, sees your patterns, and generates more of the same.

If your architecture is clean, the AI generates clean code that fits.

If your architecture is a mess, the AI generates more mess. Faster.

No LLM will stop and say “actually, this pattern you’re using is garbage, let’s refactor first.” That’s not how they work. They’re pattern-completion machines. They complete your patterns, whatever those patterns are.

Your codebase is part of the prompt. If the codebase is shit, no prompt engineering will save you.
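A hypothetical contrast to make the point concrete—both functions are invented, but whichever pattern dominates your codebase is the one the model will reproduce when you ask for the next handler:

```typescript
interface Invoice { id: string; total: number; }
interface InvoiceRepository { findById(id: string): Promise<Invoice | null>; }

// Codebase A: data access goes through a repository layer. Ask the AI for a new
// feature and it will tend to reach for the repository, because that's the pattern it sees.
async function getInvoiceClean(id: string, repo: InvoiceRepository): Promise<Invoice | null> {
  return repo.findById(id);
}

// Codebase B: raw SQL lives inline in handlers. Ask for the same feature and you'll
// get more inline SQL—the model completes the mess just as faithfully.
async function getInvoiceMessy(
  id: string,
  db: { query: (sql: string, params: unknown[]) => Promise<any[]> }
): Promise<any> {
  const rows = await db.query("SELECT * FROM invoices WHERE id = $1", [id]);
  return rows[0] ?? null;
}
```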

The “More vs. Better” Trap

Generation is cheap now. Solving a ticket is cheap. You can finish three features in a day by letting AI take the wheel.

This is incredibly seductive. Product is happy. Your git stats are stellar. You’re suddenly that mythical 10x engineer everyone talked about a decade ago.

But here’s the thing about those 10x engineers: they were never 10x because they typed faster. They were 10x because they were masters of architecture.

Yes, they could hammer out algorithms quickly. But algorithms are a small part of modern software. Most problems are already solved—there’s a Stack Overflow answer, an npm package, a well-known pattern. Even before AI, raw coding speed wasn’t the bottleneck.

What made 10x engineers fast was their ability to design systems where features slot in cleanly. Where adding the next thing doesn’t require fighting the last thing. Where the architecture does half the work before you write a line of code.

When product came up with a crazy new requirement, the 10x engineer knew exactly where it went. They constantly refactored to keep the codebase agile. Little tech debt. Room for change without overengineering.

That’s the skill that matters now. More than ever.

Refactoring Is Cheap Now Too

Here’s the flip side, and it’s actually good news: if you have strong architectural judgment, AI is an incredible force multiplier.

Refactoring used to be expensive. You knew the codebase needed restructuring, but the implementation cost was high, the risk was real, and there was always a feature to ship instead. So tech debt accumulated.

Now? Refactoring is cheap. You can describe the transformation you want, let the AI do the grunt work, review the output. What used to take a week takes an afternoon.
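As an example of the kind of transformation that’s now an afternoon’s work: describe the target shape once, let the AI sweep it across every call site, and review the diff. A hypothetical before/after, with invented names:

```typescript
// Before: every caller has to remember to check an untyped { error } field.
async function fetchUserBefore(id: string): Promise<any> {
  const res = await fetch(`/api/users/${id}`);
  if (!res.ok) return { error: `HTTP ${res.status}` };
  return res.json();
}

// After: one shared Result type, applied consistently across the codebase.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

interface User { id: string; name: string; }

async function fetchUserAfter(id: string): Promise<Result<User>> {
  const res = await fetch(`/api/users/${id}`);
  if (!res.ok) return { ok: false, error: `HTTP ${res.status}` };
  return { ok: true, value: (await res.json()) as User };
}
```

The judgment call—that a shared Result type is the right shape for this codebase—is still yours. The sweep is what got cheap.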

The engineers who understand this will use AI to continuously improve their codebase. They’ll refactor aggressively. They’ll keep architecture clean because the cost of cleaning is finally low enough to actually do it.

The engineers who don’t will use AI to pile more features onto a rotting foundation, faster than ever.

Same tools. Opposite outcomes. The difference is architectural thinking.

Code Review Is Architecture Review

Most code reviews I see are syntax reviews. “Use nullish coalescing instead of logical OR.” “You could destructure this.” “Consider extracting this to a helper function.”
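That first nitpick, concretely:

```typescript
const config: { port?: number } = { port: 0 };

const portOr = config.port || 3000;      // || falls back on any falsy value, so 0 becomes 3000
const portNullish = config.port ?? 3000; // ?? falls back only on null/undefined, so this stays 0
```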

These aren’t wrong. But if that’s the extent of your code review, you’re missing the point—and you’re especially missing the point in the age of AI.

When you’re reviewing AI-generated code, syntax is the least of your concerns. The AI probably got the syntax right. What you need to evaluate is:

Does this fit the architecture? Or does it introduce a new pattern that doesn’t match existing code?

Is this the right abstraction? Or is it a quick hack that will calcify into tech debt?

Does this belong here? Or should this feature live somewhere else?

What will this look like when we need to extend it? Will it be easy to modify, or will we be fighting this code in six months?

This is architecture review. It requires understanding not just the code in front of you, but the system it lives in, the direction the system is going, and what good looks like at scale.

If you can’t do this, you can’t vibe code safely. You’ll generate code, it’ll seem to work, and you’ll slowly build a codebase that nobody can maintain.


The Skill Shift

Let’s be direct about what’s happening.

Implementation is becoming a solved problem. Not fully solved—there’s still judgment in how you implement. But the gap between “I know what to build” and “it’s built” is collapsing. AI handles implementation.

What AI doesn’t handle—what it can’t handle, by the nature of how these systems work—is deciding what to build and where to put it. The architectural judgment. The system thinking. The taste.

The job is shifting from Software Engineer to Software Architect. Not in title—in substance. The engineers who thrive will be the ones who can hold a complete mental model of their system, make structural decisions that the AI then executes, recognize when existing architecture needs to change before adding features, and continuously reshape their codebase as requirements evolve.

This isn’t a higher bar. It’s a different bar. Some engineers who were mid-level implementers will struggle because implementation was their strength. Some engineers who were “slow coders” but strong system thinkers will suddenly become incredibly productive.

The playing field is reshuffling around architectural skill.


The Opportunity

Here’s what I want you to take away from this:

We’re living through a genuine shift in what it means to write software. The implementation bottleneck—the thing that’s constrained software development for decades—is dissolving. What used to take weeks takes days. What used to take days takes hours.

This is not a threat. It’s a massive opportunity.

If you’ve ever had more ideas than time to implement them—you now have time. If you’ve ever known exactly how the code should be structured but dreaded the grunt work of actually writing it—the grunt work is handled. If you’ve ever wanted to refactor that ugly module but couldn’t justify the cost—the cost just dropped by an order of magnitude.

The engineers who lean into this will build better software, faster, than has ever been possible. They’ll spend their mental energy on what matters—architecture, design, understanding the problem—and let AI handle the mechanical translation into code.

The tools are here. The leverage is real. The question is whether you’ll use it to do more of the same, or to do genuinely better work.

Vibe code everything. Think harder about architecture. Build something great.


This is the third article in the “Machinery Behind the Magic” series. Start with “What’s Actually Inside the Box” for how LLMs actually work, or “The Prompt Is the Product” for prompting techniques that apply beyond coding.