Engineering

Refactoring in the Age of LLMs: Architecture, Not Cosmetics

Why modern refactoring isn’t cleanup work anymore but the core mechanism that keeps architecture alive in a world of machine-generated code

15 min read · Max
#refactoring #architecture #llm #typescript #software-design #tech-debt #ai-slop

This piece was influenced by Martin Fowler’s 2014 talk “Workflows of Refactoring,” which fundamentally shaped how I think about continuous design in codebases.


Software engineering has always had a complicated relationship with refactoring. Most teams treat it like dental work—something you know you should do, feel vaguely guilty about postponing, and will definitely get around to next quarter. There’s always a feature with a deadline breathing down your neck, always something more urgent than cleaning up that god-object everyone tiptoes around.

But refactoring isn’t cleanup. It’s not technical debt repayment. It’s not something you schedule between sprints when the velocity gods smile upon you.

Refactoring is design. It’s what happens when code meets new knowledge. It’s the pressure that keeps architecture aligned with reality. Without it, a codebase doesn’t just accumulate mess—it fossilizes. Every new feature gets built on top of obsolete assumptions, and the gap between what the system is and what it should be grows wider until you’re essentially maintaining a museum of bad decisions.

The funny part is that developers understand this instinctively. They know that every time they touch a piece of code, they learn something new about the domain. They notice an abstraction that was slightly naive, a boundary that’s now wrong, a function doing two jobs instead of one, a service that grew until it became Jupiter with seventeen moons orbiting it. They feel the friction. They see the mess. They just don’t know what to do about it within the constraints of “ship the feature by Friday.”

The question isn’t whether refactoring is necessary—it’s unavoidable. The real question is what kind of refactoring mindset engineers carry with them when they touch code. Because the difference between systems that age gracefully and systems that become unmaintainable nightmares isn’t the amount of refactoring done. It’s whether the refactoring reflects deeper architectural thinking, or whether it’s just cosmetic rearrangement that makes the git diff look pretty while leaving the structural problems intact.

And in the age of LLM-assisted coding, this distinction has become existential.


The Living Document Problem

Every codebase is a historical artifact. Each line captures the domain knowledge available at the moment it was written. When someone implemented that feature six months ago, they understood maybe twenty percent of what the team knows now. They hadn’t built the related modules yet. They hadn’t seen how the data actually behaved in production. They hadn’t discovered the weird edge cases that would eventually force a complete rethink of the assumptions.

By the time you read old code, you are already more informed than whoever wrote it—even if that person was you. You’ve seen more. You’ve learned more. The domain model in your head is richer and more nuanced than the one encoded in the code.

That means the code is almost always slightly wrong for the present moment. Not catastrophically broken. Just subtly misaligned. The abstractions are a few degrees off-axis. The module boundaries reflect an older mental model. The naming uses concepts that have since evolved or been replaced entirely.

This is why the “don’t touch working code” mentality is poison. Working code isn’t static truth. It’s a snapshot of partial understanding. If you don’t reshape it as you learn, the codebase becomes a graveyard of outdated ideas, and every new feature you build on top of those fossils inherits their problems.

The best engineers understand that code needs continuous re-shaping. Not because they’re perfectionists or because they can’t leave well enough alone, but because the codebase should reflect current understanding, not historical guesswork.


What Refactoring Actually Means

A lot of developers approach refactoring like interior decorating. Move the couch. Repaint the walls. Hang some art. Rename variables for clarity. Split a 400-line file into three 150-line files. Extract a helper function. Flatten nested conditionals. Delete commented-out code from 2019.
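A quick sketch of what that cosmetic tier looks like, using hypothetical names invented for illustration (`canPublish`, a toy `User` shape):

```typescript
interface User {
  active: boolean;
  emailVerified: boolean;
  role: string;
}

// Before: nested conditionals bury the actual rule three levels deep.
function canPublishBefore(user: User): boolean {
  if (user.active) {
    if (user.emailVerified) {
      if (user.role === "editor" || user.role === "admin") {
        return true;
      }
    }
  }
  return false;
}

// After: guard clauses make each requirement explicit at a glance.
function canPublish(user: User): boolean {
  if (!user.active) return false;
  if (!user.emailVerified) return false;
  return user.role === "editor" || user.role === "admin";
}
```

Useful, readable, and entirely local: nothing about the system's structure changed.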

All of this is fine. None of it matters if the architecture is rotten.

Real refactoring is architectural. It’s structural. It asks uncomfortable questions that don’t have quick answers.

Why does this service know about seven different domain concepts when it should only handle one? Why are business rules scattered across three layers? Why does this module import from eight different places, creating a dependency web that looks like a ball of yarn after a cat got to it? Why is it impossible to add this new feature without changing twelve files? Why do I have to mock half the system just to test one function?

These aren’t “clean up the code” questions. These are “what is this system supposed to mean?” questions. They force you to model the domain again with the knowledge you have now, not the knowledge you had when the file was created. They force you to confront the fact that your mental model evolved but the code didn’t.

This kind of refactoring doesn’t produce satisfying pull requests. It produces thought. It leads to careful, deliberate restructuring where you bend the system so its shape matches what you understand today. You’re not making it prettier. You’re making it true.

The difference between cosmetic and architectural refactoring is the difference between rearranging deck chairs and rebuilding the deck because you realized it was constructed over a sinkhole.


How LLMs Amplify Everything

Before LLMs, bad architecture punished the humans working on it. Slow velocity. Constant bugs. Developer frustration. High turnover. The usual.

But here’s what’s different now, and it’s more fundamental than most people realize: bad architecture used to create pain, and that pain was actually useful. When you tried to add a feature and had to thread parameters through ten functions, or prop-drill through seven components, or modify twelve files for one behavior change, you felt it. That friction was a signal. It told you something was wrong. And if you were paying attention, it motivated you to fix the structure.

Good architecture, in turn, rewarded you immediately and tangibly. When you took the time to refactor properly, to actually think about where responsibilities belonged, you often ended up writing less code. Not because you were being clever, but because you’d structured things so the new feature just… fit. You’d make a small change in the right place, and everything would fall into line. The system would do what you wanted without you having to tell it explicitly in ten different locations.
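Here is a hypothetical TypeScript sketch of that contrast: a cross-cutting `locale` value threaded through every signature, versus a request context that gives it one home (all names invented for illustration):

```typescript
// Before: a new locale requirement forces every layer in the call
// chain to accept and forward a parameter it never actually uses.
function renderPageBefore(userId: string, locale: string): string {
  return renderHeaderBefore(userId, locale);
}
function renderHeaderBefore(userId: string, locale: string): string {
  return renderGreetingBefore(userId, locale);
}
function renderGreetingBefore(userId: string, locale: string): string {
  return locale === "de" ? `Hallo, ${userId}` : `Hello, ${userId}`;
}

// After: a request context owns the cross-cutting data, so the
// intermediate layers no longer need to know locale exists.
interface RequestContext {
  userId: string;
  locale: string;
}
function renderPage(ctx: RequestContext): string {
  return renderHeader(ctx);
}
function renderHeader(ctx: RequestContext): string {
  return renderGreeting(ctx);
}
function renderGreeting(ctx: RequestContext): string {
  return ctx.locale === "de" ? `Hallo, ${ctx.userId}` : `Hello, ${ctx.userId}`;
}
```

The second version still passes something down, but the next cross-cutting field changes one interface instead of every signature in the chain.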

This reward loop—painful bad architecture, satisfying good architecture—was the engine that drove developers to care about design. You learned through direct feedback that thinking about structure paid off in concrete ways.

LLMs broke that loop.

Now when you need a feature and the architecture is wrong for it, you don’t feel the pain anymore. You ask the LLM to implement it. You come back when it’s done. You check if it works. It usually does. Ship it.

What you don’t see is that the LLM just generated 300 lines of code threading that parameter through all ten functions because that’s what the existing pattern suggested. It prop-drilled through all seven components because that’s what similar code in the codebase did. It modified those twelve files because that’s what the architecture required.

The feature works. The pain is hidden. The reward for good architecture disappears too, because the LLM will generate 300 lines just as easily as it generates 30. You don’t feel the difference between fighting the architecture and working with it, because you’re not the one doing the work anymore.

Do this enough times and you end up with a codebase that’s drowning in generated code that doesn’t fit any coherent architecture. The structure becomes unfixable because there’s too much code expressing the wrong abstractions. Refactoring stops being “reshape this” and becomes “rewrite everything,” which never happens.

Pattern Replication at Machine Speed

The LLM isn’t just a pattern replicator anymore. It’s a pattern amplifier and a pain suppressor. It takes your architectural problems and scales them up while simultaneously removing the feedback mechanism that would have pushed you to fix them.

If your world is well-designed, this is magical. The LLM extends clean abstractions naturally. It respects boundaries without being told. It slots new features into obvious places because the architecture makes those places obvious. You point at a well-factored codebase and say “add a feature like this,” and the LLM produces something that looks like it was written by someone who deeply understands the system.

But if your architecture is a mess, the LLM becomes a xerox machine for your mistakes. It clones bad patterns enthusiastically. It extends god-objects into super-god-objects. It tangles responsibilities further. It adds indirection without purpose. It creates new problems that perfectly mirror your existing problems.

An LLM has no architectural intuition. It can’t look at a codebase and think “this grew wrong; I should push back.” It has no sense of why things are shaped the way they are. It just sees shapes and makes more of them.

This is what makes architectural refactoring critical now in a way it wasn’t before. You’re not just maintaining a codebase for human developers anymore. You’re maintaining a codebase that teaches machines how to extend it. If you let the architecture rot, every LLM-generated feature will accelerate that rot at machine speed.

In a world where AI can produce a hundred lines of code in seconds, architecture is the only meaningful defense against runaway complexity.


Why LLMs Can’t Actually Refactor

People love asking LLMs to refactor things. “Make this cleaner.” “Split this file into smaller modules.” “Improve the structure here.”

And the LLM will try. It will produce output. It will split files and extract functions and rename things. What it won’t do is actually refactor, because refactoring requires understanding why the code grew the way it did.

Give an LLM a 1200-line service and ask it to “refactor into smaller services,” and it will obediently chop it into pieces. Now you have five 300-line services glued together with more indirection. Each file is shorter, the total is longer, and the architecture is worse. You didn’t improve modularity—you distributed the mess across more files.

Real refactoring asks different questions. Why did this service become 1200 lines? What domain concept is missing that would have prevented this growth? Which responsibilities were never properly separated? What abstractions hadn’t been invented yet when this was written? What information does this code know that it shouldn’t?

These aren’t “make the code better” questions. These are “what should the system mean?” questions. They require understanding the domain, understanding how that domain has evolved, understanding what problems developers are actually trying to solve when they wade into this swamp.

LLMs can’t answer those questions. Only developers can. And only developers who are actively thinking about architecture while they work, not just during designated “refactoring sprints” that never actually happen.
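To make the “missing domain concept” question concrete, here is a hypothetical sketch: a discount rule copied inline across a service, then given a home of its own (`PricingPolicy` and every other name here is invented for illustration):

```typescript
// Before (as scattered through a hypothetical 1200-line OrderService):
// the "member discount" rule is re-derived inline wherever it's needed.
function invoiceTotalBefore(subtotal: number, isMember: boolean): number {
  return isMember ? subtotal * 0.9 : subtotal; // rule copy #1
}
function cartPreviewBefore(subtotal: number, isMember: boolean): number {
  return isMember ? subtotal * 0.9 : subtotal; // rule copy #2
}

// After: the missing concept gets a name. The service shrinks because
// callers delegate to the policy instead of re-deriving the rule.
class PricingPolicy {
  constructor(private readonly memberDiscount = 0.1) {}
  price(subtotal: number, isMember: boolean): number {
    return isMember ? subtotal * (1 - this.memberDiscount) : subtotal;
  }
}

const pricing = new PricingPolicy();
const invoiceTotal = (subtotal: number, isMember: boolean) =>
  pricing.price(subtotal, isMember);
const cartPreview = (subtotal: number, isMember: boolean) =>
  pricing.price(subtotal, isMember);
```

No mechanical file split would have found `PricingPolicy`; only someone who understands the domain knows that concept was missing.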


The Refactoring Ticket Delusion

Speaking of refactoring sprints: they don’t work. Refactoring tickets don’t work. They never have, and they’re especially doomed now.

The reason is simple. Refactoring tickets compete directly with feature tickets, and features always win. Always. There is no product manager on Earth who will prioritize “clean up the user service” over “deliver the thing we promised the board.” Even engineers struggle to justify pure-refactoring work because it feels like maintenance rather than progress, technical debt rather than value creation.

So what happens? The refactoring ticket gets created with good intentions. It gets moved to the backlog. Then to next sprint. Then to next quarter. Eventually it becomes a zombie ticket that everyone knows will never get done but nobody wants to officially kill because that would mean admitting defeat.

This is why Martin Fowler’s concept of “preparatory refactoring” is so powerful. It sidesteps the entire failure mode.

You don’t ask for permission to refactor. You don’t schedule it. You don’t create tickets for it. You don’t beg for a sprint where the team can focus on “technical health” while the business taps its foot impatiently.

Instead, you refactor toward the shape you need while delivering actual features that require that shape. You do it as part of the work, not as separate work. When you’re building something new and you realize the current structure makes it painful, you restructure just enough to make the new thing fit cleanly. You create a boundary. You add an adapter. You build a facade over the mess so the new code doesn’t inherit its problems.

This is where patterns like adapters, facades, and anti-corruption layers become crucial. You’re not rewriting the world. You’re protecting new code from old architecture while gradually transforming what lies beneath. You keep the surface stable while slowly fixing the guts. You prevent the rot from spreading while chipping away at its source.
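A minimal sketch of that protective boundary in TypeScript, with every name invented for illustration:

```typescript
// Hypothetical legacy module: loosely typed, legacy field names,
// the kind of code nobody wants new features depending on directly.
const legacyUserStore = {
  fetch_usr(id: string): Record<string, unknown> {
    return { usr_id: id, usr_nm: "Ada", is_del: 0 };
  },
};

// The clean domain model that new code gets to work with.
interface User {
  id: string;
  name: string;
  deleted: boolean;
}

// Anti-corruption layer: translates the legacy shape at one boundary.
// New features depend on this class, never on the legacy store, so
// replacing the store later touches exactly one place.
class UserRepository {
  getUser(id: string): User {
    const raw = legacyUserStore.fetch_usr(id);
    return {
      id: String(raw["usr_id"]),
      name: String(raw["usr_nm"]),
      deleted: raw["is_del"] === 1,
    };
  }
}
```

New features call `UserRepository` and see only the clean `User` shape; the legacy store can be strangled out behind it one method at a time.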

And then every time someone touches the old code for legitimate feature work, they improve it a little. They rewrite a small piece. They move it toward compatibility with the newer, better architecture. The system evolves continuously, gradually, safely, without ever halting feature delivery.

This is the only way large-scale architectural improvements ever actually happen in real codebases. Not through grand rewrites. Not through dedicated refactoring months. Through hundreds of small, deliberate improvements woven into the regular flow of work.


Architecture as Constant Background Thought

When experienced engineers talk about refactoring as a habit rather than a task, what they really mean is maintaining constant architectural awareness in everything they do.

This isn’t heavy intellectual work. It’s not sitting down with whiteboards and UML diagrams before every change. It’s a quiet background process that runs while you write even the smallest piece of code.

What domain concept am I actually working with here? Where does this logic really belong? What implicit assumption just became explicit? What old abstraction just cracked under the weight of this new use case? Why did I have to scroll through four hundred lines to understand what this function does? Why does this feature feel harder to implement than it should be?

These questions become reflexive after a while. You’re not stopping to formally analyze everything. You’re just noticing when something feels wrong, when code doesn’t fit naturally, when you’re fighting the architecture instead of working with it.

Developers with this mindset don’t write isolated lines of code. They place each line into the larger structure of the system. They feel when boundaries are in the wrong place. They adjust. They reshape. They nudge things toward coherence continuously.

It’s like gardening. You don’t let the garden grow wild for six months and then spend a weekend hacking everything back to size. You trim as you go. You prune dead branches the moment you see them. You adjust growth patterns early before they become structural problems. The garden is alive, and your job is to guide it, not to periodically bulldoze and replant.


How to Build the Architectural Mindset

If you want to develop this kind of thinking—the constant awareness, the reflexive questioning, the instinct for when structure is wrong—the answer isn’t reading more architecture books or memorizing patterns.

It’s code review. Serious code review.

Not the performative kind where you check for style violations and suggest replacing || with ??. The kind where you actually think about what’s happening. Where you ask whether this is the right approach. Where you consider if there’s a better place for this logic to live. Where you notice that three files are doing similar things and wonder if there’s a missing abstraction.
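To be fair, the nit itself is real semantics: `||` and `??` disagree whenever a legitimate falsy value like `0` is in play (variable names invented for illustration):

```typescript
const retryCount: number | null = 0;

// || falls back on any falsy value, so a legitimate 0 is lost.
const withOr = retryCount || 3; // 3

// ?? falls back only on null or undefined, so the 0 survives.
const withNullish = retryCount ?? 3; // 0
```

A review that stops at this level, though, is still skipping every question that actually matters.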

Treat code reviews as an exercise in becoming a better engineer. Both when reviewing others’ code and when receiving feedback on yours. Reading other people’s code, especially code solving problems in domains you understand, teaches you things that writing code alone never will. You see different approaches. You notice when something feels awkward. You start developing taste.

This is how the architectural mindset becomes automatic. Not through studying principles, though that helps. Through repeated practice looking at code and asking “is this where this should live?” until the question becomes as natural as breathing.

In the age of LLMs, this matters more than ever. The machines will generate whatever you ask for. Your job is to know what to ask for, and more importantly, to know when what you’re asking for is wrong. Code review is where you build that judgment.


What Good Architecture Actually Looks Like

Good architecture is invisible. Not because it doesn’t exist, but because it makes everything feel natural.

Features fall into place with surprising ease. The concepts match how people think about the domain. The boundaries are obvious. The abstractions are honest—they don’t lie about what they do or hide complexity that explodes later. Adding a new behavior feels like extending a pattern, not fighting a knot of dependencies.

Code in a well-architected system reads like prose. You can follow the flow. You can see where things belong. When you need to change something, it’s clear what needs to change and what can stay the same. The system has structure, and that structure helps you rather than constraining you.

Bad architecture, by contrast, is loud. Not in the code itself—the code might look fine at first glance—but in how developers behave around it.

People avoid touching certain files. They dread opening specific modules. They hesitate before making changes because they don’t understand the blast radius. They scroll endlessly trying to piece together the actual logic from fragments scattered across the codebase. They write defensive code because they don’t trust the system. They accept complexity as inevitable because “that’s just how things are here.”

The worst part is how this friction compounds with LLMs. If you point an LLM at a poorly structured codebase and ask it to add features, it will faithfully replicate that poor structure. It will multiply the mess. It will create new tangles that mirror the existing tangles. It will spread the architectural disease at machine speed because it has no way to know the structure is wrong.

Good architecture makes the codebase teach the LLM what to do. The patterns are clear enough that the machine can follow them. The boundaries are obvious enough that new code naturally respects them. The structure is coherent enough that extensions feel inevitable rather than forced.

Bad architecture makes the LLM a chaos engine. It generates plausible-looking code that deepens every existing problem while creating new ones that you won’t notice until three months later when someone tries to change that code and discovers it’s brittle, coupled, and impossible to test.

This is why “move fast and fix it later” is catastrophic now. Later never comes. And in the meantime, your AI assistant is churning out features built on broken foundations, each one making the next one harder.


The Only Thing That Matters

In the end, refactoring isn’t about code cleanliness. It’s not about following style guides or hitting metrics or satisfying linters. It’s not about DRY or SOLID or whatever acronym is fashionable this year.

Refactoring is how architecture evolves in response to new knowledge.

Every feature teaches you something about the domain. Every bug reveals a misunderstanding about reality. Every performance problem exposes an overlooked assumption. Every messy file points to a concept that lacks a proper home. Every painful change highlights a boundary that’s in the wrong place.

Refactoring is how you take all of that learning and feed it back into the system. It’s how you keep the codebase aligned with your current understanding instead of your historical guesses.

And in a world where LLMs can produce infinite code but zero judgment, refactoring is the only thing preventing your system from drifting into incoherence.

Your job isn’t to write code anymore—machines can do that. Your job is to think. To understand the domain deeply enough that you can shape the system to match it. To design structure that makes sense. To maintain the architectural truth of the codebase even as AI assistants are churning out thousands of tokens per minute trying to guess what you want.

If you keep the architecture healthy, LLMs become powerful multipliers. They extend your design correctly because the design is clear enough to extend. They work with you instead of against you.

If you let architecture rot, LLMs become accelerants. They amplify every mistake. They turn small messes into large messes faster than you can debug them. They drown you in plausible-looking garbage that passes tests and looks fine in code review but slowly strangles the system’s ability to evolve.

Refactoring isn’t optional anymore. It’s not a nice-to-have or a “when we have time” activity. It’s the only defense against a future where machines generate more bad code than humans can fix.

Architecture is all that matters now. Everything else is just implementation details.