The Secret Life of Claude Code: The Multi-Model World

Why models are not interchangeable (and how to build a multi-model review pass)

#ClaudeCode #SoftwareEngineering #CodingWithAI

Margaret is a senior software engineer. Timothy is her junior colleague. They work in a grand Victorian library in London — the kind of place where code quality is the unspoken objective, and craftsmanship is the only thing that matters.

Episode 18

The deep winter had brought with it a particular quality of silence — the kind that only comes when snow has fallen on the city and muffled everything to a hush. The library felt, on this evening, less like a building and more like an idea of warmth, the windows opaque with frost, the shelves holding their long patience, the fire speaking quietly to itself in the grate.

Timothy arrived shaking snow from his coat, his face bright with cold, carrying not his usual notebook but a leather satchel that clinked softly when he set it down.

Margaret looked at the satchel.

"You've brought something," she said.

"Three things," he said, sitting down and opening the satchel with the air of a man who had been waiting to have this conversation. He set three slim volumes on the table, each a different color, each clearly well-used. "I've been experimenting."

"Tell me."

"For the past month I have been using three different AI models on the same work. Not switching between them randomly — deliberately. Asking each one different things. Watching what comes back." He looked at the volumes. "The results have been instructive."

Margaret looked at the three books with quiet attention. "What did you find?"

"That they are not the same," he said. "Not interchangeable. Each one has a character. A set of things it does with particular strength, and things it does less well. And that using them as though they were the same — as though any model would do for any task — was leaving a great deal on the table."

"Yes," Margaret said. "That is precisely what most developers are currently doing."

Models Are Not Interchangeable

"Tell me about the character differences," Margaret said. "As you experienced them. Not as specifications — as working realities."

Timothy considered this carefully. "The first difference I noticed was in how they handle uncertainty. Some models will tell you when they don't know something. Others will tell you something with equal confidence whether they know it well or barely at all. The confidence of the prose is not a reliable guide to the reliability of the content — we have discussed this — but it varies between models. Some are more forthcoming about the edges of their knowledge than others."

"And that matters for which tasks you assign them," Margaret said.

"Considerably. If I am using a model to verify a technical claim, I want one that will tell me when it is uncertain. If it presents everything with equal confidence, I have to supply the skepticism entirely myself — which is exhausting and error-prone." He paused. "The second difference was in how they reason through complex problems. Some models think linearly — they move from premise to conclusion in a straight line. Others seem to hold more of the problem in view at once, to consider it from multiple angles before settling." He looked at her. "For architectural decisions — for the kind of thinking that requires holding many constraints simultaneously — the second kind is more useful. For implementation tasks where the path is clear, the first kind is often faster and cleaner."

"And the third difference?" Margaret said.

"How they handle disagreement," Timothy said. "When I push back on a conclusion — when I tell a model I think it's wrong — some will reconsider genuinely, weighing my objection against their reasoning. Others will simply agree with me. Which is pleasant but useless." He paused. "I want a model that will hold its ground when it is right, and change its position when I have given it a good reason. That is the behavior of a useful colleague. The model that agrees with everything I say is not a thinking partner — it is a mirror."

Margaret looked at him with quiet approval. "You have been paying attention."

The Right Tool for the Right Task

"How do you decide which model to use for which task?" Margaret asked.

"I have been developing a rough framework," Timothy said. "It is not precise — more a set of questions I ask before I begin." He opened one of the slim volumes. "First: how much does correctness matter, and how verifiable is the output? For tasks where I can verify the output directly — code that either runs or doesn't, logic that either holds or doesn't — the model's confidence calibration matters less, because I will check the work regardless. For tasks where I cannot easily verify — research, analysis, assessment — I want the model most likely to tell me when it doesn't know."

"Verification changes the risk profile," Margaret said.

"Yes. Second question: how much context does the task require? Some models handle large, complex contexts more gracefully than others — they hold the thread through a long conversation, they don't lose earlier constraints when the discussion has moved on. For tasks that require sustained attention to a large system, that capacity matters. For a small, bounded task, it is irrelevant."

"And the third question?" Margaret said.

"How much do I want to be challenged?" He looked up. "Some tasks benefit from a model that will push back, propose alternatives, argue for a different approach. The kind of thinking partner that makes me defend my decisions and sometimes changes my mind. Other tasks — implementation tasks where the design is settled and I simply need it executed — benefit from a model that will execute cleanly without relitigating the design." He paused. "Using a highly argumentative model for clean implementation is like hiring a philosopher to lay bricks. The conversation will be interesting. The wall will take longer."

Margaret smiled at this — a rare, complete smile. "That is a useful distinction," she said.
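
Timothy's three questions can be written down as a rough routing rule. What follows is a minimal sketch, not anything from his notebooks: the profile names (`builder`, `checker`, `skeptic`), the trait scores, and the scoring weights are all hypothetical, and in practice they would come from your own experience with the models you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical traits, scored 0.0 to 1.0 from your own observation."""
    name: str
    admits_uncertainty: float  # how readily it flags what it does not know
    long_context: float        # how well it holds earlier constraints
    argumentative: float       # how readily it pushes back


@dataclass(frozen=True)
class Task:
    verifiable: bool       # can the output be checked directly?
    large_context: bool    # does it need sustained attention to a large system?
    wants_challenge: bool  # is the design still open to argument?


def score(model: ModelProfile, task: Task) -> float:
    total = 0.0
    # Question 1: calibration matters most when the output cannot be verified.
    if not task.verifiable:
        total += model.admits_uncertainty
    # Question 2: context capacity matters only for large tasks.
    if task.large_context:
        total += model.long_context
    # Question 3: reward pushback when it is wanted, penalize it when the
    # design is settled (the philosopher laying bricks).
    if task.wants_challenge:
        total += model.argumentative
    else:
        total -= model.argumentative
    return total


def choose_model(models: list[ModelProfile], task: Task) -> ModelProfile:
    """Return the highest-scoring profile for this task."""
    return max(models, key=lambda m: score(m, task))


# Illustrative profiles only; the numbers are assumptions, not measurements.
MODELS = [
    ModelProfile("builder", admits_uncertainty=0.5, long_context=0.8, argumentative=0.2),
    ModelProfile("checker", admits_uncertainty=0.9, long_context=0.5, argumentative=0.5),
    ModelProfile("skeptic", admits_uncertainty=0.6, long_context=0.4, argumentative=0.9),
]

settled_implementation = Task(verifiable=True, large_context=False, wants_challenge=False)
print(choose_model(MODELS, settled_implementation).name)
```

The point of the sketch is the shape of the decision, not the arithmetic: each question switches a trait on or off for the task at hand, so no single model wins everywhere.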

The Model You Know Best

"There is something else," Timothy said. "Something I did not expect."

Margaret waited.

"The model I have used longest — the one I have worked with most consistently — I understand better than the others. Not its architecture. Its tendencies. I know the kinds of prompts that produce its best work. I know where it is likely to guess rather than know. I know how it will misunderstand a poorly framed question, and how to frame questions so it doesn't." He paused. "That knowledge took time to acquire. It is genuinely valuable. And it does not transfer."

"Each model requires its own apprenticeship," Margaret said.

"Yes. Which is not an argument against using multiple models — the gains are real. But it is an argument against switching casually, or treating models as fully fungible, or chasing the newest release without understanding what you are giving up." He looked at the three volumes. "I know one of these well. The others I am still learning. And the tasks I assign to the ones I am still learning — I am more careful with those. More verification. More skepticism. Less reliance on my instincts about where the edges are."

"Because your instincts were built on a different model," Margaret said.

"And may not apply," Timothy said. "Yes."

The Multi-Model Workflow

"How does it work in practice?" Margaret asked. "The workflow. Not the theory."

"I have settled into something like this," Timothy said. "For the primary work — the building, the implementation, the extended conversations about architecture and design — I use the model I know best. The one whose tendencies I understand, whose outputs I can read accurately, whose confidence I have learned to calibrate." He paused. "For review — for the SME pass, for the second opinion on a technical claim, for the check on my own reasoning — I use a different model. Deliberately different. Because a different model brings different assumptions, different training, different blind spots."

"The review is more valuable when it comes from a genuinely different perspective," Margaret said.

"Yes. If I review my own work with the same model that produced it, the review inherits the original model's tendencies. The same things it glossed over in generation, it may gloss over in review." He paused. "A second model — one that does not share those tendencies — will find different things. Not always more things. But different ones. And the combination is more complete than either alone."

"And the third model?" Margaret said, glancing at the third volume.

"For the tasks where I want genuine argument," Timothy said. "Where I have a design I am confident about and want it stress-tested. I give it to the model most likely to disagree with me. Not because I expect it to be right and me to be wrong — but because the disagreement forces me to articulate my reasoning. And occasionally it finds something I missed."

"The adversarial reviewer," Margaret said.

"Yes. Which is only useful occasionally — you cannot stress-test everything, and the arguing model is slower for implementation. But for the decisions that matter most, the ones that will be load-bearing for months, the extra friction is worth it."

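The workflow Timothy describes (build with the familiar model, review with a different one, bring in the adversary only for load-bearing decisions) can be sketched as a small pipeline. This is an illustration under stated assumptions: a "model" here is any function from prompt text to response text, the prompts are placeholders, and the three stub functions stand in for whatever real clients you would wrap.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: a model is any callable that maps a prompt to a response.
Model = Callable[[str], str]


@dataclass
class ReviewedWork:
    draft: str
    review: str
    challenge: Optional[str] = None  # filled in only for load-bearing work


def run_pass(
    task: str,
    primary: Model,
    reviewer: Model,
    adversary: Optional[Model] = None,
    load_bearing: bool = False,
) -> ReviewedWork:
    """Draft with the primary model, then review with a different one."""
    # A review by the same model inherits the same blind spots.
    if reviewer is primary:
        raise ValueError("reviewer must be a different model from the primary")
    draft = primary(f"Implement: {task}")
    review = reviewer(f"Review this work for errors and omissions:\n{draft}")
    challenge = None
    # The adversarial pass is slow, so reserve it for decisions that matter.
    if load_bearing and adversary is not None:
        challenge = adversary(f"Argue against this design:\n{draft}")
    return ReviewedWork(draft=draft, review=review, challenge=challenge)


# Placeholder models so the sketch runs without any external service.
def primary_stub(prompt: str) -> str:
    return "draft for <" + prompt + ">"


def reviewer_stub(prompt: str) -> str:
    return "review notes on <" + prompt + ">"


def adversary_stub(prompt: str) -> str:
    return "objections to <" + prompt + ">"


routine = run_pass("parse the config file", primary_stub, reviewer_stub)
critical = run_pass(
    "choose the storage layer",
    primary_stub,
    reviewer_stub,
    adversary=adversary_stub,
    load_bearing=True,
)
print(routine.challenge is None, critical.challenge is not None)
```

The identity check on `reviewer` is deliberately crude. It only guards against passing the same callable twice; whether two models are different enough to be useful reviewers of each other is a judgment the code cannot make.
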
What This Requires of the Developer

"There is a cost to all of this," Timothy said. "I want to be honest about that. Using multiple models well requires knowing each of them. It requires discipline about which task goes to which model. It requires resisting the convenience of the familiar — the pull toward using the one you know for everything, simply because it is easier."

"Yes," Margaret said. "And it requires something else — the willingness to hold the models' outputs lightly. To remember that each one is a perspective, not a verdict. That the agreement of two models does not make something true, and the disagreement of two models does not mean one is wrong." She looked at him. "The multi-model workflow is powerful precisely because it gives you multiple perspectives. But perspectives require judgment to synthesize. The judgment is yours."

"The models are the instruments," Timothy said. "The developer is the musician."

"Yes," Margaret said. "And a musician who knows only one instrument is limited — not by lack of skill, but by the range of sound available. The multi-model developer has a wider range. But the music still depends on the musician."

Timothy sat with this for a moment, the three slim volumes on the table between them, the snow still falling outside, the library warm and certain.
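
Margaret's warning about agreement and disagreement suggests how review output should be handled: collate it, but do not let the tally decide. A minimal sketch, assuming each reviewer's findings arrive as a list of short strings (a hypothetical format) and that matching findings by normalized text is good enough for illustration:

```python
from collections import defaultdict


def collate_findings(reviews: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each finding to the sorted names of the reviewers that raised it."""
    raised_by: dict[str, set[str]] = defaultdict(set)
    for reviewer, findings in reviews.items():
        for finding in findings:
            # Normalize lightly so trivially different wording still matches.
            raised_by[finding.strip().lower()].add(reviewer)
    return {finding: sorted(names) for finding, names in raised_by.items()}


def reading_order(collated: dict[str, list[str]]) -> list[str]:
    """Order findings for the developer; nothing is dropped or auto-accepted.

    Findings raised by more reviewers come first, but agreement only sets
    the order in which a human reads them, not whether they are true.
    """
    return sorted(collated, key=lambda f: (-len(collated[f]), f))


reviews = {
    "checker": ["Missing null check", "Race in cache refresh"],
    "skeptic": ["missing null check "],
}
collated = collate_findings(reviews)
for finding in reading_order(collated):
    print(finding, "<-", ", ".join(collated[finding]))
```

Every finding reaches the developer, including the one only a single reviewer raised. The code organizes perspectives; the synthesis stays with the person reading them.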

"I've been thinking about something," he said. "The series of conversations we've had — the disciplines we've discussed. Context, planning, small tasks, the junior's curiosity, the senior's judgment, debugging, testing, documentation, legacy systems, security." He paused. "They all come back to the same thing, don't they. The tool is capable. The discipline is the developer's."

Margaret looked at him steadily. "Yes," she said. "That is the thread. Claude Code — any model, any tool — extends what is possible. It does not determine what is wise. The wisdom is what you bring." She paused. "And the wisdom is not static. It grows with practice, with attention, with the willingness to examine what went wrong and understand why." She looked at the three volumes. "You have been practicing."

"I have," Timothy said. "I am."

There Is Still More to Discuss

Timothy repacked the three volumes into his satchel with the care of someone returning valuable things to their proper place. The snow outside had thickened, the city beyond the windows now entirely white and quiet.

"Same time Thursday?" he said.

"I'll be here," Margaret said. "There is still more to discuss."

He smiled at this — a real smile, the smile of someone who has come to value the conversation for its own sake — and walked to the tall doors, pushing out into the white silence of the city.

Behind him the library settled into its deep winter quiet, the fire banking slowly, the three empty spaces on the table where the volumes had been already filling with lamplight and shadow.

Next episode: Margaret and Timothy turn to sustainable practice, asking what it means to work with AI over the long term, how the discipline compounds, and what a developer who has learned to work well with Claude Code looks like a year from now.

Aaron Rose is a software engineer and technology writer at tech-reader.blog.
