The Secret Life of Claude: When Claude Code Gets It Wrong

Three ways Claude Code gets it wrong — and the discipline that catches all of them before they ship

#ClaudeCode #CodingWithAI #SoftwareEngineering #AITools

Margaret is a senior software engineer. Timothy is her junior colleague. They work in a grand Victorian library in London — the kind of place where precision matters and confidence is not the same thing as correctness. Timothy has arrived today in unusually good spirits. This, Margaret has learned, is sometimes cause for concern.

What Timothy Was Proud Of

He came through the door with the particular energy of someone who had solved something.

"I fixed it," he said, settling into his chair with the satisfaction of a man who had earned his tea. "The user authentication bug. The one that's been sitting in the backlog for two weeks."

"Tell me," Margaret said.

"I described the problem to Claude Code — properly this time, the way we talked about. Context, specific behaviour, constraints. It gave me a solution, I reviewed it, it looked right, I tested it, it passed. Deployed this morning."

"And it worked?"

"Perfectly. Users can log in, sessions are handled correctly, no errors in the logs." He opened his laptop. "I wanted to show you the code."

Margaret looked at it. She read slowly, as she always did — not skimming, not nodding along, but actually reading. Timothy had learned not to fill this silence.

After a moment she said, "Walk me through the session expiry logic."

"It resets the session token on each login. Clears the old one, generates a new one, stores it with a timestamp."

"And when does the session expire?"

"Twenty-four hours after the timestamp."

"After which timestamp?"

Timothy looked at the screen. Then he looked at it more carefully.

"After the..." He stopped.

Margaret waited.

"After the creation timestamp," he said slowly. "Not the last activity timestamp."

"No."

The good spirits left the room quietly, the way warmth does when a window is opened in winter.

"So a user who logs in and stays active," Timothy said, "will still be logged out after twenty-four hours regardless."

"Yes."

"Even if they were active thirty seconds ago."

"Yes."

He sat back. "That's going to be a terrible experience."

"It is going to be a very frustrating experience," Margaret agreed, "for any user who is in the middle of something when the clock runs out." She looked at him steadily. "The code is not broken, Timothy. It does exactly what it was written to do. It simply does not do what you needed it to do."

The Shape of the Problem

He stared at the screen for a moment. "But I tested it."

"What did you test?"

"I logged in. The session worked. I logged out, logged back in. New token, no errors."

"Did you test a session that had been active for twenty-three hours and fifty-nine minutes?"

"No," he admitted. "Obviously not."

"Did you test what happens to an active user when their session expires mid-task?"

"No."

"Did you consider that the requirement might be sliding expiry — resetting the clock on activity — rather than fixed expiry from login?"

A pause. "I didn't think about it in those terms."

"Claude Code did not think about it in those terms either," Margaret said. "Because you did not ask it to. You described a bug — sessions not being cleared correctly — and it fixed that bug. Cleanly and correctly. But the behaviour you actually needed was never specified, so it was never built."

She picked up her pen.

"This is the shape of the problem. Claude Code gave you a confident, well-structured, functional solution to the problem you described. It was not the solution to the problem you had."

Confidence Is Not a Signal

"That's what bothers me most," Timothy said. "The code looked right. It read like something I would have written on a good day — clean, commented, sensible variable names. There was nothing in it that said this might not be what you need."

"There never is," Margaret said. "This is the important thing to understand. Claude Code cannot reliably flag gaps in your requirements. When something is ambiguous or unspecified, it does not stop and ask — it fills the gap with a reasonable assumption and continues. It will tell you if a function does not exist, or if a pattern is deprecated, or if there is a type mismatch. But it cannot tell you if the behaviour it has implemented matches the behaviour your users actually need. That knowledge lives only in your head. And in the heads of your users."

"So the confidence of the output is meaningless."

"The confidence of the output is a measure of technical coherence," Margaret said carefully. "The code is consistent. The logic follows. The syntax is correct. Those things Claude Code can assess, and it is very good at them. But correctness in the deeper sense — does this do what the business needs, does this serve the user well, does this handle the edge cases that matter — that is yours to verify. Always."

Timothy looked at the code again with different eyes. It had not changed. But he had.

"It's like getting directions to the wrong address," he said. "Delivered with complete confidence."

"That is an excellent description," Margaret said. "And the directions may be perfect. Every turn correct. Arrive precisely where you were told to go. The question is whether that was where you needed to be."

The Three Failure Modes

She drew three short lines on her notepad, the way she sometimes did when organising a thought.

"When Claude Code gets it wrong," she said, "it tends to do so in one of three ways. It is worth knowing all three."

First — the wrong problem. This is what happened today. You described a problem, Claude Code solved that problem, but the problem you described was not the whole problem. The output is technically correct and functionally incomplete. This is the most common failure mode, and the most insidious, because the code works. It just does not work enough.

Timothy was writing.

Second — the outdated answer. Claude Code's knowledge has a boundary in time. For well-established patterns it is reliable. But for libraries that move quickly, for APIs that change, for security practices that have evolved — it may give you an answer that was correct some time ago and is subtly wrong today. A hashing algorithm that was once acceptable. An authentication library that has since been superseded. A configuration pattern that newer versions have deprecated. The code will look reasonable. It will often run. And it will carry a flaw you did not know to look for. Always check the version. Always verify against current documentation for anything where recency matters.

"And third?"

Third — the plausible fabrication. Occasionally — not often, but occasionally — Claude Code will produce something that looks authoritative but is simply invented. A function that does not exist. A parameter that was never part of the API. A configuration option that sounds reasonable but cannot be found in any documentation. It will not announce this. It will present the fabrication with the same tone as everything else.

Timothy looked up. "How do you catch that one?"

"You run the code," Margaret said. "And you read the error message carefully. A plausible fabrication almost always fails at runtime, because the thing it invented does not exist in the actual system. The code looks right. Then it breaks in a very specific way — a missing attribute, an unknown method, a module that cannot be found." She looked at him. "When you see an error that suggests something simply does not exist — resist the urge to assume you have made a mistake. Consider that the solution may have invented something."

What Review Actually Means

"I did review the code," Timothy said. There was a defensiveness in it — not hostile, but genuine. He wanted to understand where he had gone wrong.

"I know you did," Margaret said, without any edge. "And the review was not useless. You would have caught a syntax error, a missing import, an obvious logic flaw. Code review has real value." She paused. "But there is a difference between reviewing code for technical correctness and reviewing code for behavioural completeness. Most developers only do the first. It is what we were trained to do — read the code, understand the code, verify the code does what it appears to do."

"But not whether it does what it should do."

"Exactly. That second question requires you to step outside the code entirely and ask: what are the real requirements here? Not what did I ask Claude Code to build. What does this feature actually need to do for a real user in a real situation?" She set down her pen. "Write those requirements down before you look at the solution. Even briefly. Even just three or four bullet points. Then compare what you wrote to what was built."

"I didn't write any requirements."

"No. You had them in your head — loosely — and you assumed the output would match them. This is a very human thing to do. We are pattern-matchers. The code looked like the right shape, so we accepted it as the right answer."

Timothy was quiet for a moment. Outside, the sounds of the city continued without interest in the conversation.

"If I had written down that sessions should expire on inactivity," he said, "I would have caught it immediately."

"Before you sent the prompt," Margaret said. "And the prompt itself would have been better — because it would have contained the actual requirement." She allowed a small pause. "The preparation we discussed last time protects you here as well. A precise prompt is harder to misanswer. A vague prompt can be answered correctly in ways that do not help you at all."

Before He Closed the Laptop

"I need to fix this today," Timothy said.

"You do," Margaret agreed. "It is a minor fix — update the expiry logic to track last activity rather than creation time. Claude Code will handle it well if you describe the requirement precisely."

"And this time I know what the requirement actually is."

"Yes. That is the difference." She looked at him. "Do not be discouraged by this, Timothy. What happened today is not a failure of your abilities. It is a failure of a habit you have not yet built — the habit of separating what you asked for from what you needed. That habit takes time. You are building it now."

He nodded slowly. "The code was good. The requirement was incomplete."

"And that distinction matters enormously," Margaret said. "Because it tells you where to look. Not at the tool. At the thinking before the tool."

She picked up her tea.

"Claude Code did not deceive you. It answered the question you asked. The discipline — and it is a discipline — is learning to ask questions complete enough that the answer cannot mislead you, even when it is technically correct."

He closed the laptop.

Outside, London was carrying on in its unhurried way. Inside the library, a developer had just learned something that could not be learned from documentation — the difference between code that works and code that is right.

They are not always the same thing.

And knowing that is half the work.

Next episode: Margaret and Timothy explore the art of the follow-up — what to do after Claude Code gives you something that is almost right, and how to guide the conversation toward what you actually need.

The Secret Life of Claude Code publishes on tech-reader.blog every other day.

If this series is useful to you, share it with a developer who needs to hear it.

Aaron Rose is a software engineer and technology writer at tech-reader.blog. For explainer videos and podcasts, check out Tech-Reader YouTube channel.

Search This Blog

Tech-Reader.blog