Thomas Bayes and the Theorem in the Drawer

How a quiet minister's forgotten idea became the engine inside every AI

#Mathematics #Probability #Bayes #BayesianThinking




Margaret is a senior software engineer. Timothy is her junior colleague. They work in a grand Victorian library in London — where every question deserves a careful answer, and curiosity is always welcome.


Margaret was reshelving when Timothy arrived.

She did this herself, always, despite the library having staff for precisely that purpose. She said it kept her familiar with what was there. Timothy suspected it also kept her away from her desk when she did not wish to be interrupted, and so he waited near the reading table, coat still on, until she came around the corner with an empty trolley and the expression of someone who had been expecting him anyway.

"You have a question," she said. "Not a work question."

"A mathematics question." He sat down. "I've been thinking about probability. About how we make decisions when we don't have complete information. And every time I read about it, people mention something called Bayes' Theorem. They say it's fundamental. They say once you understand it you can't unsee it." He paused. "I don't understand it."

Margaret set the trolley against the wall. "How much probability do you know?"

"Enough to be confused by Bayes specifically."

She pulled out the chair across from him and sat down. "Then let me tell you about a man named Thomas Bayes. Because the theorem is easier to understand if you understand why he wrote it down — and easier still if you know that he never meant for anyone to see it."


The Man Who Put His Ideas in a Drawer

"Thomas Bayes was born in 1702," Margaret said. "He was a nonconformist minister in Tunbridge Wells — a quiet, scholarly man who spent most of his life doing what clergymen of that era did. Tending his congregation. Reading. Thinking. Writing things down that he did not publish."

"Why not publish?"

"He published almost nothing. Two works in his lifetime, one of them anonymous. He was, by all accounts, a man who thought deeply and shared cautiously." She folded her hands on the table. "What he left behind — when he died in 1761 — was a collection of papers. Among them was an unpublished essay. His friend Richard Price found it, recognised it as significant, edited it, and sent it to the Royal Society two years after Bayes was dead."

Timothy was quiet for a moment. "So he never knew."

"He never knew. The idea that now carries his name — that sits at the foundation of modern statistics, of medical diagnosis, of spam filters, of the reasoning that shaped the language model you used this morning — was sitting in a drawer. Waiting." Margaret looked at him. "That is who we are talking about. A modest man with a big idea he apparently wasn't sure the world was ready for."


What the Theorem Actually Says

"Now," Margaret said, "the theorem itself."

She pulled a blank sheet of paper toward her. "I am not going to write the formula yet. The formula is not where you start. You start with the question the formula answers."

She wrote one sentence in the centre of the page:

When you learn something new, how much should you change your mind?

"That is Bayes' question," she said. "Not how likely is something. Not what are the odds. But — given what you already believed, and given this new piece of evidence, what should you believe now?"

Timothy read the sentence. "That sounds like a human question, not a mathematics question."

"It is both. That is precisely what made it powerful and what made it controversial." She set down her pen. "Before Bayes, probability was mostly about frequency. You flip a coin a thousand times, you get five hundred heads. Probability is what happens when you repeat an event many times. Objective. Measurable."

"And Bayes was doing something different."

"Bayes was saying that probability can also describe a degree of belief. Not just how often something happens — but how confident you are that something is true. And that this confidence should be updated, systematically, every time you encounter new evidence." She tapped the sentence on the page. "That is a very different claim. And for about two hundred years, a significant portion of the statistical community thought it was wrong, or at least philosophically suspect."

"But it wasn't."

"It was not. It simply wasn't fashionable." She almost smiled. "Thomas Bayes was ahead of fashion by approximately two centuries."


A Walk Through the Idea

"Let me give you the mechanism without the notation first," Margaret said. "You can get the notation later. The notation is just compression — shorthand for an idea you already understand."

She turned the paper over. "Suppose it's Wednesday, and you are planning a picnic for Saturday. You think to yourself — what are the chances of rain?"

"Alright."

"Before you check anything — before you look at a forecast, before you glance at the sky — you already have a belief. At this time of year, perhaps one Saturday in five brings significant rain. So your starting probability of a ruined picnic is — what?"

"Twenty percent, roughly."

"Good. That is what Bayes calls your prior. Your belief before new evidence." She wrote the word on the page. "Now it is Friday evening. You step outside and the sky is heavy and overcast. You know that when rain actually does arrive, skies like this appear the evening before — perhaps eighty percent of the time. You also know that overcast evenings happen for other reasons — a passing front that clears overnight, say — perhaps thirty percent of dry Saturdays follow a cloudy Friday."

Timothy was nodding slowly. "So the clouds — the new evidence — change the picture."

"They change it precisely. Bayes' theorem tells you exactly how much to update." She wrote three words beneath prior:

Prior. Evidence. Posterior.

"The posterior," she said, "is your new belief, after incorporating the evidence. Not a guess. Not an instinct. A calculation. You began believing there was a twenty percent chance of rain. You have now seen an overcast sky. Bayes tells you your new probability — forty percent, with the numbers we just used. Still not certain. But meaningfully higher than where you started."

Timothy blinked. "So the clouds double my estimate?"

"Precisely. That is what a single piece of evidence can do when you apply it correctly." She looked at him. "And if you then wake Saturday morning and hear thunder — "

"You update again."

"Each piece of evidence shifts the posterior, which becomes the new prior for the next update. That is the engine. Evidence arrives. Beliefs update. Rinse and repeat."
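The engine Margaret describes fits in a few lines of code. This is a minimal sketch of the update rule, not anything from Bayes' essay itself, and the thunder likelihoods in the second update are invented numbers for illustration:

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the posterior probability after observing one piece of evidence."""
    numerator = prior * p_evidence_if_true
    total_evidence = numerator + (1 - prior) * p_evidence_if_false
    return numerator / total_evidence

# Wednesday: one Saturday in five brings rain.
prior = 0.20

# Friday evening: heavy clouds. Skies like this precede 80% of rainy
# Saturdays, but also 30% of dry ones.
after_clouds = bayes_update(prior, 0.80, 0.30)
print(f"after clouds: {after_clouds:.0%}")    # 40%

# Saturday morning: thunder. (These likelihoods are made up for the example.)
after_thunder = bayes_update(after_clouds, 0.70, 0.10)
print(f"after thunder: {after_thunder:.0%}")  # 82%
```

Note how the posterior from the first update becomes the prior for the second. That chaining is the whole engine: evidence arrives, beliefs update, repeat.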


Why This Is Everywhere

"When does this actually get used?" Timothy asked. He had taken off his coat, which Margaret had learned meant he had stopped planning to leave.

"Everywhere you look once you know what you're looking for," she said. "Your email provider uses it — or a descendant of it — to decide whether a message is spam. It begins with a prior probability that a given email is spam, then updates based on features: the words used, the sender, the formatting. Each feature is evidence. The posterior is the spam probability it assigns."
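The spam filter Margaret describes can be sketched as a toy naive-Bayes classifier. Real filters learn these numbers from millions of messages; the per-word likelihoods below are invented purely to show the shape of the calculation:

```python
# Toy naive-Bayes spam score: each word is treated as independent evidence.
# The likelihoods are illustrative, not trained values.
P_WORD_GIVEN_SPAM = {"winner": 0.30, "free": 0.40, "meeting": 0.02}
P_WORD_GIVEN_HAM  = {"winner": 0.01, "free": 0.05, "meeting": 0.20}

def spam_probability(words, prior_spam=0.5):
    p_spam, p_ham = prior_spam, 1 - prior_spam
    for w in words:
        if w in P_WORD_GIVEN_SPAM:
            p_spam *= P_WORD_GIVEN_SPAM[w]  # update the spam hypothesis
            p_ham  *= P_WORD_GIVEN_HAM[w]   # update the not-spam hypothesis
    return p_spam / (p_spam + p_ham)        # normalise into a posterior

print(spam_probability(["winner", "free"]))  # ≈ 0.996 — almost certainly spam
print(spam_probability(["meeting"]))         # ≈ 0.09  — almost certainly not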

"The language model."

"Yes. At every step, the model is maintaining something like a probability distribution over what word comes next. It updates that distribution based on everything it has seen so far in the conversation. Prior, evidence, posterior, prior, evidence, posterior — thousands of times per sentence."

"Medical diagnosis."

"Explicitly so in some systems. You arrive with symptoms. The system has a prior — base rates of diseases in the population. Each symptom updates the probabilities. A good diagnostic AI is doing something very close to Bayesian reasoning, whether it says so or not."
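The base rate matters more than intuition suggests. A worked example with made-up but plausible numbers: a condition affecting 1% of the population, and a test that catches 90% of true cases but also flags 9% of healthy people.

```python
# Why the prior (base rate) matters: a positive result from a decent test
# can still leave the condition unlikely. All numbers are illustrative.
prevalence = 0.01       # prior: 1% of the population has the condition
sensitivity = 0.90      # P(positive | condition)
false_positive = 0.09   # P(positive | no condition)

true_pos = prevalence * sensitivity
false_pos = (1 - prevalence) * false_positive
posterior = true_pos / (true_pos + false_pos)

print(f"P(condition | positive test) = {posterior:.1%}")  # 9.2%
```

A positive result raised the probability ninefold, yet it is still far more likely the patient is healthy, because healthy people vastly outnumber sick ones. Ignoring that prior is the classic Bayesian mistake.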

Timothy looked at the three words on the paper. "And Bayes worked this out in the eighteenth century."

"He worked out the core mechanism. He was thinking about a much simpler problem — a billiard table, as it happens, in one of his thought experiments. Rolling a ball and trying to infer where it stopped based on limited observations." She tilted her head slightly. "He was asking: if I cannot see the answer directly, how do I reason backwards from what I can see to what I cannot?"

"Reasoning backwards."

"That is sometimes how it's described. Reasoning from effects to causes. From evidence to the most likely explanation. Which is, when you think about it, a description of almost everything we do when we think carefully."


Why It Took So Long

"If it's so fundamental," Timothy said, "why did it take two hundred years to catch on?"

Margaret considered this. "Two reasons, I think. The first is philosophical. The classical statisticians were deeply uncomfortable with the idea that probability could represent subjective belief. It felt unscientific to them — as if you were smuggling opinion into mathematics. They wanted probability to be objective, grounded in repeated experiments. Bayesian probability felt too personal."

"And the second reason?"

"Computation." She leaned back slightly. "Bayes' theorem is conceptually elegant. But when you apply it to real problems — medical diagnosis with dozens of symptoms, language modelling with hundreds of thousands of words — the mathematics becomes very expensive very quickly. In the eighteenth century, in the nineteenth, even well into the twentieth, you simply could not do the calculations at the scale required to make it useful."

"Until computers."

"Until computers. Specifically, until computers became fast enough to run what are called Markov Chain Monte Carlo methods — ways of approximating Bayesian calculations that would otherwise be intractable." She paused. "Thomas Bayes wrote his theorem by candlelight with a quill. It was waiting for machines that would not exist for another two hundred and fifty years before it could be fully used."

Timothy sat with that for a moment. "He wrote something the world wasn't capable of using yet."

"He wrote something the world wasn't ready to build yet. The idea was ready. The infrastructure was not." She looked at the paper between them. "That happens, with foundational ideas. They arrive before their moment. Someone sees them, puts them in a drawer — or a journal, or a footnote — and they wait. Richard Price found Bayes' essay and understood it was important, even if he could not have guessed quite how important. He sent it to the Royal Society. And the idea began its very long journey toward the inside of every AI model on earth."


The Lesson at the Edge of the Table

Timothy looked at the three words again. Prior. Evidence. Posterior.

"What's the thing to take away from this?" he asked. "If someone asked me to explain Bayesian thinking in a sentence?"

Margaret considered for a moment. "Don't start from scratch," she said. "You already believe things. When new evidence arrives, update what you believe in proportion to what the evidence actually tells you — not more, not less. That is the whole discipline."

"And the mistake people make?"

"Two mistakes, usually. The first is ignoring the prior — treating every piece of evidence as if you had no beliefs before it arrived. This leads to overreacting to individual data points. One bad result convinces you everything is ruined. One good day convinces you the problem is solved." She looked at him. "The prior is there to anchor you."

"And the second mistake?"

"Refusing to update. Holding your prior beliefs regardless of evidence." She picked up her pen. "Bayes was not suggesting that you should change your mind constantly. He was suggesting that you should change your mind correctly — proportionally, systematically, honestly." She set the pen down. "Which is, as it turns out, much harder than either extreme."

Timothy stood, folding the sheet of paper carefully. He paused at the edge of the table. "Margaret — what do you think Bayes would have made of it? The spam filters. The language models."

She thought about this longer than he expected.

"I think he would have been quietly astonished," she said. "He was a minister. He thought carefully about how we form beliefs, how we reason under uncertainty, what it means to update your faith in a proposition." She glanced toward the shelves. "I suspect he would have found it entirely natural that the same structure applies to machines. He spent his life watching people reason badly and wondering if there were a better way." She opened her book. "He found one."

Timothy left without buttoning his coat.


Next episode: The number that broke mathematics — and the man it broke with it. A story about infinity, a hotel with infinite rooms, and why Georg Cantor's ideas were called a "mathematical illness" by the colleagues who feared them most.


Aaron Rose is a software engineer and technology writer at tech-reader.blog


