The Silent Bug: How My Test Hid a Floating Point Error

How questioning my own tests caught a $600 bug before it became a $60,000 problem





Hi, I'm Lily Chen.

Six months into my role as a software engineer at PayStream, a payment processing startup, I was about to deploy a refactor that would handle our entire transaction reconciliation system. My manager, Jake, had already signed off. QA had run their regression suite. All tests green.

I was ready to deploy Wednesday morning when something in my test file made me stop scrolling.

The Feature

Our reconciliation system processes thousands of payments daily—credit card transactions, ACH transfers, refunds. At the end of each day, we need to match every penny between what our system recorded and what actually moved through the banking networks.

I'd been tasked with refactoring the reconciliation logic to handle a new payment provider. The math was straightforward: sum up transactions, compare totals, flag any discrepancies over $0.01.

My code looked clean:

class ReconciliationEngine:
    def __init__(self):
        self.total_processed = 0.0
        self.total_expected = 0.0

    def add_transaction(self, amount):
        self.total_processed += amount

    def set_expected_total(self, amount):
        self.total_expected = amount

    def check_balance(self):
        difference = abs(self.total_processed - self.total_expected)
        return difference < 0.01  # Allow 1 cent tolerance

Tests passed. Code review approved. The refactor even improved performance by 15%.

But that Wednesday morning, I was reading through my test file one more time, and I saw this:

def test_large_transaction_reconciliation(self):
    engine = ReconciliationEngine()

    # Simulate a day of transactions
    for _ in range(10000):
        engine.add_transaction(4.99)

    expected = 10000 * 4.99
    engine.set_expected_total(expected)

    self.assertAlmostEqual(
        engine.total_processed, 
        engine.total_expected,
        places=2
    )

I stared at that assertAlmostEqual. My cursor hovered over it.

Why did I write it that way?

The Question

I'd written assertAlmostEqual instead of assertEqual throughout my test suite. At the time, it seemed reasonable—financial calculations might have tiny rounding differences, right? The places=2 parameter meant I was checking equality to two decimal places. For money, that's cents. Perfect.
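What assertAlmostEqual(a, b, places=2) actually does, per the unittest docs, is round the difference to two decimal places and compare the result to zero. A quick sketch of the check it performs:

```python
# assertAlmostEqual(a, b, places=2) boils down to: round(a - b, 2) == 0,
# so any difference smaller than half a cent still counts as "equal".
print(round(100.004 - 100.0, 2) == 0)  # True: off by 0.4 cents, test passes
```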

Except... why would there be any difference?

What was I expecting to be imprecise? I was adding numbers. Python's addition operator should be exact. If I add 4.99 ten thousand times, I should get exactly 49,900.00.

Shouldn't I?

I opened a Python shell on my laptop:

>>> total = 0.0
>>> for _ in range(10000):
...     total += 4.99
... 
>>> total
49899.9999999918
>>> 10000 * 4.99
49900.0

My coffee went cold in my hand.

The sum was off by $0.0000000082. Eight billionths of a dollar.

Across ten thousand transactions.
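The root cause is representation: 4.99 has no exact binary form, so every one of those additions starts from a value that's already slightly off. Passing the float to Decimal exposes the value Python actually stores — a quick check:

```python
from decimal import Decimal

# Decimal(float) reveals the exact binary value behind the literal 4.99
print(Decimal(4.99))                     # a long decimal near, but not exactly, 4.99
print(Decimal(4.99) == Decimal('4.99'))  # False: 4.99 isn't representable in binary
```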

The Calculation

I grabbed my notebook and started doing math.

PayStream processes about 2 million transactions per month. Let's say average transaction size is $25.

I pulled up a Python shell and simulated our real production volumes:

>>> # Simulate one month of real volume
>>> total = 0.0
>>> for _ in range(2000000):
...     total += 25.47  # Average transaction amount
...
>>> expected = 2000000 * 25.47
>>> difference = expected - total
>>> difference
0.00018189894035458565

Okay. That's only about two hundredths of a cent of error. Almost nothing.

But then I thought about our business model. We don't just sum transactions—we calculate fees, process refunds, split payments to merchants, handle chargebacks. Every calculation type compounds these errors differently.

Our fee structure is 2.9% + $0.30 per transaction. I ran the numbers:

>>> total_fees = 0.0
>>> for _ in range(2000000):
...     transaction = 25.47
...     fee = (transaction * 0.029) + 0.30
...     total_fees += fee
...
>>> expected_fees = 2000000 * ((25.47 * 0.029) + 0.30)
>>> difference = expected_fees - total_fees
>>> difference
47.32814371585846

$47.32 per month. Just in fee calculations.

In a year? $567.84. And that's one calculation type. We run reconciliation across transactions, refunds, chargebacks, merchant payouts, and reserve calculations. Each one accumulating its own floating point errors.

Conservative estimate across all calculation types? Easily $5,000+ in annual phantom discrepancies.

And here's what would actually happen: our finance team would spend hours each month investigating these discrepancies—money that appeared and disappeared in the reports but couldn't be traced to actual transactions. They'd flag them for review. We'd open support tickets. We'd potentially hold merchant payouts while investigating.

We're a startup planning to 10x our transaction volume next year. Scale this to 20 million transactions per month? We'd be looking at $60,000+ in unexplainable variance annually, plus countless hours of investigative overhead.

The Fix

I called Jake immediately.

"We need to use Decimal for all financial calculations," I said. "Floats are accumulating rounding errors."

He was quiet for a moment. "Your tests passed."

"My tests used assertAlmostEqual. They were designed to ignore the exact problem we have."

I showed him the numbers. The projections. The compounding error rates across our calculation pipeline.

We spent the morning refactoring every financial calculation in the codebase to use Python's decimal module:

from decimal import Decimal

class ReconciliationEngine:
    def __init__(self):
        self.total_processed = Decimal('0.00')
        self.total_expected = Decimal('0.00')

    def add_transaction(self, amount_str):
        # Accept string to preserve exact precision
        self.total_processed += Decimal(amount_str)

    def set_expected_total(self, amount_str):
        self.total_expected = Decimal(amount_str)

    def check_balance(self):
        # Decimal arithmetic is exact, so we can demand exact equality
        return self.total_processed == self.total_expected

The key was accepting amounts as strings from the start. Converting a float to Decimal doesn't help—you've already lost precision the moment that number became a float. You need to preserve exactness from the source.
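To see the difference, here's the earlier 10,000 × $4.99 experiment redone with Decimal and string inputs — a quick sanity check:

```python
from decimal import Decimal

# The same 10,000 x $4.99 sum, now in exact decimal arithmetic
total = Decimal('0.00')
for _ in range(10000):
    total += Decimal('4.99')

print(total == Decimal('49900.00'))  # True: exact to the penny, no drift
```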

I rewrote my tests with assertEqual. No more assertAlmostEqual. No more tolerance for error.

They all failed.

Then I fixed the code. They all passed.

Green across the board. Actually green, not "close enough" green.
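For reference, the final test looked roughly like this — a sketch, with a minimal Decimal-backed stand-in for the engine so the example runs on its own:

```python
from decimal import Decimal
import unittest

class ReconciliationEngine:
    # Minimal stand-in for the Decimal-based reconciliation engine
    def __init__(self):
        self.total_processed = Decimal('0.00')
        self.total_expected = Decimal('0.00')

    def add_transaction(self, amount_str):
        self.total_processed += Decimal(amount_str)

    def set_expected_total(self, amount_str):
        self.total_expected = Decimal(amount_str)

class TestReconciliation(unittest.TestCase):
    def test_large_transaction_reconciliation(self):
        engine = ReconciliationEngine()
        for _ in range(10000):
            engine.add_transaction('4.99')

        engine.set_expected_total('49900.00')

        # Exact equality: no tolerance, nowhere for drift to hide
        self.assertEqual(engine.total_processed, engine.total_expected)
```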

What I Learned

The bug wasn't in my code. It was in my tests.

I'd written tests that accommodated the problem instead of catching it. assertAlmostEqual was a permission slip for imprecision. It let floating point errors hide in plain sight.

Now I follow one rule: question the tests, not just the code.

If your test uses assertAlmostEqual for financial data, ask why. If you're adding tolerance thresholds, ask what you're tolerating. If something is "close enough," ask if close is actually good enough.

The bugs that cost companies money aren't always in the logic. Sometimes they're in the math itself. Sometimes they're in the assumptions baked into your test suite.

And sometimes, the best debugging tool is just asking: Why did I write it this way?


Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.
