The Secret Life of Python: Reusing Memory with Integer Caching and String Interning

 

The Secret Life of Python: Reusing Memory with Integer Caching and String Interning

Understanding object identity, the id() function, and the is vs == trap

#Python #MemoryManagement #IntegerCaching #StringInterning




Margaret is a senior software engineer. Timothy is her junior colleague. They work in a grand Victorian library in London — the kind of place where code quality is the unspoken objective, and craftsmanship is the only thing that matters.

Episode 2

Timothy stared at the fresh notebook Margaret had given him. On his screen, the cursor blinked patiently beneath the output of his first program. He cleared the terminal and typed a new set of lines, eager to show he understood the concept of an object.

x = "Hello World"
y = "Hello World"

"Look at this, Margaret," Timothy said, pointing at the screen. "I created two distinct string objects. They look identical, they contain the same characters, but they are living in two different locations on the heap, right?"

How Python Treats Memory

Margaret leaned over his shoulder, her gaze fixing on the variables. "Are they, Timothy? Or are you assuming Python treats memory the way a human treats a printing press?"

Timothy frowned. "They are two separate assignments. Why would they be the same?"

"Because Python is lazy, and memory is expensive," Margaret said quietly. "Open your interactive shell. Let us look at their fingerprints."

She reached past him and typed rapidly:

print(id(x))
print(id(y))

Timothy watched as two identical, long strings of integers filled the screen.

140532789234800
140532789234800

"The exact same number," Timothy whispered, his brow furrowing. "But how? I wrote the literal string twice."

Python Interns the String

"You did," Margaret replied, stepping back toward her ledger. "But the CPython interpreter recognized that the string literal was immutable. Instead of carving out two separate blocks of memory on the heap, it created the object once and pointed both x and y to the exact same memory address. It interned the string. Watch what happens when we switch to integers."

She tapped the keys again, setting up a side-by-side comparison.

a = 256
b = 256
print(a is b)  # True

a = 257
b = 257
print(a is b)  # False

Timothy blinked. "Wait. Why does 256 return True but 257 returns False? They are just numbers!"

Inside and Outside the Cache

"Ah, but to the interpreter, they are not just numbers," Margaret said, a sharp smile touching her lips. "CPython pre-allocates a global array for small integers from -5 to 256 the moment it boots up. When you ask for 256, you are handed a recycled object from that cache. But 257 lies outside the safety boundary. For 257, Python is forced to carve out entirely new memory blocks."

She picked up a chalk stub and drew a line on the small slate board by the desk.

"You are looking at two different name tags pinned to the exact same individual, Timothy. In Python, variables do not hold values; they are merely references pointing to objects. And until you understand id(), you do not truly know who you are talking to."


Margaret’s Cheat Sheet: The Identity of an Object

  • The Identity Function: The built-in id() function returns the actual memory address of an object in the standard CPython implementation. It is the ultimate truth-teller in Python; values can lie, but id() cannot.

  • The is Operator vs. ==:

  • The == operator checks for equality (Do these two objects contain the same data?).

  • The is operator checks for identity (Are these two variables pointing to the exact same id() address?).

  • String Interning and Integer Caching: Python optimizes memory by automatically reusing small, immutable objects. CPython pre-allocates global objects for integers between -5 and 256. Strings that look like valid identifiers are also automatically interned at compile time, though you can manually optimize performance for dynamic strings using sys.intern().

  • Why This Matters: Blindly using is instead of == for value comparison introduces silent, catastrophic production bugs. Your code might pass local testing perfectly because your IDs match under 256, only to fail spectacularly in production when real-world data crosses the caching threshold.


Aaron Rose is a software engineer and technology writer at tech-reader.blog

Catch up on the latest explainer videos, podcasts, and industry discussions below.


Popular posts from this blog

Insight: The Great Minimal OS Showdown—DietPi vs Raspberry Pi OS Lite

Running AI Models on Raspberry Pi 5 (8GB RAM): What Works and What Doesn't

Raspberry Pi Connect vs. RealVNC: A Comprehensive Comparison