Python's Mutable Default Mistake
Problem
You write a helper function with a default list parameter. It works perfectly—the first time. The second time? Your "empty" list already has data in it. Welcome to Python's most notorious gotcha.
def collect(item, items=[]):
    items.append(item)
    return items

print(collect("apple"))   # ['apple']
print(collect("banana"))  # ['apple', 'banana'] # The "apple" is still there!
This is documented language behavior, not an interpreter bug, but it is one of Python's most reliable ways to create bugs that only appear in production.
Clarifying the Issue
The problem is rooted in when Python evaluates default arguments. They are evaluated only once—at function definition time. If that default is a mutable object (like a list, dictionary, or set), the function holds a reference to that single, persistent object. Every call that uses the default manipulates this same object, causing hidden state leakage.
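A minimal sketch of how to see this for yourself: Python stores evaluated defaults on the function object in its `__defaults__` attribute, so you can inspect the shared list directly. The `collect` function is the one from the Problem section; the variable names `first` and `second` are just for illustration.

```python
def collect(item, items=[]):
    items.append(item)
    return items

# The default list is created once and stored on the function object.
print(collect.__defaults__)              # ([],)

first = collect("apple")
second = collect("banana")

# Both calls returned the very same list object that lives in __defaults__.
print(first is second)                   # True
print(collect.__defaults__[0] is first)  # True
print(collect.__defaults__)              # (['apple', 'banana'],)
```

If `first is second` prints True, the two calls did not just produce equal lists; they handed back one and the same object.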
Why It Matters
This bug is insidious. It often slips through testing because the function works perfectly when called once. The chaos emerges in production:
- A logging utility mixes messages from different requests
- A cache function returns data meant for another user
- Unit tests pass individually but fail when the suite runs
Note: Experienced developers occasionally use mutable defaults intentionally for caching or singleton patterns. These cases are rare and should be clearly documented. For 99% of code, mutable defaults are a bug waiting to happen.
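For illustration, an intentional use might look like the sketch below. The function name `slow_square` and the parameter `_cache` are hypothetical, and the multiplication stands in for an expensive computation. The key point is that the shared state is deliberate and called out in the docstring.

```python
def slow_square(n, _cache={}):
    """Return n * n, memoizing results.

    NOTE: _cache is a deliberately shared mutable default. It persists
    across calls on purpose; callers should not pass it.
    """
    if n not in _cache:
        _cache[n] = n * n  # stand-in for an expensive computation
    return _cache[n]

slow_square(4)
slow_square(4)  # second call is served from the shared cache
print(slow_square.__defaults__[0])  # {4: 16}
```

Even here, the standard library's `functools.lru_cache` decorator is usually a clearer way to get the same effect, since it keeps the cache out of the function signature.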
Key Terms
- Mutable Object: A Python object that can be changed after creation (list, dict, set).
- Immutable Object: An object that cannot be changed after creation (int, str, tuple).
- Function Definition Time: The moment Python executes the def statement and creates the function object; default values are evaluated and stored here.
- State Leakage: Data from one call persisting into future calls unexpectedly.
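The mutable/immutable distinction is what decides whether a default can leak. As a rough sketch (both function names below are made up for this example), the same `+=` operator behaves very differently on an int default and a list default:

```python
def add_to_total(n, total=0):
    # += on an int rebinds the local name to a new int; the default 0 is untouched.
    total += n
    return total

def add_to_list(n, values=[]):
    # += on a list extends the existing object in place, so the default itself changes.
    values += [n]
    return values

print(add_to_total(5))  # 5
print(add_to_total(5))  # 5
print(add_to_list(5))   # [5]
print(add_to_list(5))   # [5, 5]
```

Immutable defaults are evaluated once too, but since they cannot be modified in place, there is nothing to leak.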
Steps at a Glance
- Identify functions that use mutable defaults.
- Replace the default with None.
- Initialize the mutable object inside the function body.
- Test to confirm each call starts with a fresh object.
- Add linting or review practices to catch this early.
Detailed Steps
Step 1: Identify the Problem
Scan your code for function signatures that use mutable objects (lists, dicts, sets) as default values.
def build_record(key, cache={}):  # 🚨 dangerous
    cache[key] = True
    return cache
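If scanning by eye is impractical, the search can be roughly automated with the standard library's `ast` module. The sketch below is an assumption about how you might do it, not a replacement for a real linter: the helper `find_mutable_defaults` is hypothetical, it only recognizes literal lists, dicts, sets, comprehensions, and direct `list()`/`dict()`/`set()` calls, and it ignores lambdas.

```python
import ast

MUTABLE_LITERALS = (ast.List, ast.Dict, ast.Set,
                    ast.ListComp, ast.DictComp, ast.SetComp)
MUTABLE_CALLS = {"list", "dict", "set"}

def find_mutable_defaults(source):
    """Return (function name, line number) pairs for suspicious defaults."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Positional defaults plus keyword-only defaults (None marks "no default").
        defaults = node.args.defaults + [
            d for d in node.args.kw_defaults if d is not None
        ]
        for default in defaults:
            is_literal = isinstance(default, MUTABLE_LITERALS)
            is_call = (
                isinstance(default, ast.Call)
                and isinstance(default.func, ast.Name)
                and default.func.id in MUTABLE_CALLS
            )
            if is_literal or is_call:
                hits.append((node.name, default.lineno))
    return hits

sample = '''
def build_record(key, cache={}):
    cache[key] = True
    return cache

def safe(key, cache=None):
    return cache
'''
print(find_mutable_defaults(sample))  # [('build_record', 2)]
```

To check a real file, you could pass its contents in with something like `find_mutable_defaults(open(path).read())`, where `path` is whichever module you want to inspect.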
Step 2: Replace with None
Replace the mutable default with None and add a conditional check inside the function.
def build_record(key, cache=None):
    if cache is None:
        cache = {}
    cache[key] = True
    return cache
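One detail worth a quick illustration is why the check is `is None` rather than the shorter `cache = cache or {}`. The sketch below compares the two; `build_record_or` is a hypothetical variant written only to show the difference. An empty dict is falsy, so the `or` version quietly discards a dict the caller passed in on purpose.

```python
def build_record_or(key, cache=None):
    cache = cache or {}      # an empty dict is falsy, so it gets replaced
    cache[key] = True
    return cache

def build_record(key, cache=None):
    if cache is None:        # only a missing argument triggers a new dict
        cache = {}
    cache[key] = True
    return cache

shared = {}
build_record_or("a", shared)
print(shared)   # {}  (the caller's dict was silently ignored)

shared = {}
build_record("a", shared)
print(shared)   # {'a': True}
```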
Step 3: Confirm Behavior
Test the function to verify that each call now gets a fresh, independent object.
print(build_record("a")) # {'a': True}
print(build_record("b")) # {'b': True} # Each call starts fresh
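Printing works for a quick look, but the same check can be written as assertions so it fails loudly if the bug comes back. This is a minimal sketch using plain `assert` statements; the variable names are illustrative, and in a real project these checks would likely live in your test suite.

```python
def build_record(key, cache=None):
    if cache is None:
        cache = {}
    cache[key] = True
    return cache

first = build_record("a")
second = build_record("b")

assert first == {"a": True}
assert second == {"b": True}      # no leftover "a" key
assert first is not second        # two distinct dict objects

# An explicitly passed dict is still shared, which is the intended behavior.
shared = {}
build_record("x", shared)
build_record("y", shared)
assert shared == {"x": True, "y": True}
print("all checks passed")
```

Calling the function at least twice in the same test matters here, since a single call cannot reveal leaked state.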
Step 4: Use Tools to Enforce
Integrate a linter like flake8-bugbear into your workflow. It flags this anti-pattern as rule B006 ("Do not use mutable data structures for argument defaults"), which helps keep it out of your codebase.
Step 5: Make It Standard Practice
Add this to your code review checklist:
- Any function with list/dict/set defaults? Flag it.
- Run flake8-bugbear in CI/CD pipelines.
- When onboarding new team members, show them this pattern first.
After a few weeks, if x is None: x = [] becomes automatic.
Bad vs. Good Example
Bad (Leaky State):
def collect(item, items=[]):
    items.append(item)
    return items

print(collect("apple"))   # ['apple']
print(collect("banana"))  # ['apple', 'banana']
Good (Fresh State):
def collect(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

print(collect("apple"))   # ['apple']
print(collect("banana"))  # ['banana']
Conclusion
The fix takes five seconds: replace items=[] with items=None, add two lines inside the function. The payoff? You've eliminated an entire class of subtle bugs. Add a linter, make it muscle memory, and you'll never debug this at 2 AM again.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.