The Secret Life of Python: deepcopy

 

The Secret Life of Python: deepcopy

Taking control of Python's __deepcopy__

#Python #Coding #Programming #SoftwareDevelopment




Margaret is a senior software engineer. Timothy is her junior colleague. They work in a grand Victorian library in London — the kind of place where code quality is the unspoken objective, and craftsmanship is the only thing that matters.

Episode 27

Timothy was looking at a new class he had built for the Chess Club—a TournamentSession object. It was a complex piece of data that tracked the start time of a match and a unique "Session ID."

"Margaret," Timothy said, "I'm using deepcopy to create a backup of a match in progress. It works perfectly now, but I have a weird problem. The 'copy' has the exact same start_time and session_id as the original. If I'm making a new version of the match, shouldn't it have its own unique ID and a new timestamp?"

Margaret smiled. "You've discovered the 'Perfect Clone' problem. Usually, a perfect clone is what we want. But sometimes, you need a Specialist to step in and tell Python: 'Copy everything, but change these specific things on the way out.'"


Taking the Controls

Margaret drew a picture of a factory assembly line on the whiteboard. In the middle of the line was a robot arm labeled __deepcopy__.

"When you call copy.deepcopy(), Python looks for a special method inside your class called __deepcopy__," Margaret explained. "If it doesn't find one, it just does its usual routine. But if you define it, you are taking over the controls of the clone machine."

"Let's look at how you can tell Python to refresh the ID during a copy," she said, writing on the board:

import copy
import time
import uuid

class TournamentSession:
    def __init__(self, player_a, player_b):
        self.player_a = player_a
        self.player_b = player_b
        self.session_id = uuid.uuid4()
        self.start_time = time.time()

    def __deepcopy__(self, memo):
        # 1. Create a new instance without calling __init__ again
        cls = self.__class__
        new_session = cls.__new__(cls)
        
        # 2. Add it to the 'memo' (to prevent infinite loops)
        memo[id(self)] = new_session
        
        # 3. Copy the players normally using the standard deepcopy
        new_session.player_a = copy.deepcopy(self.player_a, memo)
        new_session.player_b = copy.deepcopy(self.player_b, memo)
        
        # 4. BUT! Give the copy a fresh ID and current time
        new_session.session_id = uuid.uuid4()
        new_session.start_time = time.time()
        
        return new_session

The Memo Trick

Timothy pointed to the word memo in the code. "What is that?"

"That's Python's 'Don't Repeat Yourself' list," Margaret said kindly. "When you copy complex data, you might have objects that point to each other in a circle. The memo dictionary keeps track of everything we've already copied. If Python sees an object it recognizes in the memo, it just grabs the copy instead of starting over. It prevents your program from spinning in a circle forever."


The Power of the Specialist

"So," Timothy summarized, "by writing this one method, I can let the deepcopy library do 90% of the heavy lifting, while I just 'tweak' the specific parts that need to be unique?"

"Exactly," Margaret nodded. "You are no longer just a user of the library. You are its partner. You can do the same thing with __reduce__ for pickling, or __setstate__ for when an object is being 'rehydrated' from a file. You are giving your objects a memory and a set of instructions on how they should be born into the world."

Timothy looked at his new TournamentSession. When he made a copy, the players stayed the same, but the session_id blinked into a brand-new, unique string.

"It's not just a copy anymore," he said. "It's a descendant."


Margaret's Cheat Sheet: Customizing the Clone

Which Tool Should You Use?

  • Default deepcopy — Use this when you want a 100% identical twin of the object. Python handles everything automatically.

  • Custom __deepcopy__ — Use this when the copy needs unique fields like fresh IDs, new timestamps, or cleared logs. You let Python copy most of the object, but you step in to "tweak" the parts that shouldn't be identical.

  • Custom __copy__ — Use this to control shallow copies made via copy.copy(). It follows the same pattern as __deepcopy__, but only handles the top-level object without recursing into nested structures.

  • __reduce__ — Use this to control how an object is "serialized" (pickled) for storage or transmission. It's especially useful for handling objects that contain unpicklable resources—like file handles or database connections—by saving the instructions to recreate them instead of the objects themselves.

The Specialist's Pattern

  1. Create the new object instance using cls.__new__(cls). This creates an empty, uninitialized instance so you can control exactly what gets copied.

  2. Register it in the memo dictionary. This prevents infinite loops if your data contains circular references.

  3. Deepcopy the attributes you want to keep using copy.deepcopy(attr, memo). This hands the work back to Python for the parts you want to preserve.

  4. Manually set the specific attributes you want to change—like generating a fresh session_id or a new start_time.

Engineering Judgment

If you only need to change one or two values occasionally, a simple .refresh() or .clone() method is often cleaner and easier to understand. Reach for "The Specialist" only when you want this custom behavior to be the global standard—so that anyone using Python's copy module with your class gets the customized behavior automatically.


Aaron Rose is a software engineer and technology writer at tech-reader.blog

Catch up on the latest explainer videos, podcasts, and industry discussions below.


Comments

Popular posts from this blog

Insight: The Great Minimal OS Showdown—DietPi vs Raspberry Pi OS Lite

The New ChatGPT Reason Feature: What It Is and Why You Should Use It

Raspberry Pi Connect vs. RealVNC: A Comprehensive Comparison