The On-Demand Factory: Yield and Lazy Evaluation
Timothy stared at his computer screen in dismay. The library had just digitized its complete archive—three million books—and his program to generate a catalog listing had frozen solid. Margaret found him frantically checking if the system had crashed.
"What happened?" she asked, noting his frustrated expression.
"I tried to create a list of all titles for the annual report," Timothy explained. "But loading three million book records into memory at once…" He gestured at the unresponsive screen.
Margaret smiled knowingly and led him to a section of the Function Modification Station labeled "The On-Demand Factory," where a series of steam-powered conveyor belts moved individual items rather than hauling entire collections at once.
The Memory Crisis
Timothy's original approach seemed straightforward:
def get_all_titles():
    titles = []
    for book in database.fetch_all_books():
        titles.append(book.title)
    return titles

# Try to get all three million titles
catalog = get_all_titles()  # System freezes...
The function loaded every single book into memory before returning anything. For three million records, this consumed gigabytes of RAM and took minutes to complete—if it completed at all.
"What if," Margaret suggested, "you didn't create the entire list upfront? What if you produced titles one at a time, only when needed?"
The Yield Keyword
Margaret showed Timothy a different approach:
def get_all_titles():
    for book in database.fetch_all_books():
        yield book.title
Timothy blinked. The function looked almost identical, but yield had replaced append and return. "What does yield do?"
"It transforms your function into a generator," Margaret explained. "Instead of computing all results and returning them at once, it produces one result at a time, pausing between each."
title_generator = get_all_titles()
# Nothing has executed yet!
print(type(title_generator)) # <class 'generator'>
# Get the first title
first_title = next(title_generator) # NOW the function runs until the first yield
# Get the second title
second_title = next(title_generator) # Function resumes and runs to the next yield
When Python encountered yield, it paused the function and returned the yielded value. The next call to next() resumed execution right where it left off, continuing until the next yield.
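Margaret sketched a tiny stand-alone generator (no database involved, just print statements) so Timothy could watch the pausing happen:
def demo():
    print("function body starts")
    yield 1
    print("resumed after first yield")
    yield 2

gen = demo()      # Nothing prints - the body hasn't started running
print(next(gen))  # Prints "function body starts", then 1
print(next(gen))  # Prints "resumed after first yield", then 2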
The Conveyor Belt Metaphor
Timothy visualized the generator as a steam-powered conveyor belt. The regular function was like dumping three million books onto his desk all at once—overwhelming and impractical. The generator delivered one book at a time: examine it, process it, then request the next one.
def generate_catalog_entries():
    for book in database.fetch_all_books():
        entry = f"{book.title} by {book.author} ({book.year})"
        yield entry

# Process one entry at a time
for entry in generate_catalog_entries():
    print(entry)  # Uses minimal memory
    # Only one entry exists in memory at any moment
The for loop automatically called next() on the generator until it was exhausted.
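Under the hood, the for loop is roughly equivalent to this manual version (a sketch of what Python does automatically):
entries = generate_catalog_entries()
while True:
    try:
        entry = next(entries)  # Ask the belt for one more item
    except StopIteration:
        break                  # The belt is empty; stop quietly
    print(entry)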
The State Preservation Magic
Margaret demonstrated how generators maintained their state between yields:
def count_books_by_decade():
    current_decade = None
    count = 0
    for book in database.fetch_all_books():
        book_decade = (book.year // 10) * 10
        if current_decade is None:
            current_decade = book_decade
        if book_decade != current_decade:
            yield f"{current_decade}s: {count} books"
            current_decade = book_decade
            count = 1  # Start counting this book in the new decade
        else:
            count += 1
    if count > 0:
        yield f"{current_decade}s: {count} books"

for summary in count_books_by_decade():
    print(summary)
# 1800s: 342 books
# 1810s: 567 books
# ...
The variables current_decade and count persisted across yields, maintaining the function's state. Each yield was like pressing pause—the function froze with all its variables intact, resuming exactly where it stopped.
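The same persistence is visible in miniature (a stand-alone sketch):
def counter():
    count = 0  # Created once, then preserved across every pause
    while count < 3:
        count += 1
        yield count

c = counter()
print(next(c))  # 1
print(next(c))  # 2 - count survived the pause
print(next(c))  # 3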
The Infinite Sequence Pattern
Timothy discovered generators could produce infinite sequences without consuming infinite memory:
def generate_catalog_ids():
    catalog_id = 1
    while True:
        yield f"CAT-{catalog_id:06d}"
        catalog_id += 1

id_generator = generate_catalog_ids()

# Get as many IDs as needed
print(next(id_generator))  # CAT-000001
print(next(id_generator))  # CAT-000002
print(next(id_generator))  # CAT-000003
The infinite loop never caused a problem because the generator only computed values when requested.
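To take a bounded batch from the infinite stream, the standard library's itertools.islice pulls a fixed number of values without ever materializing the rest (a short sketch using the generator above):
from itertools import islice

# Take just the first five IDs; the generator never runs past them
first_five = list(islice(generate_catalog_ids(), 5))
print(first_five)
# ['CAT-000001', 'CAT-000002', 'CAT-000003', 'CAT-000004', 'CAT-000005']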
The Generator Exhaustion Pitfall
Margaret warned Timothy about generator exhaustion:
def get_all_titles():
    for book in database.fetch_all_books():
        yield book.title

title_gen = get_all_titles()

# Consume all titles
all_titles = list(title_gen)  # Generator is now exhausted

# Try to iterate again
for title in title_gen:
    print(title)  # Nothing prints - generator is empty

# What happens if we call next() directly?
next(title_gen)  # Raises StopIteration exception
Generators could only be iterated once. An exhausted generator produced no more values, and calling next() on it raised a StopIteration exception—this is how Python's for loops knew when to stop. To iterate again, Timothy needed to create a new generator.
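When StopIteration is unwelcome, next() also accepts a default value to return instead of raising, which makes probing an exhausted generator safe (a small sketch using the function above):
fresh_gen = get_all_titles()    # A brand-new generator
all_titles = list(fresh_gen)    # Drain it completely

# The second argument is returned instead of raising StopIteration
print(next(fresh_gen, "no more titles"))  # no more titles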
The Memory Efficiency Demonstration
Timothy measured the difference:
import sys

# List approach - all in memory
def titles_as_list():
    return [book.title for book in database.fetch_all_books()]

# Generator approach - one at a time
def titles_as_generator():
    for book in database.fetch_all_books():
        yield book.title

title_list = titles_as_list()
print(sys.getsizeof(title_list))  # Tens of megabytes - just the list structure, before the strings

title_gen = titles_as_generator()
print(sys.getsizeof(title_gen))   # ~112 bytes - the generator object itself
The generator object was tiny because it didn't store results—it computed them on demand. The list consumed memory for the entire structure plus all the title strings it referenced. With three million titles, the difference between loading everything versus producing items one-by-one was dramatic.
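The same contrast can be reproduced without the library database. Generator expressions, the inline cousins of generator functions, are equally lazy; this self-contained sketch shows the size gap (exact byte counts vary by Python version):
import sys

# List comprehension: builds and stores one million results up front
squares_list = [n * n for n in range(1_000_000)]

# Generator expression: same logic, computed lazily on demand
squares_gen = (n * n for n in range(1_000_000))

print(sys.getsizeof(squares_list))  # Several megabytes, grows with length
print(sys.getsizeof(squares_gen))   # Around 100-200 bytes, regardless of length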
Advanced: The Send and Close Methods
Margaret revealed advanced generator control methods for special cases:
def search_catalog(query):
    for book in database.fetch_all_books():
        if query.lower() in book.title.lower():
            # Receive a new query mid-iteration
            new_query = yield book.title
            if new_query:
                query = new_query

searcher = search_catalog("python")

# Prime the generator with the first next() call
print(next(searcher))  # First python book

# Now we can send a new value
print(searcher.send("ruby"))  # Switches the search; yields the next ruby match

# Can also close it early
searcher.close()  # Terminates the generator
The send() method both sent a value into the generator (which became the result of the yield expression) and advanced it to the next yield. The generator needed to be "primed" with an initial next() call before send() would work. The close() method terminated the generator early, useful for cleanup.
"These methods are rarely needed," Margaret cautioned. "Most generator work uses simple iteration. But they exist for specialized control flow."
The Practical Applications
Timothy compiled common generator patterns:
Reading large files line by line:
def read_large_log_file(filename):
    with open(filename) as file:
        for line in file:
            yield line.strip()
Filtering sequences:
def recent_books_only(all_books):
    for book in all_books:
        if book.year >= 2020:
            yield book
Transforming data:
def uppercase_titles(books):
    for book in books:
        yield book.title.upper()
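These patterns compose naturally: one generator's output can feed another's input, forming a lazy pipeline that still holds only one book in memory at a time. For example, chaining the two helpers above:
recent = recent_books_only(database.fetch_all_books())
shouted = uppercase_titles(recent)

for title in shouted:
    print(title)  # Each book flows through both stages, one at a time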
Timothy's Generator Wisdom
Through exploring the On-Demand Factory, Timothy learned essential principles:
Yield transforms functions into generators: Replace return with yield to produce values one at a time.
Generators are lazy: They compute values only when requested, not all upfront.
State persists between yields: Variables maintain their values across pauses.
Memory efficiency is the superpower: Process millions of items using minimal memory.
Generators exhaust: They can only be iterated once; create a new generator to iterate again.
Use for large datasets: When processing all items at once is impractical or impossible.
Lists for small collections: For a dozen or hundred items, regular lists work fine and offer more flexibility. Generators shine when dealing with thousands or millions of items, or infinite sequences.
Timothy's mastery of generators revealed Python's elegant solution to the memory crisis: don't load everything at once—produce items on demand. The On-Demand Factory transformed overwhelming batch operations into manageable streams, one value at a time. The steam-powered conveyor belt delivered exactly what he needed, exactly when he needed it.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.