A Tiny Python Script That Teaches Big Engineering Lessons
A 15-line Python script that demonstrates normalization, separation of concerns, and the power of the standard library.
#Python #CodeReview #CleanCode #CodeQuality
🎧 Audio Edition: Prefer to listen? Check out the expanded AI podcast version of this deep dive on YouTube.
📺 Video Edition: Prefer to watch? Check out the 7-minute visual explainer on YouTube.
Most code examples online are either trivial or bloated.
This one sits in the middle.
It’s short. Runnable. Clear. But it quietly demonstrates several principles that matter far beyond word counting: normalization, separation of concerns, reuse of standard libraries, and structured output.
Let’s look at a small Python program that counts word frequency in a block of text — and unpack what it’s actually doing.
The Goal
Take a raw block of text and transform it into:
- Clean, normalized words
- A frequency tally
- A sorted, readable output
All in under 20 lines.
Here’s the complete script:
```python
from collections import Counter
import re

def word_frequency(text):
    words = re.findall(r'\b[a-zA-Z]+\b', text.lower())
    return Counter(words)

if __name__ == "__main__":
    sample_text = """
    Simple systems scale.
    Complex systems fail.
    Keep systems simple.
    """
    frequencies = word_frequency(sample_text)
    for word, count in frequencies.most_common():
        print(f"{word}: {count}")
```
Now let’s break it down.
Imports: Use the Right Tools
```python
from collections import Counter
import re
```
Two standard library modules do the heavy lifting:
`re` handles pattern-based word extraction. `Counter` handles the counting logic.
Notice what’s not here: no manual loops to tally words. No custom sorting logic. This script leans on built-in, optimized tools instead of reinventing wheels.
That’s already a design decision.
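To see what that decision buys, here is a hypothetical manual version of the same tally — the loop-and-dict boilerplate that a single `Counter(words)` call replaces. The `words` list is just illustrative sample data:

```python
from collections import Counter

words = ["simple", "systems", "scale", "complex", "systems", "fail"]

# The manual approach: loop, look up, increment.
manual_counts = {}
for w in words:
    manual_counts[w] = manual_counts.get(w, 0) + 1

# Counter does the same work in one call.
counter_counts = Counter(words)

print(manual_counts == counter_counts)  # True -- identical counts
```

Both produce the same mapping; the difference is that `Counter` ships tested, and brings `most_common()` along for free.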
The Function: Separation of Concerns
```python
def word_frequency(text):
```
The counting logic lives inside a function. That matters.
It keeps:
- Input processing separate from output formatting
- Core logic reusable
- The script importable as a module
This is not just stylistic — it’s structural discipline.
Normalization + Extraction in One Line
```python
words = re.findall(r'\b[a-zA-Z]+\b', text.lower())
```
This line does two things:
1. text.lower()
Python string comparisons are case-sensitive. Without normalization:
- "Simple"
- "simple"
would be counted separately.
Lowercasing ensures consistent comparison.
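A quick illustration of why this matters, using a made-up list of casing variants:

```python
from collections import Counter

raw = ["Simple", "simple", "SIMPLE"]

# Without normalization, each casing becomes a distinct key.
print(Counter(raw))  # Counter({'Simple': 1, 'simple': 1, 'SIMPLE': 1})

# Lowercasing first collapses them into one word.
print(Counter(w.lower() for w in raw))  # Counter({'simple': 3})
```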
2. re.findall(...)
The regex pattern:
```
\b[a-zA-Z]+\b
```
Breaks down like this:
- `\b` → word boundary
- `[a-zA-Z]` → alphabetic characters only
- `+` → one or more
This extracts clean word tokens while ignoring punctuation.
Instead of splitting on spaces and manually stripping punctuation, the script uses a pattern-based filter — a small but meaningful improvement in precision.
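The difference shows up immediately on a sample sentence (the text here is just for illustration):

```python
import re

text = "Complex systems fail."

# Naive whitespace splitting keeps the trailing punctuation.
print(text.lower().split())
# ['complex', 'systems', 'fail.']

# The pattern-based filter extracts clean tokens.
print(re.findall(r'\b[a-zA-Z]+\b', text.lower()))
# ['complex', 'systems', 'fail']
```

With `split()`, "fail." and "fail" would count as different words; the regex sidesteps that entire class of bug.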
Counting with Counter
```python
return Counter(words)
```
Counter is essentially a specialized dictionary.
Input:
```python
["simple", "systems", "scale", "complex", "systems", "fail"]
```
Output:
```python
Counter({'systems': 2, 'simple': 1, 'scale': 1, 'complex': 1, 'fail': 1})
```
It automatically:
- Identifies unique words
- Tracks frequency
- Stores counts efficiently
Returning the Counter object instead of printing inside the function keeps the function pure — no side effects.
That makes it easier to test, extend, or reuse.
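Because the function returns data instead of printing it, it can be exercised directly with assertions — a minimal sketch, reusing the script's own function on made-up input:

```python
from collections import Counter
import re

def word_frequency(text):
    words = re.findall(r'\b[a-zA-Z]+\b', text.lower())
    return Counter(words)

# A pure function is trivial to test: same input, same output, no side effects.
result = word_frequency("Keep systems simple. Keep SYSTEMS simple.")
assert result["keep"] == 2
assert result["systems"] == 2
assert "." not in result  # punctuation never becomes a token
```

If the function printed its results instead, checking them would mean capturing stdout — far clumsier than comparing return values.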
The Main Guard: Script vs Module
```python
if __name__ == "__main__":
```
This is a subtle but important convention.
It allows the file to:
- Run as a standalone script
- Be imported into another file without executing the demo block
This is one of those small habits that signals maturity in Python development.
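You can watch the guard work by importing a file rather than running it. This sketch writes a hypothetical throwaway module (`mini.py`, invented for the demo) to a temp directory, imports it, and confirms the demo block never fires:

```python
import contextlib
import importlib.util
import io
import os
import tempfile

# A hypothetical module using the same main-guard pattern.
module_src = '''
def greet():
    return "hello"

if __name__ == "__main__":
    print("demo block ran")
'''

path = os.path.join(tempfile.mkdtemp(), "mini.py")
with open(path, "w") as f:
    f.write(module_src)

# Import the file as a module; its __name__ is "mini", not "__main__".
buf = io.StringIO()
spec = importlib.util.spec_from_file_location("mini", path)
mini = importlib.util.module_from_spec(spec)
with contextlib.redirect_stdout(buf):
    spec.loader.exec_module(mini)

print(mini.greet())          # the function is importable and usable
print(buf.getvalue() == "")  # True -- the demo block never printed
```

Run the same file with `python mini.py` and `__name__` becomes `"__main__"`, so the demo block executes.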
Sample Input
```python
sample_text = """
Simple systems scale.
Complex systems fail.
Keep systems simple.
"""
```
The input includes:
- Capitalization
- Punctuation
- Repeated words
That gives the script something meaningful to process.
Sorting and Displaying Results
```python
for word, count in frequencies.most_common():
    print(f"{word}: {count}")
```
most_common() sorts words by descending frequency.
Instead of manually sorting dictionary items, we rely on built-in ranking.
The f-string keeps output clean and readable.
Example output:
```
systems: 3
simple: 2
scale: 1
complex: 1
fail: 1
keep: 1
```
Simple. Deterministic. Clear.
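One detail worth knowing: `most_common()` also accepts a limit, which is handy once the vocabulary grows beyond a handful of words. A small sketch with illustrative counts:

```python
from collections import Counter

freq = Counter(["systems", "systems", "systems", "simple", "simple", "scale"])

# With no argument, every word is returned, highest count first.
print(freq.most_common())
# [('systems', 3), ('simple', 2), ('scale', 1)]

# Pass n to get only the top entries.
print(freq.most_common(2))
# [('systems', 3), ('simple', 2)]
```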
What This Script Quietly Demonstrates
Even though it’s small, this program models several best practices:
- Normalize before comparing
- Isolate logic inside functions
- Return data, don’t print inside core logic
- Use built-in libraries whenever possible
- Separate execution code from reusable code
- Keep transformation pipelines tight
It also raises natural extension questions:
- What about Unicode words?
- What about hyphenated terms?
- What about performance for very large files?
- Should we stream instead of loading everything into memory?
That’s where scaling conversations begin.
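As one hedged sketch of where those answers might start: switching the pattern to `\w+` (which matches Unicode word characters by default in Python 3) handles accented words, and updating the `Counter` line by line means the whole input never has to sit in memory. The `stream_word_frequency` name and the `io.StringIO` stand-in for a real file handle are both inventions for this example:

```python
from collections import Counter
import io
import re

# \w+ matches Unicode word characters by default in Python 3,
# so accented words like 'café' survive extraction.
WORD_RE = re.compile(r'\w+')

def stream_word_frequency(lines):
    """Count words one line at a time instead of loading everything."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts

# io.StringIO stands in for an open file handle.
stream = io.StringIO("Café systems scale.\nCafé systems fail.\n")
print(stream_word_frequency(stream).most_common(1))  # [('café', 2)]
```

Because file objects yield lines lazily, the same function works unchanged on a multi-gigabyte log opened with `open(path)`.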
Conclusion
This script counts words.
But more importantly, it shows how to:
- Transform raw input into structured data
- Use Python’s standard library effectively
- Write code that is readable and extensible
Small programs are often dismissed as beginner material.
In reality, they’re where good habits are formed.
And good habits scale.
Aaron Rose is a software engineer and technology writer at tech-reader.blog. For explainer videos and podcasts, check out the Tech-Reader YouTube channel.