A Tiny Python Script That Teaches Big Engineering Lessons

 


A 15-line Python script that demonstrates normalization, separation of concerns, and the power of the standard library.

#Python #CodeReview #CleanCode #CodeQuality






🎧 Audio Edition: Prefer to listen? Check out the expanded AI podcast version of this deep dive on YouTube.

📺 Video Edition: Prefer to watch? Check out the 7-minute visual explainer on YouTube.


Most code examples online are either trivial or bloated.

This one sits in the middle.

It’s short. Runnable. Clear. But it quietly demonstrates several principles that matter far beyond word counting: normalization, separation of concerns, reuse of standard libraries, and structured output.

Let’s look at a small Python program that counts word frequency in a block of text — and unpack what it’s actually doing.


The Goal

Take a raw block of text and transform it into:

  • Clean, normalized words
  • A frequency tally
  • A sorted, readable output

All in under 20 lines.

Here’s the complete script:

from collections import Counter
import re

def word_frequency(text):
    words = re.findall(r'\b[a-zA-Z]+\b', text.lower())
    return Counter(words)

if __name__ == "__main__":
    sample_text = """
    Simple systems scale.
    Complex systems fail.
    Keep systems simple.
    """
    frequencies = word_frequency(sample_text)
    for word, count in frequencies.most_common():
        print(f"{word}: {count}")

Now let’s break it down.


Imports: Use the Right Tools

from collections import Counter
import re

Two standard library modules do the heavy lifting:

  • re handles pattern-based word extraction.
  • Counter handles counting logic.

Notice what’s not here: no manual loops to tally words. No custom sorting logic. This script leans on built-in, optimized tools instead of reinventing wheels.

That’s already a design decision.
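To see what Counter saves us, here is a sketch of the hand-rolled alternative: a plain dict tally plus manual sorting. (The function name is made up for the comparison.)

```python
import re

def word_frequency_manual(text):
    """Tally word counts with a plain dict instead of Counter."""
    counts = {}
    for word in re.findall(r'\b[a-zA-Z]+\b', text.lower()):
        counts[word] = counts.get(word, 0) + 1
    # Sorting by descending count must also be done by hand
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

print(word_frequency_manual("Simple systems scale. Complex systems fail."))
# [('systems', 2), ('simple', 1), ('scale', 1), ('complex', 1), ('fail', 1)]
```

More lines, more places for off-by-one bugs, and the sorting logic is now our problem. Counter absorbs all of it.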


The Function: Separation of Concerns

def word_frequency(text):

The counting logic lives inside a function. That matters.

It keeps:

  • Input processing separate from output formatting
  • Core logic reusable
  • The script importable as a module

This is not just stylistic — it’s structural discipline.


Normalization + Extraction in One Line

words = re.findall(r'\b[a-zA-Z]+\b', text.lower())

This line does two things:

1. text.lower()

Python string comparisons are case-sensitive. Without normalization:

  • "Simple"
  • "simple"

would be counted separately.

Lowercasing ensures consistent comparison.

2. re.findall(...)

The regex pattern:

\b[a-zA-Z]+\b

Breaks down like this:

  • \b → word boundary
  • [a-zA-Z] → alphabetic characters only
  • + → one or more

This extracts clean word tokens while ignoring punctuation.

Instead of splitting on spaces and manually stripping punctuation, the script uses a pattern-based filter — a small but meaningful improvement in precision.
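It helps to see the pattern's behavior on messier input. Note that it splits contractions and hyphenated words into separate tokens and drops digits entirely, which may or may not be what you want:

```python
import re

pattern = r'\b[a-zA-Z]+\b'

# Apostrophes and hyphens are word boundaries, so compound tokens split apart
print(re.findall(pattern, "don't stop!"))       # ['don', 't', 'stop']
print(re.findall(pattern, "state-of-the-art"))  # ['state', 'of', 'the', 'art']
print(re.findall(pattern, "version 2.0 beta"))  # ['version', 'beta'] -- digits skipped
```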


Counting with Counter

return Counter(words)

Counter is essentially a specialized dictionary.

Input:

["simple", "systems", "scale", "complex", "systems", "fail"]

Output:

Counter({"systems": 2, "simple": 1, ...})

It automatically:

  • Identifies unique words
  • Tracks frequency
  • Stores counts efficiently

Returning the Counter object instead of printing inside the function keeps the function pure — no side effects.

That makes it easier to test, extend, or reuse.
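Because word_frequency returns data rather than printing it, testing it is a one-liner per case:

```python
from collections import Counter
import re

def word_frequency(text):
    words = re.findall(r'\b[a-zA-Z]+\b', text.lower())
    return Counter(words)

# Pure functions are trivially testable: same input, same output, no I/O to capture
assert word_frequency("Go go GO!") == Counter({"go": 3})
assert word_frequency("") == Counter()
print("all assertions passed")
```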


The Main Guard: Script vs Module

if __name__ == "__main__":

This is a subtle but important convention.

It allows the file to:

  • Run as a standalone script
  • Be imported into another file without executing the demo block

This is one of those small habits that signals maturity in Python development.
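One way to see the guard in action without two separate files: write a throwaway module to a temp directory and import it. The guarded demo block stays silent. (The module name guarded is made up for this demonstration.)

```python
import importlib.util
import pathlib
import tempfile

code = (
    "def greet():\n"
    "    return 'hello'\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    print('demo ran')  # only when executed directly\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "guarded.py"
    path.write_text(code)
    # Importing runs the module body with __name__ == 'guarded',
    # so the guarded print never fires
    spec = importlib.util.spec_from_file_location("guarded", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    print(mod.greet())  # hello
```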


Sample Input

sample_text = """
Simple systems scale.
Complex systems fail.
Keep systems simple.
"""

The input includes:

  • Capitalization
  • Punctuation
  • Repeated words

Which gives us something meaningful to process.


Sorting and Displaying Results

for word, count in frequencies.most_common():
    print(f"{word}: {count}")

most_common() returns (word, count) pairs sorted by descending frequency; ties keep first-encountered order.

Instead of manually sorting dictionary items, we rely on built-in ranking.

The f-string keeps output clean and readable.

Example output:

systems: 3
simple: 2
scale: 1
complex: 1
fail: 1
keep: 1

Simple. Deterministic. Clear.
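most_common also accepts an optional count, handy when only the top results matter. A quick sketch against a prebuilt Counter:

```python
from collections import Counter

frequencies = Counter({"systems": 3, "simple": 2, "scale": 1, "complex": 1})

# Passing n returns only the n highest counts, already sorted
for word, count in frequencies.most_common(2):
    print(f"{word}: {count}")
# systems: 3
# simple: 2
```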


What This Script Quietly Demonstrates

Even though it’s small, this program models several best practices:

  • Normalize before comparing
  • Isolate logic inside functions
  • Return data, don’t print inside core logic
  • Use built-in libraries whenever possible
  • Separate execution code from reusable code
  • Keep transformation pipelines tight

It also raises natural extension questions:

  • What about Unicode words?
  • What about hyphenated terms?
  • What about performance for very large files?
  • Should we stream instead of loading everything into memory?

That’s where scaling conversations begin.
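As a sketch of one possible answer to the Unicode and memory questions: switch the pattern to \w+ so accented words survive, and feed Counter.update line by line so the whole input never sits in memory at once. The helper below is an assumption, not part of the original script; io.StringIO stands in for a real file handle.

```python
from collections import Counter
import io
import re

# \w is Unicode-aware by default in Python 3: letters, digits, underscore
WORD = re.compile(r'\w+')

def stream_frequencies(lines):
    """Tally words incrementally from any iterable of lines (e.g. an open file)."""
    counts = Counter()
    for line in lines:
        counts.update(WORD.findall(line.lower()))
    return counts

sample = io.StringIO("Résumé review.\nRésumé approved.\n")
print(stream_frequencies(sample).most_common())
# [('résumé', 2), ('review', 1), ('approved', 1)]
```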


Conclusion

This script counts words.

But more importantly, it shows how to:

  • Transform raw input into structured data
  • Use Python’s standard library effectively
  • Write code that is readable and extensible

Small programs are often dismissed as beginner material.

In reality, they’re where good habits are formed.

And good habits scale.


Aaron Rose is a software engineer and technology writer at tech-reader.blog. For explainer videos and podcasts, check out the Tech-Reader YouTube channel.
