The Secret Life of Go: Worker Pools

How to Stop Crashing Your Server with 10,000 Goroutines

#Golang #Concurrency #WorkerPools #SystemDesign

Part 26: Controlling Concurrency and The Dispatcher

Ethan was staring at a wall of red text on his monitor. "Out of memory," he muttered. "How? Go is supposed to be incredibly efficient."

Eleanor walked by with her coffee. "What did you crash this time?"

"The image processing service," Ethan sighed. "We had a backlog of ten thousand images to resize. Since goroutines are lightweight, I just launched one for every single image so they would all process at the same time."

He showed her the code:

func ProcessAll(images []string) {
    var wg sync.WaitGroup
    for _, img := range images {
        wg.Add(1)
        go func(imageName string) {
            defer wg.Done()
            resizeImage(imageName) // Opens the file, decodes, resizes, saves
        }(img)
    }
    wg.Wait()
}

"You are right that goroutines are cheap," Eleanor said. "They only take a couple of kilobytes of memory to start. But the work they do is not cheap. You just told your operating system to open ten thousand files, allocate ten thousand image buffers in RAM, and hammer the CPU ten thousand times simultaneously. You didn't just write a program, Ethan. You wrote a denial-of-service attack against your own server."

"So I can't use concurrency?"

"You must use bounded concurrency," Eleanor corrected. "You need a system where you have ten thousand jobs, but only a strict, fixed number of workers executing them. We call this a Worker Pool."

The Worker

"First," Eleanor said, "we need to define what a worker looks like. It is a function that listens to a channel of jobs, does the work, and sends the outcome to a results channel."

She typed out the worker function:

func worker(id int, jobs <-chan string, results chan<- string) {
    // The worker constantly pulls from the jobs channel 
    // until the channel is closed.
    for imagePath := range jobs {
        fmt.Printf("Worker %d processing %s\n", id, imagePath)
        
        // Simulate the heavy lifting
        time.Sleep(100 * time.Millisecond) 
        
        results <- imagePath + "-resized.png"
    }
}

"Notice the range loop," Eleanor pointed out. "This worker will stay alive and keep pulling jobs out of the channel one by one. When the jobs channel is finally closed and empty, the loop ends, and the worker goroutine gracefully exits."

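To see that exit behaviour in isolation, here is a minimal sketch. The helper name drain and the sample file names are made up for illustration; the only point is that a range loop over a channel keeps receiving buffered items after close and stops once the channel is both closed and empty.

```go
package main

import "fmt"

// drain is a hypothetical helper: it ranges over jobs the same way the
// worker does and returns how many items it received before the channel
// was closed and emptied.
func drain(jobs <-chan string) int {
	count := 0
	for range jobs {
		count++
	}
	return count
}

func main() {
	jobs := make(chan string, 3)
	jobs <- "a.png"
	jobs <- "b.png"
	jobs <- "c.png"
	close(jobs) // items already in the buffer remain readable after close

	fmt.Println(drain(jobs)) // prints 3, then the loop has ended
}
```

If close(jobs) were left out, the loop would block forever waiting for a fourth item, which is why the dispatcher has to close the channel.
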
The Dispatcher

"Now we need the dispatcher," Eleanor continued. "This is the main function that creates the channels, hires the workers, and hands out the tasks."

She refactored his original function.

func ProcessAll(images []string) {
    numWorkers := 5
    numJobs := len(images)

    // 1. Create channels for jobs and results
    // Buffering prevents the main goroutine from blocking while feeding tasks
    jobs := make(chan string, numJobs)
    results := make(chan string, numJobs)

    // 2. Start the workers (Bounded Concurrency)
    // We only spawn 5 goroutines, no matter how many images there are.
    for w := 1; w <= numWorkers; w++ {
        go worker(w, jobs, results)
    }

    // 3. Send all the jobs to the channel
    for _, img := range images {
        jobs <- img
    }
    // Close the channel so workers know when to stop
    close(jobs)

    // 4. Collect all the results
    for a := 1; a <= numJobs; a++ {
        <-results
    }
    
    fmt.Println("All images processed successfully.")
}

Ethan read through the steps. "So all ten thousand image paths go into the jobs channel immediately, because it is buffered. But because there are only five worker goroutines running, at most five images are ever being opened and resized at the same time."

"Precisely," Eleanor said. "As soon as Worker 1 finishes an image, it immediately loops around and grabs the next path from the jobs channel."

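One way to convince yourself of that limit is to measure it. The sketch below is an illustration rather than part of Ethan's service: peakConcurrency is a hypothetical helper, the jobs are dummy integers, and it waits with a sync.WaitGroup instead of a results channel to keep the example short. It counts how many jobs are in flight at any moment and records the highest value seen.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// peakConcurrency runs numJobs dummy jobs on numWorkers workers and
// reports the highest number of jobs that were in flight at once.
func peakConcurrency(numWorkers, numJobs int) int {
	jobs := make(chan int, numJobs)
	var (
		mu       sync.Mutex // guards inFlight and peak
		inFlight int
		peak     int
		wg       sync.WaitGroup
	)

	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				mu.Lock()
				inFlight++
				if inFlight > peak {
					peak = inFlight
				}
				mu.Unlock()

				time.Sleep(time.Millisecond) // stand-in for real work

				mu.Lock()
				inFlight--
				mu.Unlock()
			}
		}()
	}

	for j := 0; j < numJobs; j++ {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
	return peak
}

func main() {
	// The reported peak can never exceed the number of workers.
	fmt.Println("peak in-flight jobs:", peakConcurrency(5, 200))
}
```

However many jobs you queue, the reported peak stays at or below the worker count, because each worker handles one job at a time.
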
"How do I know how many workers to use?" Ethan asked.

"It depends on your bottleneck," Eleanor replied. "If the work is purely mathematical, you typically match the number of CPU cores. If the work involves waiting on a network or a hard drive, you might run dozens or hundreds of workers, depending on what your external systems can handle."

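As a rough starting point, that rule of thumb could be sketched like this. The helper workerCount and its ioFactor parameter are assumptions made for illustration; the multiplier for I/O-bound work is arbitrary and should be tuned against what your disks and downstream services can actually take. runtime.NumCPU is the standard library call that reports the number of logical CPUs.

```go
package main

import (
	"fmt"
	"runtime"
)

// workerCount is a hypothetical helper. For CPU-bound work it returns the
// number of logical CPUs; for I/O-bound work it multiplies that by an
// arbitrary factor that you would tune against your own systems.
func workerCount(cpuBound bool, ioFactor int) int {
	n := runtime.NumCPU()
	if cpuBound || ioFactor < 1 {
		return n
	}
	return n * ioFactor
}

func main() {
	fmt.Println("CPU-bound workers:", workerCount(true, 0))
	fmt.Println("I/O-bound workers:", workerCount(false, 8))
}
```

Treat the result as a first guess and adjust it after watching CPU, memory, and open file counts under real load.
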
Ethan ran the new code. The CPU usage stayed at a steady, healthy hum. The memory graph was a perfectly flat line. There were no crashes, and the backlog of images was chewed through systematically.

"It's like a grocery store," Ethan observed. "You might have a hundred people ready to check out, but if you only have five cashiers open, you form a queue. Opening a hundred registers at once would burn the store down."

"Exactly," Eleanor smiled. "You have moved from just making things run concurrently to controlling how they run concurrently. Your application is now predictable, stable, and production-ready."

Key Concepts

Unbounded vs. Bounded Concurrency

Unbounded: Launching a goroutine for every single task (go process(item) in a loop). On large datasets this risks resource exhaustion: running out of memory or hitting the open-file limit.
Bounded: Using a fixed number of goroutines to process an arbitrary number of tasks.

The Worker Pool Pattern

A standard Go architecture that combines goroutines and channels to throttle workloads.

The Jobs Channel: A channel holding the tasks to be done.
The Results Channel: A channel to collect the outputs (and ensure the main function waits for completion).
The Workers: A fixed number of goroutines running a for item := range jobs loop.
The Dispatcher: The function that spins up the workers, fills the jobs channel, closes it, and collects the results.

Production Considerations

Worker Count: Match CPU cores for computational tasks; scale higher for I/O-bound tasks (network/disk).
Error Handling: In real systems, the results channel should carry a struct containing both the data and an error value, so a failed job is reported instead of silently lost.
Cancellation: Always pass a context.Context (from Episode 22) into your workers so they can be gracefully shut down during a system exit.
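
The last two points can be sketched together. Everything named here is an assumption for illustration: the Result struct, the placeholder resize function (which fails on an empty path), and the run helper are not from the original service. The worker checks the context between jobs, so cancellation stops it from picking up new work but does not interrupt a job already in progress.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Result pairs a job's output with any error it produced.
type Result struct {
	Path string
	Err  error
}

// resize is a placeholder for real work; it fails on an empty path.
func resize(path string) (string, error) {
	if path == "" {
		return "", errors.New("empty image path")
	}
	return path + "-resized.png", nil
}

func worker(ctx context.Context, jobs <-chan string, results chan<- Result) {
	for path := range jobs {
		// Stop picking up new work once the context is cancelled.
		if ctx.Err() != nil {
			return
		}
		out, err := resize(path)
		results <- Result{Path: out, Err: err}
	}
}

// processAll returns one Result for every job that a worker handled.
func processAll(ctx context.Context, images []string, numWorkers int) []Result {
	jobs := make(chan string, len(images))
	results := make(chan Result, len(images))

	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			worker(ctx, jobs, results)
		}()
	}

	for _, img := range images {
		jobs <- img
	}
	close(jobs)

	wg.Wait()      // every worker has exited
	close(results) // so it is safe to close and drain results

	var out []Result
	for r := range results {
		out = append(out, r)
	}
	return out
}

// run wires up a cancellable context and tallies successes and failures.
func run(images []string, numWorkers int) (ok, failed int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // release the context when we are done
	for _, r := range processAll(ctx, images, numWorkers) {
		if r.Err != nil {
			failed++
		} else {
			ok++
		}
	}
	return ok, failed
}

func main() {
	ok, failed := run([]string{"a.png", "", "b.png"}, 2)
	fmt.Printf("%d succeeded, %d failed\n", ok, failed) // 2 succeeded, 1 failed
}
```

In a real service the context would usually come from the caller, for example one tied to a shutdown signal, rather than being created inside the helper.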

Next Episode: The Mutex. Ethan learns that sometimes, you really do just need to lock the memory.

Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.
