The Secret Life of AWS: The Queue (Amazon SQS)

 


Decoupling the monolith. How to use Amazon SQS to handle high traffic and prevent system failures.





Part 19 of The Secret Life of AWS

Timothy watched the logs on his screen. It was the first day of the "Big Sale," and traffic was higher than usual.

Suddenly, the error rate spiked. HTTP 504: Gateway Timeout.

"The checkout is broken!" Timothy yelled. "The server is timing out."

He frantically checked his dashboards (Ep 17). The CPU was fine. The Database was fine.

"Why is it timing out?" he asked.

Margaret pulled up the trace (Ep 17). "Look at the bottleneck. It’s the 'Email Receipt' step. The external email provider is throttled. Because your code waits for the email to be sent before confirming the order, the browser hangs."

"I can't fix the email provider," Timothy said.

"No," Margaret agreed. "But you can fix your architecture. You are using Synchronous Processing. The user has to wait for every single step to finish."

"We need to decouple the checkout from the email," she said. "We need Amazon Simple Queue Service (SQS)."

The Buffer

Margaret navigated to the Amazon SQS console.

"SQS is a message queue," she explained. "It allows components to communicate asynchronously. Instead of 'Order Service' telling 'Email Service' to do something now, it drops a message in the queue."

"The checkout process changes," she continued. "The user clicks 'Buy'. Your API saves the order, drops a message in the queue saying 'Send Email', and immediately returns 'Success' to the user."

"So the user doesn't wait for the email?" Timothy asked.

"Correct. The email is sent moments later, in the background. If the email provider is slow, the messages simply wait in the queue until they can be processed. The user never notices."
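What Margaret describes might look roughly like the sketch below. It is a minimal illustration, not Timothy's actual code: the function name `checkout`, the `save_order` callback, and the message fields are all hypothetical. The `sqs_client` argument is assumed to be a boto3 SQS client (for example, `boto3.client("sqs")`), passed in so the function stays easy to test.

```python
import json


def checkout(order, save_order, sqs_client, queue_url):
    """Save the order, enqueue the email job, and return immediately."""
    order_id = save_order(order)  # hypothetical database write

    # Hand the slow work to the queue instead of calling the email provider.
    sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({
            "type": "SendEmail",  # hypothetical message schema
            "order_id": order_id,
            "email": order["email"],
        }),
    )

    # The user gets a response without waiting for the email to go out.
    return {"status": "Success", "order_id": order_id}
```

The only network call left in the request path is `send_message`, which does not depend on the email provider at all.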

Standard vs. FIFO

Margaret clicked Create Queue. She pointed to the two options: Standard and FIFO.

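"Standard Queues are the default," she said. "They support nearly unlimited throughput, thousands of messages per second. The trade-off is at-least-once delivery with best-effort ordering: occasionally, a message might be delivered twice, or out of order, so the consumer has to tolerate a duplicate."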

"FIFO (First-In-First-Out) Queues guarantee that messages in the same message group are delivered in the exact order they were sent, and they deduplicate messages so each one is processed exactly once," she explained. "However, they have lower throughput limits compared to Standard queues."

"We use FIFO for banking transactions where order is critical," she noted. "For email receipts, Standard is sufficient."

She selected Standard.
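The same choice can be made in code rather than in the console. The sketch below builds the arguments for the SQS `create_queue` call for either queue type. The helper name `queue_settings` and the queue name are hypothetical; the `.fifo` suffix and the `FifoQueue` attribute are what SQS expects for FIFO queues.

```python
def queue_settings(name, fifo=False):
    """Build keyword arguments for the SQS create_queue call."""
    if not fifo:
        # Standard queue: no special attributes are needed.
        return {"QueueName": name}
    return {
        "QueueName": name + ".fifo",  # FIFO queue names must end in .fifo
        "Attributes": {
            "FifoQueue": "true",
            # Let SQS deduplicate by hashing the message body.
            "ContentBasedDeduplication": "true",
        },
    }
```

With a boto3 client this would be used as `sqs_client.create_queue(**queue_settings("email-receipts"))`. Note that sending to a FIFO queue also requires a `MessageGroupId` on every message, which a Standard queue does not need.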

The Consumer (Lambda)

"Now we have a queue full of messages," Timothy said. "Who reads them?"

"We need a Consumer," Margaret said. "We will configure the queue as an event source for an AWS Lambda function. Lambda polls the queue for us and invokes the function with batches of messages."

"This creates a Competing Consumers pattern," she explained. "If ten messages arrive, AWS spins up one Lambda. If ten thousand messages arrive, AWS spins up hundreds of Lambdas to read them in parallel. It scales automatically."

"What if the email provider is down?" Timothy asked.

"That is the beauty of SQS," Margaret said. "Messages persist in the queue for four days by default (configurable up to 14 days). If the Lambda function fails, the message is not deleted, and it becomes visible again once its Visibility Timeout expires. Another Lambda instance will try to process it later. No data is lost."
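A consumer along these lines could be sketched as follows. The `send_receipt` function is a hypothetical stand-in for the real email provider call, and the message fields match no particular schema. The handler returns a partial batch response, which assumes the Lambda event source mapping has `ReportBatchItemFailures` enabled; without that setting Lambda ignores the return value, and the function would need to raise an exception to have the batch retried.

```python
import json


def send_receipt(email, order_id):
    """Hypothetical stand-in for the call to the external email provider."""
    if "@" not in email:
        raise ValueError(f"invalid address: {email!r}")
    print(f"receipt for order {order_id} sent to {email}")


def handler(event, context):
    """Process a batch of SQS records delivered by Lambda."""
    failures = []
    for record in event["Records"]:
        try:
            job = json.loads(record["body"])
            send_receipt(job["email"], job["order_id"])
        except Exception:
            # Report only this message as failed. It stays in the queue and
            # becomes visible again after the visibility timeout.
            failures.append({"itemIdentifier": record["messageId"]})
    # Messages not listed here are treated as processed and deleted.
    return {"batchItemFailures": failures}
```

Reporting failures per message means one bad record does not force the whole batch to be retried.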

The Dead Letter Queue (DLQ)

"But what if it never succeeds?" Timothy asked. "What if the email address is invalid? Does it loop forever?"

"That would cost money," Margaret noted (remembering Ep 18). "We configure a Dead Letter Queue (DLQ)."

She pointed to the Redrive Policy settings. "We tell SQS: 'If this message has been received 5 times without being processed successfully, move it to the DLQ'. That limit is the maxReceiveCount. The DLQ is a separate queue where bad messages go to die. You can inspect them later to see what went wrong, without clogging up the main system."
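Outside the console, the redrive policy is a JSON string stored as an attribute on the main queue. A minimal sketch, assuming the DLQ already exists and its ARN is known; the helper name `redrive_attributes` and the example ARN are hypothetical.

```python
import json


def redrive_attributes(dlq_arn, max_receives=5):
    """Build the queue attribute that routes repeated failures to a DLQ."""
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            # After this many receives, SQS moves the message to the DLQ.
            "maxReceiveCount": str(max_receives),
        })
    }
```

With a boto3 client, this could be applied with `sqs_client.set_queue_attributes(QueueUrl=main_queue_url, Attributes=redrive_attributes(dlq_arn))`, where `main_queue_url` and `dlq_arn` are placeholders for real values.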

The Result

Timothy refactored the code.

Old Flow (Synchronous):
User -> API -> Database -> Email Server (Wait) -> Response.

New Flow (Asynchronous):
User -> API -> Database -> SQS -> Response.
(Background: SQS -> Lambda -> Email Server)

He deployed the change. The timeouts disappeared. The checkout felt instant, even though the email provider was still slow.

"This is Event-Driven Architecture," Margaret said. "You have decoupled the components. The failure of one dependency no longer breaks the entire system."


Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.
