The Secret Life of AWS: The Success Disaster (Amazon API Gateway Throttling)

 


How to protect your serverless backend from viral spikes and malicious bots using rate limiting

#AWS #APIGateway #Security #RateLimiting




Margaret is a senior software engineer. Timothy is her junior colleague. They work in a grand Victorian library in London — the kind of place where code quality is the unspoken objective, and craftsmanship is the only thing that matters.

Episode 63

Timothy was staring at his AWS Billing dashboard in absolute horror. The line graph for the current month had spiked vertically overnight. He frantically toggled over to CloudWatch.

"Someone is spamming the checkout endpoint," Timothy said, his voice laced with panic as Margaret walked into the library. "A competitor's botnet is hitting our API thousands of times a second to scrape our inventory data. The decoupled architecture handled the load perfectly—every single request was processed. But my estimated AWS bill just tripled, and my other Lambda functions are throwing concurrency throttling errors."

"You have encountered the Success Disaster," Margaret said, maintaining her trademark calm. "In a serverless environment, infinite scale means infinite cost. Your architecture is incredibly resilient, but because it lacks boundaries, a malicious attack—or even a viral marketing campaign—will consume every resource available."

Timothy ran a hand through his hair. "Why are my other Lambda functions failing if the bot is only hitting the checkout endpoint?"

"Because AWS accounts share a regional pool of Lambda execution environments," Margaret explained. "The default limit is 1,000. While we can request a service quota increase from AWS Support to raise that ceiling, throwing more compute at a botnet is a fool's errand. The bot consumed all of your available execution environments. When your email service or your inventory service tried to scale up, AWS blocked them. You allowed a single endpoint to starve your entire ecosystem. We must protect your backend by closing the floodgates at the front door using Amazon API Gateway."

The Token Bucket

Margaret opened the AWS Console and navigated to API Gateway.

"Just because the cloud can scale infinitely does not mean it should," Margaret said. "We are going to implement Throttling. API Gateway uses the Token Bucket algorithm to manage traffic. Imagine a bucket that holds a maximum number of tokens. Every time a request comes in, it takes a token out of the bucket. If the bucket is empty, the request is rejected."
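The bucket Margaret describes can be sketched in a few lines of Python. This is a simplified model of the algorithm, not API Gateway's actual implementation:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: tokens refill at `rate` per
    second up to a maximum of `burst`; each request spends one token."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate               # tokens added back per second
        self.burst = burst             # maximum bucket size
        self.tokens = float(burst)     # bucket starts full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                   # bucket empty: reject (HTTP 429)

# A burst of 250 requests arriving at once against a 100/200 configuration:
bucket = TokenBucket(rate=100, burst=200)
results = [bucket.allow() for _ in range(250)]
print(results.count(True))  # about 200: the burst passes, then rejections begin
```

Because the requests arrive faster than the refill rate, roughly the first 200 drain the bucket and the remainder are rejected.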

She navigated to the Stage settings for Timothy's API.

"We define two parameters: Rate and Burst," Margaret explained. "The Rate is how fast AWS adds new tokens back into the bucket; let us set this to a steady 100 requests per second. The Burst is the capacity of the bucket itself, the largest short-term spike it can absorb before rejecting traffic; we will set this to 200 requests."
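Margaret applies the settings in the console, but the same stage-wide throttling can be set programmatically. A sketch using boto3's `update_stage` operation, where the API ID and stage name are placeholders you would replace with your own:

```python
# Hypothetical identifiers for illustration; substitute your own values.
API_ID = "abc123"
STAGE_NAME = "prod"

def throttle_patch_ops(rate: int, burst: int) -> list:
    """Build the patch operations that set stage-wide throttling.
    The '/*/*' path applies the setting to every resource and method."""
    return [
        {"op": "replace", "path": "/*/*/throttling/rateLimit", "value": str(rate)},
        {"op": "replace", "path": "/*/*/throttling/burstLimit", "value": str(burst)},
    ]

ops = throttle_patch_ops(rate=100, burst=200)

# Uncomment to apply (requires boto3 and AWS credentials):
# import boto3
# apigw = boto3.client("apigateway")
# apigw.update_stage(restApiId=API_ID, stageName=STAGE_NAME, patchOperations=ops)
```

Scripting the setting this way makes it repeatable across stages and easy to keep in version control alongside the rest of the infrastructure.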

Usage Plans and API Keys

"That protects the overall system," Timothy observed, studying the numbers. "But what if we have a legitimate third-party partner who needs to hit our API 50 times a second? If the bot consumes all 100 requests of our Rate limit, our partner gets blocked too."

"Excellent architectural foresight," Margaret smiled. "To solve the noisy neighbor problem, we issue API Keys and assign them to Usage Plans."

Margaret created a new Usage Plan called PartnerTier and generated a unique API Key.

"We give this key to our partner, and they include it in their HTTP request headers," she explained. "We can now enforce granular, per-client throttling. We configure the PartnerTier to guarantee 50 requests per second. You can even set different rate limits for different API keys, giving premium customers higher allowances. If the partner's code goes rogue and sends 5,000 requests a second, API Gateway instantly drops the excess traffic at the edge."
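The per-key decision the gateway makes can be sketched as follows. This uses a simple fixed-window counter as a stand-in for API Gateway's token bucket to keep the example short, and the key and plan names are invented for illustration:

```python
from collections import defaultdict

# Per-second request budget for each usage plan (illustrative numbers).
PLAN_RATE = {"PartnerTier": 50, "FreeTier": 5}
KEY_TO_PLAN = {"partner-key-123": "PartnerTier", "hobby-key-456": "FreeTier"}

used = defaultdict(int)  # requests seen in the current window, per key

def handle(api_key: str) -> int:
    """Return the HTTP status the gateway would send for this key."""
    plan = KEY_TO_PLAN.get(api_key)
    if plan is None:
        return 403                 # unknown key: rejected outright
    used[api_key] += 1
    if used[api_key] > PLAN_RATE[plan]:
        return 429                 # over this plan's budget: throttled
    return 200                     # within budget: forwarded to backend

# A free-tier key sending 7 requests in one window:
statuses = [handle("hobby-key-456") for _ in range(7)]
print(statuses)  # [200, 200, 200, 200, 200, 429, 429]
```

The point is the isolation: the hobby key exhausting its own budget has no effect on the partner key's 50-per-second allowance.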

"And for the botnet?" Timothy asked.

"For advanced protection, remember our lesson from Episode 52," Margaret noted. "We can combine this throttling with AWS WAF to block known malicious IPs before they even hit the rate limit. And for read-heavy endpoints, API Gateway caching can reduce backend load entirely, though that is a different layer of defense."

The Power of 429

Timothy watched the live logs as the configuration took effect. The botnet was still hammering the endpoint, but the backend Lambda invocations had completely stopped.

"Look at the HTTP response codes," Margaret pointed to the console.

Instead of routing traffic to the backend, API Gateway was acting as a shield, instantly returning HTTP 429 Too Many Requests errors to the bot.

"The traffic is dying at the edge," Timothy marveled. "The Lambda functions are not executing, EventBridge is not routing, and DynamoDB is not writing. My AWS bill just stopped bleeding."

"And because the bot is no longer monopolizing your compute environments," Margaret added, "your other microservices have their concurrency back. The ecosystem is healthy."

Timothy updated his architecture diagram. His application was no longer just highly available; it was financially and operationally protected.
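One footnote for legitimate clients: a well-behaved consumer treats 429 as a signal to slow down, not as an error to retry immediately. A minimal sketch of exponential backoff with jitter (the delays and attempt budget are arbitrary choices):

```python
import random
import time

def call_with_backoff(send, max_attempts=5):
    """Retry a callable that returns an HTTP status code, backing off
    exponentially (with jitter) whenever the gateway answers 429."""
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        # 0.05s, 0.1s, 0.2s... plus jitter so clients don't retry in lockstep.
        time.sleep((0.05 * 2 ** attempt) * (1 + random.random()))
    return 429  # budget exhausted; surface the throttle to the caller

# Simulated backend: throttled twice, then succeeds.
responses = iter([429, 429, 200])
result = call_with_backoff(lambda: next(responses))
print(result)  # 200
```

Without jitter, every throttled client retries at the same instant and recreates the very spike the gateway just absorbed.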


Key Concepts Introduced:

The Success Disaster: A scenario in cloud computing where a system scales perfectly to handle a massive spike in traffic (viral success or a DDoS attack), but does so at an unsustainable financial cost or by starving other critical systems of shared account limits.

Lambda Concurrency Limits: AWS enforces a shared regional limit on the number of Lambda execution environments that can run simultaneously (default 1,000 concurrent executions, though a service quota increase can be requested). If one highly active function consumes the entire concurrency pool, every other Lambda function in that account and region will be throttled.

API Gateway Throttling: A defense mechanism that prevents backend services from being overwhelmed. It uses the Token Bucket algorithm, defined by a Rate (steady-state requests per second) and a Burst (the bucket's capacity, i.e. the largest short-term spike absorbed above the steady rate). Requests exceeding these limits are blocked at the API layer with an HTTP 429 Too Many Requests error, without ever invoking backend resources.

Defense in Depth: Mature APIs combine multiple strategies: Usage Plans and API Keys to enforce per-client throttling and prevent noisy neighbors, AWS WAF to block outright malicious IP addresses before they consume rate limits, and caching to serve read-heavy traffic without invoking backend compute.


Aaron Rose is a software engineer and technology writer at tech-reader.blog


