The Secret Life of AWS: The Occupancy Limit (Service Quotas)
When "infinity" runs out. Understanding AWS service quotas and limits.
Part 30 of The Secret Life of AWS
Timothy was glowing. His "Assembly Line" (Episode 29) was humming perfectly. He had just pushed a new marketing campaign, and traffic to his application was spiking beautifully.
"Look at those numbers, Margaret!" he cheered, pointing to his dashboard. "1,000 concurrent users. 2,000! We are scaling!"
Suddenly, the dashboard turned red.
Error: Rate Exceeded.
Error: LimitExceededException.
Timothy frantically typed into his terminal. "Is the database down? Did the code break?"
He tried to deploy a hotfix using his new pipeline.
Deployment Failed: Reached the maximum number of Stacks allowed.
"What is happening?" Timothy cried. "I thought the Cloud was infinite! I thought it scaled automatically!"
Margaret walked over, calm as ever. She looked at the error message.
"The Cloud is infinite, Timothy," she said. "But your account is not."
"You have hit the Occupancy Limit."
"We need to talk about Service Quotas."
Service Quotas
Margaret opened the Service Quotas console. (Quotas were formerly called "service limits," and AWS still uses both terms.)
"Think of AWS like a skyscraper with infinite floors," Margaret explained. "But your account only has keys to the first five floors. The Fire Marshal sets a limit on how many people can be in one room, and the landlord sets a limit on how many rooms you can rent."
"Why?" Timothy asked, frustrated. "I want to pay them money!"
"Two reasons," Margaret said.
- Physical Capacity: To protect shared AWS capacity, so that no single account can overwhelm a data center.
- Wallet Protection: "If you wrote a bug that accidentally spun up 10,000 expensive servers, this quota would stop you before you owed Amazon a million dollars. It is a safety net."
She pointed to the dashboard.
- Concurrent Lambda Executions: 1,000
- VPCs per Region: 5
- CloudFormation Stacks: 200
"You hit the Lambda concurrency limit," she noted. "You tried to run 1,001 functions at once, and AWS blocked the 1,001st person at the door."
Soft Quotas vs. Hard Quotas
"So I'm stuck?" Timothy asked, defeated. "I can't grow any bigger?"
"Not at all," Margaret smiled. "We just need to ask for a permit."
She selected the Lambda Concurrent Executions quota and clicked Request quota increase.
She typed in 5,000 and hit Send.
"That is a Soft Quota," she explained. "AWS sets it low by default to keep you safe. But if you have a valid reason, they will raise it. Usually within a few minutes or hours."
"Are all limits like that?" Timothy asked.
"No," Margaret warned. "Some are Hard Quotas."
"For example," she said, "A Lambda function can only run for 15 minutes. A DynamoDB item cannot be larger than 400KB. An SQS message cannot be larger than 256KB."
"You cannot ask AWS to change those," she said firmly. "If you hit a Hard Quota, you don't need a support ticket. You need a Re-architecture."
CloudWatch Alarms
Timothy looked at the console. "But how was I supposed to know? The error just happened out of nowhere."
"That is why we don't fly blind," Margaret said.
She navigated to CloudWatch Alarms and set up a new alarm:
- Metric: Lambda ConcurrentExecutions, measured against its Service Quota
- Threshold: 80%
"Now," she said, "when your party gets to 80% capacity, you will get an email. That gives you time to ask for a limit increase before the customers are blocked at the door."
Timothy nodded. He had thought the cloud was magic. Now he realized it was just a very large, very well-managed building.
"I'll request the increases now," Timothy said, "before the Black Friday sale."
"Good plan," Margaret smiled. "In the Cloud, the sky is the limit. But you still have to build the staircase."
Key Concepts
- Service Quotas: The maximum values for resources (e.g., buckets, instances) and for rates of activity (e.g., API requests per second) allowed in your AWS account. Unless noted otherwise, each quota applies per Region.
- Soft Quotas (Adjustable): Limits that can be increased by submitting a request through the Service Quotas console or AWS Support (e.g., Concurrent Lambda Executions). Their conservative defaults also act as a "budget safety net" against accidental over-provisioning.
- Hard Quotas (Fixed): Architecture constraints that cannot be changed (e.g., Lambda 15-minute timeout). Hitting these usually requires changing your design.
- Quota Monitoring: Using CloudWatch to alert you when you are approaching a limit.
- Best Practice: Document your critical quotas and review them quarterly as part of your capacity planning.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.