AWS Lambda Error: “Rate Exceeded” — When Concurrency and Memory Collide

Understanding how AWS Lambda memory, CPU scaling, and concurrency limits trigger the dreaded “Rate Exceeded” error

Problem

Your Lambda function starts throwing this message in CloudWatch logs:

"TooManyRequestsException: Rate Exceeded"

At first glance, it looks like an API rate limit. But in Lambda, it often means your concurrency quota has been reached. The function is trying to scale up in response to a burst of invocations, but AWS is refusing to allocate additional concurrent executions.

In some cases, you’ll also see:

"Task timed out after 30.00 seconds"

That often points to under-provisioned memory: the function takes too long to finish and ties up concurrent execution slots for longer. The combination of slow execution and exhausted concurrency is what triggers the dreaded “Rate Exceeded.”

Clarifying the Issue

Concurrency in Lambda determines how many function instances can run at once. Every AWS account has a soft limit per Region (1,000 concurrent executions by default, sometimes lower for new accounts), shared by all functions in that Region. When that cap is reached, Lambda throttles new invocations.

Memory settings directly influence how fast your function runs. Lambda allocates CPU in proportion to configured memory, up to the 10 GB maximum: a function gets the equivalent of one full vCPU at 1,769 MB and up to six vCPUs at 10,240 MB. That means more memory doesn’t just give you more RAM; it also gives you more processing power. Under-provisioned memory makes code run slower, keeping concurrency slots occupied longer.

This creates a feedback loop:

Slow code → higher concurrency usage → throttling → “Rate Exceeded”

Many teams tune concurrency without realizing their memory setting is the real throttle.
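
The feedback loop can be made concrete with Little's Law: steady-state concurrency is roughly the request rate multiplied by the average duration. The sketch below is illustrative only; the function name and the 500 requests-per-second workload are assumptions, not figures from a real system.

```python
import math

def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimate steady-state concurrent executions: rate x average duration."""
    return math.ceil(requests_per_second * avg_duration_seconds)

# Hypothetical workload: 500 requests per second.
slow = required_concurrency(500, 2.5)  # under-provisioned: 2.5 s per request
fast = required_concurrency(500, 1.0)  # right-sized: 1.0 s per request
print(slow, fast)  # prints "1250 500"
```

With the same traffic, the slow version needs more slots than a default 1,000 limit provides, while the faster version fits with room to spare.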

Why It Matters

Performance vs. Cost: Lower memory seems cheaper, but longer run times can drive total cost higher.
Throttling Impacts Reliability: When concurrency maxes out, new invocations fail or stall.
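Retries Multiply Cold Starts: Lambda automatically retries throttled asynchronous invocations, and callers typically retry synchronous ones. When those retries land, they arrive as another burst that needs fresh execution environments, worsening latency.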
Cross-Function Impact: One inefficient Lambda can consume the entire account quota, starving others.

It’s not just a tuning exercise — it’s a balancing act between throughput, efficiency, and cost control.

Key Terms

Reserved Concurrency: Sets aside part of the account limit for a single function and caps how many concurrent instances that function can use.
Account-Level Concurrency Limit: The total concurrent executions allowed for your account.
Burst Concurrency: AWS’s regional burst allowance before throttling begins.
Throttled Invocation: A blocked execution due to lack of concurrency slots.
Memory Allocation: Determines both RAM and CPU power; doubling memory roughly doubles the available CPU (up to the 10 GB ceiling), though single-threaded code stops benefiting once it has a full vCPU.

Steps at a Glance

Diagnose the Error
Check Concurrency Metrics
Triage Which Functions Are Hogging Concurrency
Right-Size Memory
Set Reserved or Provisioned Concurrency
Request a Limit Increase
Load Test Your Fix

Detailed Steps

1. Diagnose the Error

Go to CloudWatch Logs for the affected function.

Look for one of these messages:

TooManyRequestsException: Rate Exceeded

Task timed out after 30.00 seconds

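The first indicates concurrency exhaustion; the second indicates slow execution, often due to under-provisioned memory. Because a throttled invocation never runs, the throttling error typically surfaces in the caller's logs or in the function's Throttles metric rather than in the function's own log stream.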

2. Check Concurrency Metrics

In the AWS Console, go to:

Lambda → Monitor → Concurrent Executions

Or view the CloudWatch metric:

AWS/Lambda → ConcurrentExecutions

If the line hovers near your quota, you’re hitting concurrency saturation.
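
The same check can be scripted. This is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper name `peak_concurrency` is hypothetical, and the client is passed in so the function can be exercised with a stub.

```python
from datetime import datetime, timedelta, timezone

def peak_concurrency(cloudwatch, hours: int = 24) -> float:
    """Return the highest account-level ConcurrentExecutions value in the window."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="ConcurrentExecutions",  # no dimensions: all functions in the Region
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,                         # one datapoint per five minutes
        Statistics=["Maximum"],
    )
    datapoints = response.get("Datapoints", [])
    return max((dp["Maximum"] for dp in datapoints), default=0.0)

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(peak_concurrency(boto3.client("cloudwatch")))
```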

3. Triage Which Functions Are Hogging Concurrency

In CloudWatch or Lambda Insights, sort by ConcurrentExecutions.

Identify the functions with high invocation counts or long durations.
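
One way to combine invocation counts and durations into a single ranking is to estimate each function's average concurrency as invocations × average duration ÷ period. The sketch below assumes you have already exported those two numbers per function; the function names and figures are made up for illustration.

```python
def rank_by_concurrency(stats: dict, period_seconds: int = 60) -> list:
    """Rank functions by estimated average concurrency, highest first.

    stats maps function name -> (invocations in the period, average duration in ms).
    """
    estimates = {
        name: invocations * (avg_ms / 1000) / period_seconds
        for name, (invocations, avg_ms) in stats.items()
    }
    return sorted(estimates.items(), key=lambda item: item[1], reverse=True)

# Hypothetical one-minute sample for three functions.
sample = {
    "send-email": (30000, 200.0),
    "resize-images": (6000, 4000.0),
    "nightly-report": (60, 30000.0),
}
for name, estimate in rank_by_concurrency(sample):
    print(f"{name}: ~{estimate:.0f} concurrent executions")
```

In this sample, the function with the most invocations is not the one holding the most slots; the slower function with fewer calls is.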

Regional Burst Concurrency

AWS allows short bursts of concurrent Lambda executions before throttling. Under Lambda's older scaling model, the burst size depended on the Region: 500 in smaller Regions (for example, ap-southeast-2) and up to 3,000 in the largest ones (for example, us-east-1 and us-west-2). After the burst was consumed, Lambda ramped concurrency by approximately 500 executions per minute until reaching the account limit. Since late 2023, Lambda instead scales each function by up to 1,000 concurrent executions every 10 seconds in every Region, so check the current scaling documentation before relying on the older figures. Either way, a sudden traffic spike can trigger “Rate Exceeded” well before the account limit is reached. Always load test in the same Region you plan to run production workloads, because quotas and scaling behavior are set per Region, not globally.
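
To get a feel for how a burst allowance and a ramp rate interact, here is a rough sketch that treats both as parameters. The example values reuse the 500-burst, 500-per-minute figures quoted above; the function name is hypothetical, and the real numbers depend on your Region and on Lambda's current scaling rules.

```python
import math

def seconds_to_reach(target: int, initial_burst: int, ramp_step: int,
                     ramp_interval_seconds: int) -> int:
    """Seconds until allowed concurrency reaches target under a step-wise ramp."""
    if target <= initial_burst:
        return 0
    steps = math.ceil((target - initial_burst) / ramp_step)
    return steps * ramp_interval_seconds

# A spike that needs 2,000 concurrent executions:
print(seconds_to_reach(2000, initial_burst=500, ramp_step=500, ramp_interval_seconds=60))   # prints 180
print(seconds_to_reach(2000, initial_burst=3000, ramp_step=500, ramp_interval_seconds=60))  # prints 0
```

During those 180 seconds, invocations beyond the currently allowed concurrency are throttled even though the account limit has not been reached.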

4. Right-Size Memory

Increase the memory allocation gradually (for example, from 256 MB → 512 MB → 1024 MB).

Because CPU scales with memory, the function finishes faster and releases concurrency slots sooner.

Pro Tip:

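For CPU-bound functions, doubling memory can cut execution time roughly in half, and sometimes more when the function was also short on RAM; I/O-bound functions gain far less. Even though you pay a higher rate per ms, the total cost per request often stays flat or even decreases.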
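
The cost trade-off is simple arithmetic: Lambda bills compute in GB-seconds, so cost is memory × duration × price. The sketch below assumes an x86 price of $0.0000166667 per GB-second (check current pricing for your Region and architecture) and uses made-up durations; the per-request fee is excluded because it does not change with memory.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86 price; verify for your Region

def cost_per_million(memory_mb: int, avg_duration_ms: float) -> float:
    """Compute-only cost in USD for one million invocations."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND * 1_000_000

# Hypothetical measurements before and after doubling memory.
before = cost_per_million(256, 1000)  # 256 MB, 1,000 ms average
after = cost_per_million(512, 450)    # 512 MB, 450 ms average
print(f"before: ${before:.2f}, after: ${after:.2f}")
```

The break-even point is an exact halving of duration: any speed-up beyond that makes the larger setting cheaper per request, and any speed-up at all frees concurrency slots sooner.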

5. Set Reserved or Provisioned Concurrency

Reserved Concurrency: Sets aside a fixed number of slots for a critical function and also caps the function at that number.
Provisioned Concurrency: Pre-warms containers to avoid cold starts under sudden load.

Configure both in the Lambda Console → Configuration → Concurrency.
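
Both settings can also be applied through the Lambda API. This is a minimal sketch assuming a boto3 Lambda client; the helper name, the function name, and the alias are placeholders. Provisioned concurrency applies to a published version or alias, not to $LATEST.

```python
def configure_concurrency(lambda_client, function_name: str, reserved: int,
                          alias: str, provisioned: int) -> None:
    """Apply reserved concurrency, then provisioned concurrency on an alias."""
    if provisioned > reserved:
        raise ValueError("provisioned concurrency cannot exceed reserved concurrency")
    # Set aside (and cap the function at) `reserved` concurrent executions.
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=reserved,
    )
    # Keep `provisioned` execution environments initialized for the alias.
    lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,
        ProvisionedConcurrentExecutions=provisioned,
    )

# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   configure_concurrency(boto3.client("lambda"), "checkout", 100, "live", 20)
```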

6. Request a Limit Increase

Navigate to:

Service Quotas → AWS Lambda → Concurrent Executions

Submit a quota increase request if your workload regularly approaches the limit.
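
To judge whether a workload "regularly approaches the limit," compare an observed peak with the account limit returned by the Lambda API. A minimal sketch, assuming a boto3 Lambda client; the helper name and the 80% threshold are arbitrary choices for illustration.

```python
def concurrency_utilization(lambda_client, peak: float) -> float:
    """Fraction of the account concurrency limit used at the observed peak."""
    limits = lambda_client.get_account_settings()["AccountLimit"]
    return peak / limits["ConcurrentExecutions"]

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   used = concurrency_utilization(boto3.client("lambda"), peak=850)
#   if used >= 0.8:  # assumed threshold
#       print(f"Peak uses {used:.0%} of the limit; consider a quota increase")
```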

7. Load Test Your Fix

Use artillery, ab, or wrk to simulate realistic traffic.

Verify:

No new "Rate Exceeded" messages
Improved average duration
Fewer cold starts under load
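
If you prefer a scripted check over a dedicated tool, a small harness can fire parallel invocations and count how many were throttled. This is only a sketch: `burst_test` is a hypothetical helper, the invoke callable and the throttle exception type are passed in, and "my-function" in the usage note is a placeholder. The AWS SDK retries throttled calls by default, so the count reflects throttles that survived those retries.

```python
from concurrent.futures import ThreadPoolExecutor

def burst_test(invoke, throttle_error: type, total_calls: int, workers: int) -> dict:
    """Call invoke() total_calls times in parallel and count throttled calls."""
    def attempt(_):
        try:
            invoke()
            return "ok"
        except throttle_error:
            return "throttled"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(attempt, range(total_calls)))
    return {"ok": results.count("ok"), "throttled": results.count("throttled")}

# Usage with boto3 (requires AWS credentials):
#   import boto3
#   client = boto3.client("lambda")
#   print(burst_test(
#       lambda: client.invoke(FunctionName="my-function", Payload=b"{}"),
#       client.exceptions.TooManyRequestsException,
#       total_calls=200,
#       workers=50,
#   ))
```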

Conclusion

When Lambda throws “Rate Exceeded,” it’s not scolding you for API abuse — it’s signaling that your compute configuration is out of balance.

Memory and concurrency are deeply intertwined levers. Underpowered Lambdas slow down, occupy concurrency longer, and starve other functions. The fix isn’t always a bigger limit — sometimes it’s just smarter tuning.

By right-sizing memory, defining reserved concurrency, and testing burst behavior by region, you transform throttling from a mystery into a managed performance variable.

In short: “Rate Exceeded” is your Lambda’s way of gasping for air. Give it the headroom and horsepower it deserves.

Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.
