NAT Gateway Timeouts — Lambda in a Private Subnet Can’t Reach the Internet
Problem
A Lambda function running inside a private VPC subnet suddenly starts timing out whenever it calls external services — S3, DynamoDB, STS, third-party APIs, anything requiring outbound access. CloudWatch shows long durations with little or no log output. The function simply hangs until the timeout expires.
This is the classic symptom of a broken NAT Gateway path.
Clarifying the Issue
A Lambda inside a private subnet cannot reach the internet directly. It relies on the following chain:
Lambda ENI → Route Table → NAT Gateway → Internet Gateway → External Service
If any part of this path is misconfigured or degraded, Lambda does not fail fast — it waits. The result is a full timeout with no helpful logs.
Common failure modes include:
- Missing or incorrect 0.0.0.0/0 route
- Route table pointing to an unhealthy or unreachable NAT Gateway
- NAT Gateway deployed in a private subnet (incorrect)
- NAT Gateway in a different AZ (allowed, but discouraged due to cross-AZ cost/latency)
- NAT Gateway failures or throttling under load
- DNS resolution disabled at VPC level
- Security Group egress rules too restrictive
- NACLs blocking ephemeral return traffic
- Incorrect or conflicting VPC endpoint configurations
From Lambda’s perspective, everything looks normal — until the request hangs.
Why It Matters
A failing NAT Gateway effectively isolates your Lambda from most AWS APIs and all external services. This causes:
- API stalls and cascading delays
- SQS/SNS processing failures
- STS credential errors
- Broken authentication flows
- Hanging ETL pipelines
- Costly retry storms
NAT reliability is foundational. If the NAT path is broken, the Lambda is blind.
Key Terms
- Private Subnet: A subnet with no route to an Internet Gateway.
- NAT Gateway: Provides outbound internet access from private subnets.
- Route Table: Defines traffic paths for a subnet.
- VPC Endpoint: Private AWS service access that bypasses NAT.
- Security Groups (SGs): Stateful virtual firewalls.
- Network ACLs (NACLs): Stateless subnet-level filters.
Steps at a Glance
- Check logs for long, silent timeouts.
- Verify route table has a correct 0.0.0.0/0 → NAT entry.
- Confirm NAT Gateway is healthy and in a public subnet.
- Enable VPC DNS support.
- Validate Security Group outbound rules.
- Validate NACL rules for ephemeral ports.
- Test outbound connectivity inside Lambda.
- Replace NAT with VPC Endpoints when possible.
Detailed Steps
Step 1: Check CloudWatch for Hanging Behavior
Look for full-duration timeouts:
REPORT RequestId: ... Duration: 30000 ms Billed Duration: 30000 ms
No stack trace. No clues. Just a stall.
This strongly indicates network egress failure.
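When many invocations are involved, it can help to scan exported log lines for REPORT entries whose duration sits at the configured timeout. The following is a minimal sketch under stated assumptions: the log lines are already available as strings, and the function name `find_timeouts` and the 1% tolerance are illustrative choices, not part of any AWS tooling.

```python
import re

# Captures the request ID and the first Duration field of a REPORT line.
REPORT_RE = re.compile(r"REPORT RequestId: (\S+)\s+Duration: ([\d.]+) ms")

def find_timeouts(log_lines, timeout_ms, tolerance=0.01):
    """Return request IDs whose duration is within `tolerance` of the timeout."""
    threshold = timeout_ms * (1 - tolerance)
    hits = []
    for line in log_lines:
        match = REPORT_RE.search(line)
        if match and float(match.group(2)) >= threshold:
            hits.append(match.group(1))
    return hits
```

A cluster of hits with no application log output in between is the silent-stall pattern described above.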
Step 2: Validate the Route Table
Your private subnet route table must contain:
0.0.0.0/0 → nat-xxxxxxxxxxxxx
Check via CLI:
aws ec2 describe-route-tables \
--route-table-ids rtb-123 \
--query "RouteTables[*].Routes"
Common issues:
- Missing 0.0.0.0/0 route.
- Route pointing to "local" only.
- Route pointing to a NAT Gateway in a different AZ. (This works technically, but is discouraged due to cross-AZ charges, latency, and failure-domain coupling).
Best practice: Keep NAT Gateway routing within the same AZ as the Lambda’s ENI.
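The route check can also be automated against the JSON that `describe-route-tables` returns. This is a sketch, assuming the input is one entry of the `RouteTables` list already parsed into a dict; the function name `default_route_target` and its return labels are hypothetical. It also flags routes in the `blackhole` state, which is how a route to a deleted NAT Gateway appears.

```python
def default_route_target(route_table):
    """Classify the 0.0.0.0/0 route of one RouteTables[] entry."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        target = route.get("NatGatewayId") or route.get("GatewayId")
        if route.get("State") == "blackhole":
            return ("blackhole", target)  # target no longer exists
        if route.get("NatGatewayId"):
            return ("nat", target)
        if route.get("GatewayId", "").startswith("igw-"):
            return ("igw", target)  # this is a public subnet, not a private one
        return ("other", target)
    return ("missing", None)  # only the "local" route is present
```

For a Lambda subnet, anything other than `("nat", ...)` deserves a closer look.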
Step 3: Confirm NAT Gateway Health and Placement
Check NAT Gateway state:
aws ec2 describe-nat-gateways \
--nat-gateway-ids nat-123 \
--query "NatGateways[*].State"
- Valid state: available
- Problem states: pending, failed, deleting
Critical architectural requirement:
A NAT Gateway must be deployed in a public subnet, meaning that specific subnet must have:
0.0.0.0/0 → igw-xxxxxxxxxxxxxx
If the NAT Gateway is placed in a private subnet, outbound traffic can never reach the Internet Gateway — Lambda will always hang.
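Both conditions, state and placement, can be checked together from parsed CLI output. The sketch below assumes `nat` is one entry of the `NatGateways` list and `nat_subnet_routes` is the `Routes` list of the route table associated with the NAT Gateway's own subnet; the function name `check_nat_gateway` is illustrative.

```python
def check_nat_gateway(nat, nat_subnet_routes):
    """Return a list of problems found for one NatGateways[] entry."""
    problems = []
    if nat.get("State") != "available":
        problems.append(f"state is {nat.get('State')!r}, expected 'available'")
    # A public subnet has an active default route to an Internet Gateway.
    has_igw_default = any(
        r.get("DestinationCidrBlock") == "0.0.0.0/0"
        and r.get("GatewayId", "").startswith("igw-")
        and r.get("State") == "active"
        for r in nat_subnet_routes
    )
    if not has_igw_default:
        problems.append("NAT subnet has no active 0.0.0.0/0 route to an Internet Gateway")
    return problems
```

An empty list means both checks passed; it does not prove the gateway is unsaturated, which is a separate concern.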
Step 4: Confirm DNS Support is Enabled
Lambda must resolve AWS service endpoints (e.g., sts.amazonaws.com).
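Check DNS attributes (the command accepts one attribute per call):
aws ec2 describe-vpc-attribute \
--vpc-id vpc-123 \
--attribute enableDnsSupport
aws ec2 describe-vpc-attribute \
--vpc-id vpc-123 \
--attribute enableDnsHostnames
Both must be true:
- enableDnsSupport
- enableDnsHostnames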
If DNS is off, nothing works — not even AWS internal API calls.
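To confirm resolution from inside the function itself, a small lookup helper is usually enough. This is a sketch using only the standard library; the name `can_resolve` and the injectable `resolver` parameter (there so the helper can be exercised without a network) are assumptions of this example. Note that `socket.getaddrinfo` takes no timeout argument, so a lookup against an unreachable resolver can block for the system resolver's own timeout.

```python
import socket

def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return the resolved IP addresses, or an empty list if the lookup fails."""
    try:
        results = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({item[4][0] for item in results})
```

Calling `can_resolve("sts.amazonaws.com")` in a handler and logging the result separates DNS failures from routing failures: an empty list points at DNS, while a populated list followed by a hang points further down the path.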
Step 5: Validate Security Group Egress Rules
Security Groups are stateful — return traffic is automatically allowed if outbound is permitted.
Ensure the Lambda’s SG allows outbound HTTPS:
- Outbound Protocol: TCP
- Port: 443
- Destination: 0.0.0.0/0
If outbound port 443 is blocked, Lambda cannot reach any HTTPS endpoint, and that includes AWS service APIs.
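The rule above can be verified against `describe-security-groups` output. The sketch below assumes `security_group` is one entry of the `SecurityGroups` list parsed into a dict; `allows_https_egress` is a hypothetical name. It checks only for the 0.0.0.0/0 destination listed above, so a group that permits 443 to narrower CIDRs or prefix lists will report False even though it may be correct for your setup.

```python
def allows_https_egress(security_group):
    """True if any egress rule permits TCP 443 to 0.0.0.0/0."""
    for perm in security_group.get("IpPermissionsEgress", []):
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            port_ok = perm.get("FromPort", -1) <= 443 <= perm.get("ToPort", -1)
        else:
            port_ok = False
        open_to_world = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
        )
        if port_ok and open_to_world:
            return True
    return False
```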
Step 6: Validate NACLs (Stateless Filters)
NACLs are stateless, meaning you must allow both outbound requests and inbound response traffic explicitly.
Required Rules for Lambda as a Client:
| Direction | Protocol | Ports | Purpose |
|---|---|---|---|
| Outbound | TCP | 443 | Traffic leaving Lambda to Internet |
| Inbound | TCP | 1024–65535 | Return traffic from Internet to Lambda |
Note: Many real outages come from NACLs allowing Outbound 443 but blocking Inbound 1024–65535 (ephemeral return ports).
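Because NACL entries are evaluated in rule-number order and the first match decides, the failure in the note is easy to reproduce offline. This is a simplified sketch, assuming `entries` is the `Entries` list from `describe-network-acls` parsed into dicts; the function name `nacl_allows` is illustrative, and the sketch handles only IPv4 and TCP.

```python
import ipaddress

def nacl_allows(entries, egress, port, remote_ip):
    """Evaluate NACL entries in rule-number order; the first match decides."""
    addr = ipaddress.ip_address(remote_ip)
    for entry in sorted(entries, key=lambda e: e["RuleNumber"]):
        if entry["Egress"] != egress or "CidrBlock" not in entry:
            continue  # wrong direction, or an IPv6-only entry
        if entry["Protocol"] not in ("-1", "6"):  # "6" is TCP, "-1" is all
            continue
        if addr not in ipaddress.ip_network(entry["CidrBlock"]):
            continue
        ports = entry.get("PortRange")
        if entry["Protocol"] == "6" and ports and not (ports["From"] <= port <= ports["To"]):
            continue
        return entry["RuleAction"] == "allow"
    return False  # nothing matched: implicit deny
```

Checking `nacl_allows(entries, egress=True, port=443, ...)` alongside `nacl_allows(entries, egress=False, port=50000, ...)` covers both rows of the table; the second call returning False is the ephemeral-port outage.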
Step 7: Test Outbound Connectivity Inside Lambda
Use only standard libraries to verify connectivity:
Node.js
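const https = require("https");
// Fail fast after 5 s instead of hanging until the Lambda timeout.
const req = https.get("https://aws.amazon.com", { timeout: 5000 }, res => {
  console.log("reachable:", res.statusCode);
  res.resume(); // drain the response so the socket is released
});
req.on("timeout", () => req.destroy(new Error("request timed out")));
req.on("error", err => console.error("unreachable:", err.message));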
Python
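import urllib.request
# timeout=5 makes a broken path fail fast instead of hanging until the Lambda timeout.
print(urllib.request.urlopen("https://aws.amazon.com", timeout=5).getcode())
If this hangs or times out → your outbound path is broken.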
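A raw TCP connect with a short timeout narrows things down further, because it removes TLS and HTTP from the picture. The sketch below uses only the standard library; `probe_tcp` and its injectable `connect` parameter (present so the helper can be tested without a network) are assumptions of this example.

```python
import socket
import time

def probe_tcp(host, port=443, timeout=3.0, connect=socket.create_connection):
    """Attempt a TCP connect and return (ok, seconds_elapsed, error_name)."""
    start = time.monotonic()
    try:
        with connect((host, port), timeout=timeout):
            return (True, time.monotonic() - start, None)
    except OSError as exc:  # covers timeouts, refusals, and DNS failures
        return (False, time.monotonic() - start, type(exc).__name__)
```

As a rough reading: a failure that takes the full timeout suggests packets are being dropped somewhere on the path (routing, NAT, or NACL), while an immediate failure more often points to DNS or an explicit refusal.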
Step 8: Replace NAT with VPC Endpoints (Best Practice)
Many Lambdas do not need raw internet access. For AWS-hosted services, VPC Endpoints are faster, safer, and often cheaper.
Gateway Endpoints (Free):
- S3
- DynamoDB
aws ec2 create-vpc-endpoint \
--vpc-id vpc-123 \
--service-name com.amazonaws.us-east-1.s3 \
--vpc-endpoint-type Gateway \
--route-table-ids rtb-123
Interface Endpoints (PrivateLink - Billed):
- SQS, SNS, STS, Secrets Manager, EventBridge
These endpoints bypass NAT but incur hourly + data charges, so compare costs carefully.
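The comparison comes down to simple arithmetic: a fixed hourly charge per resource plus a per-GB charge. The following sketch makes that explicit. The function name `monthly_cost`, the 730-hour month, and every rate shown are placeholders for illustration, not actual AWS prices; substitute the current figures for your region. Interface endpoints are billed per endpoint per Availability Zone, which is what `count` stands in for here.

```python
def monthly_cost(hourly_rate, per_gb_rate, gb_per_month, count=1, hours=730):
    """Fixed hourly charge for `count` resources plus per-GB processing."""
    return count * hourly_rate * hours + per_gb_rate * gb_per_month

# Placeholder rates for illustration only; substitute current regional pricing.
endpoint = monthly_cost(hourly_rate=0.01, per_gb_rate=0.01, gb_per_month=500, count=2)
nat_data = monthly_cost(hourly_rate=0.0, per_gb_rate=0.05, gb_per_month=500)
print(f"interface endpoints: {endpoint:.2f}  NAT data processing: {nat_data:.2f}")
```

The NAT hourly rate is set to zero in the second call on the assumption that the NAT Gateway stays in place for third-party traffic, so only the data-processing portion moves to the endpoint.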
Pro Tips
Pro Tip #1: NAT Can Become a Bottleneck at Scale
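Large Lambda bursts → NAT saturation → cascading latency. A NAT Gateway supports up to 55,000 simultaneous connections to each unique destination, so a burst of functions calling the same endpoint can exhaust ports.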
Pro Tip #2: VPC Endpoint Cost Awareness
- Gateway endpoints → free
- Interface endpoints → billed per hour + per GB
Use them intentionally.
Pro Tip #3: Warm Starts Don’t Fix NAT Problems
Provisioned Concurrency can reduce init latency, but it cannot repair a broken network path.
Pro Tip #4: Monitor NAT Gateways
Watch CloudWatch metrics for PacketsDropCount, ErrorPortAllocation, and spikes in the byte counters (such as BytesOutToDestination and BytesInFromDestination). These often correlate with Lambda stalls.
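A non-zero ErrorPortAllocation sum is the clearest of these signals, since it means the gateway could not allocate a source port for a new connection. The sketch below assumes `datapoints` is the `Datapoints` list returned by a CloudWatch statistics query that requested the `Sum` statistic; the function name `port_allocation_errors` is hypothetical.

```python
def port_allocation_errors(datapoints):
    """Return (timestamp, count) pairs where ErrorPortAllocation was non-zero."""
    flagged = [
        (dp["Timestamp"], dp["Sum"]) for dp in datapoints if dp.get("Sum", 0) > 0
    ]
    return sorted(flagged)  # oldest first
```

Lining these timestamps up against the timeout timestamps from CloudWatch Logs is a quick way to confirm or rule out NAT saturation.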
Conclusion
NAT Gateway timeouts are one of the most common — and most frustrating — Lambda VPC failures. The function is healthy; the network path is not. Once you validate routing, confirm NAT Gateway placement and health, enable DNS, configure SGs and NACLs properly, and adopt VPC endpoints where appropriate, you restore predictable outbound performance.
This is disciplined VPC-aware Lambda engineering — building stable, resilient serverless applications on top of a clear, intentional network design.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.

