DNS Resolution Failures — Lambda Cannot Resolve Hostnames in a VPC

 


DNS failures inside Lambda VPC configurations are one of the most deceptive networking outages. They masquerade as NAT problems, IAM failures, and random timeouts.





Problem

A Lambda function running inside a VPC suddenly begins failing with:

  • long hangs followed by timeouts
  • UnknownHostException
  • gaierror: [Errno -2] Name or service not known
  • “Could not resolve host”
  • SDK retry storms that never complete

CloudWatch shows almost no useful logs.
Your NAT Gateway is healthy.
Your route tables look correct.
But the function still cannot reach:

  • s3.amazonaws.com
  • sts.amazonaws.com
  • api.example.com
  • any AWS or external API that requires DNS resolution

This is the classic signature of a DNS failure inside a Lambda VPC configuration.


Clarifying the Issue

Every Lambda running inside a VPC relies on AmazonProvidedDNS, AWS’s internal Route 53 Resolver service.
If DNS is disabled, overridden, or blocked, Lambda cannot translate hostnames into IP addresses, and all outbound connectivity collapses.

Common root causes include:

  • enableDnsSupport = false
  • enableDnsHostnames = false
  • Custom DHCP Options replacing AmazonProvidedDNS
  • NACLs blocking ephemeral return traffic
  • SGs blocking outbound DNS queries
  • Incorrect Route 53 Resolver configuration
  • Private DNS overrides created by Interface Endpoints
  • Missing Private Hosted Zone associations
  • VPN/Direct Connect hybrid DNS inconsistencies

From Lambda’s perspective, this is not “network unreachable.”
It is: “I don’t know where to send this request.”
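The difference between "cannot resolve" and "cannot reach" can be made visible in code. The following is a minimal sketch, not a definitive diagnostic: the function name classify_failure and its injectable resolve/connect parameters are hypothetical, chosen so the logic can be exercised without a network.

```python
import socket

def classify_failure(host, port=443, timeout=3.0,
                     resolve=socket.getaddrinfo,
                     connect=socket.create_connection):
    """Return 'dns', 'connect', or 'ok' for a single outbound attempt.

    classify_failure is a hypothetical helper; resolve and connect are
    injectable only so the logic can be tested offline.
    """
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"        # the name never became an IP address
    ip = infos[0][4][0]     # first resolved address
    try:
        conn = connect((ip, port), timeout=timeout)
    except OSError:
        return "connect"    # the name resolved, but the path to the IP is broken
    conn.close()
    return "ok"
```

A result of "dns" points at the resolver path (this article), while "connect" points at routing, NAT, or security rules.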


Why It Matters

DNS outages are brutal because they mimic:

  • NAT failures
  • VPC endpoint failures
  • IAM failures
  • Third-party API failures
  • Random timeouts

Meanwhile the root cause is simply that hostnames cannot resolve.

A single DNS misconfiguration can break:

  • AWS SDK calls (STS, S3, DynamoDB, SNS, SQS…)
  • Internal microservices
  • Third-party integrations
  • Any HTTPS request

DNS is foundational — when it breaks, everything breaks.


Key Terms

AmazonProvidedDNS — The default DNS resolver for AWS VPCs.
Route 53 Resolver — Managed DNS system for VPC and hybrid networks.
DHCP Options Set — Controls the DNS servers your VPC uses.
Interface Endpoint (PrivateLink) — Creates private DNS overrides for AWS APIs.
Network ACLs (NACLs) — Stateless packet filters requiring explicit return traffic rules.


Steps at a Glance

  1. Verify VPC DNS attributes (enableDnsSupport, enableDnsHostnames).
  2. Test DNS resolution inside Lambda with code.
  3. Check DHCP Options Set for dangerous overrides.
  4. Validate NACL inbound/outbound DNS rules.
  5. Confirm SG outbound DNS rules (rare, but possible).
  6. Validate Route 53 Resolver endpoints (hybrid).
  7. Review Interface Endpoint DNS overrides.
  8. Test DNS resolution using EC2/Cloud9 in the same subnet.

Detailed Steps

Step 1: Verify DNS Support Is Enabled

DNS resolution depends on two VPC attributes, so check both.

aws ec2 describe-vpc-attribute \
  --vpc-id vpc-123 \
  --attribute enableDnsSupport
# Look for: "Value": true

aws ec2 describe-vpc-attribute \
  --vpc-id vpc-123 \
  --attribute enableDnsHostnames
# Look for: "Value": true

Both values should be true.

  • enableDnsSupport = false → no Amazon-provided resolver in the VPC
  • enableDnsHostnames = false → public names still resolve, but private DNS for Interface Endpoints and Private Hosted Zones does not work

Disabling enableDnsSupport alone takes down all AWS API access by hostname.
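If you would rather check both attributes in one pass, the same API is available through boto3. This is a sketch under assumptions: the helper names vpc_dns_attributes and disabled_attributes are hypothetical, and the ec2 argument is expected to be a boto3 EC2 client (or anything with a compatible describe_vpc_attribute method).

```python
def vpc_dns_attributes(ec2, vpc_id):
    """Return {'enableDnsSupport': bool, 'enableDnsHostnames': bool} for one VPC.

    ec2 is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    result = {}
    for attribute, key in (("enableDnsSupport", "EnableDnsSupport"),
                           ("enableDnsHostnames", "EnableDnsHostnames")):
        response = ec2.describe_vpc_attribute(VpcId=vpc_id, Attribute=attribute)
        result[attribute] = response[key]["Value"]
    return result

def disabled_attributes(attrs):
    """List the attribute names whose value is False."""
    return sorted(name for name, value in attrs.items() if not value)

# Usage (requires boto3 and credentials; vpc-123 is the placeholder ID used above):
#   import boto3
#   attrs = vpc_dns_attributes(boto3.client("ec2"), "vpc-123")
#   print(disabled_attributes(attrs))
```

An empty list from disabled_attributes means both attributes are enabled.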


Step 2: Test DNS Resolution Inside Lambda (Code Only)

Lambda runtimes do not include commands like nslookup, dig, or ping.
Use code-based lookups instead.

Node.js

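const dns = require("dns").promises;

exports.handler = async () => {
  try {
    // Await the lookup so the handler does not return before it completes.
    const { address } = await dns.lookup("aws.amazon.com");
    console.log("DNS:", address);
    return address;
  } catch (err) {
    console.log("DNS error:", err.code);
    throw err;
  }
};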

Python

import socket

def lambda_handler(event, context):
    print(socket.gethostbyname("aws.amazon.com"))

If these hang or throw → DNS is failing at the VPC layer.
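A single lookup tells you whether DNS works at all, but not which names fail or how long each takes. The sketch below extends the Python test to several hostnames and records timing. The probe helper and the HOSTNAMES list are assumptions for illustration (the names are the examples from the Problem section); swap in whatever your function actually calls.

```python
import socket
import time

# Hostnames taken from the examples earlier in this article; adjust as needed.
HOSTNAMES = ["s3.amazonaws.com", "sts.amazonaws.com", "api.example.com"]

def probe(hostname, resolve=socket.getaddrinfo):
    """Resolve one hostname and report the outcome and elapsed time.

    probe is a hypothetical helper; resolve is injectable for offline testing.
    """
    started = time.monotonic()
    try:
        infos = resolve(hostname, 443, type=socket.SOCK_STREAM)
        outcome = {"ok": True, "addresses": sorted({info[4][0] for info in infos})}
    except socket.gaierror as exc:
        outcome = {"ok": False, "error": str(exc)}
    outcome["hostname"] = hostname
    outcome["seconds"] = round(time.monotonic() - started, 3)
    return outcome

def lambda_handler(event, context):
    results = [probe(name) for name in HOSTNAMES]
    for result in results:
        print(result)   # one log line per hostname in CloudWatch
    return {"failed": [r["hostname"] for r in results if not r["ok"]]}
```

If only private names fail, look at Private Hosted Zone associations; if everything fails, start with Steps 1 and 3.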


Step 3: Validate DHCP Options Set (Critical!)

A custom DHCP Options Set can override AmazonProvidedDNS — breaking Lambda instantly.

Check the settings:

aws ec2 describe-dhcp-options \
  --dhcp-options-ids dopt-123

If you see:

DomainNameServers = 10.0.1.5

instead of:

DomainNameServers = AmazonProvidedDNS

Then Lambda will NOT use the AWS internal resolver.

This causes:

  • STS failures
  • S3/DynamoDB failures
  • PrivateLink failures
  • Any AWS SDK call failure

Unless you require hybrid DNS, stick with AmazonProvidedDNS.
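The describe-dhcp-options output nests the DNS servers a few levels deep, so a small helper makes the check easier to automate. This is a sketch that assumes the response shape returned by the EC2 API (a DhcpConfigurations list of Key/Values pairs); the function names are hypothetical.

```python
def dns_servers(dhcp_options):
    """Extract the domain-name-servers values from one DhcpOptions entry.

    dhcp_options is assumed to be one element of the "DhcpOptions" list
    returned by describe-dhcp-options (or boto3's describe_dhcp_options).
    """
    for config in dhcp_options.get("DhcpConfigurations", []):
        if config["Key"] == "domain-name-servers":
            return [item["Value"] for item in config["Values"]]
    return []

def uses_amazon_dns(dhcp_options):
    """True when AmazonProvidedDNS is among the configured DNS servers."""
    return "AmazonProvidedDNS" in dns_servers(dhcp_options)

# Example entries mirroring the two cases shown above.
custom = {"DhcpOptionsId": "dopt-123",
          "DhcpConfigurations": [{"Key": "domain-name-servers",
                                  "Values": [{"Value": "10.0.1.5"}]}]}
default = {"DhcpOptionsId": "dopt-456",
           "DhcpConfigurations": [{"Key": "domain-name-servers",
                                   "Values": [{"Value": "AmazonProvidedDNS"}]}]}
print(uses_amazon_dns(custom), uses_amazon_dns(default))
```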


Step 4: Validate NACL Rules (Stateless Traffic)

DNS uses:

  • UDP 53
  • TCP 53

Lambda, as a client, sends DNS queries outbound on port 53 and receives replies on ephemeral ports.

Required Rules:

Direction   Protocol   Ports         Purpose
Outbound    UDP/TCP    53            DNS lookup requests
Inbound     UDP/TCP    1024–65535    DNS response traffic

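Important:
If inbound ephemeral ports are blocked, DNS silently fails. These rules matter most when queries go to a custom DNS server or a Resolver endpoint; AWS documents that NACLs do not filter traffic to AmazonProvidedDNS itself.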
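NACL rules are evaluated in rule-number order and the first match wins, which is easy to misread in the console. The sketch below applies that logic to the Entries list returned by describe-network-acls. It is deliberately simplified: the nacl_allows name is hypothetical, and CIDR matching is ignored, so it only answers "is there any rule that would allow this protocol and port?".

```python
PROTOCOL_NUMBERS = {"tcp": "6", "udp": "17"}   # IANA numbers as the EC2 API reports them

def nacl_allows(entries, egress, protocol, port):
    """Evaluate NACL entries in rule-number order; the first match decides.

    Assumption: source/destination CIDR blocks are NOT checked here.
    """
    wanted = PROTOCOL_NUMBERS[protocol]
    for entry in sorted(entries, key=lambda e: e["RuleNumber"]):
        if entry["Egress"] != egress:
            continue                                    # wrong direction
        if entry["Protocol"] not in ("-1", wanted):
            continue                                    # wrong protocol
        port_range = entry.get("PortRange")
        if port_range and not port_range["From"] <= port <= port_range["To"]:
            continue                                    # port outside this rule
        return entry["RuleAction"] == "allow"
    return False                                        # nothing matched

# Example: outbound UDP 53 allowed, but no inbound ephemeral rule.
entries = [
    {"RuleNumber": 100, "Egress": True, "Protocol": "17", "RuleAction": "allow",
     "PortRange": {"From": 53, "To": 53}},
    {"RuleNumber": 32767, "Egress": True, "Protocol": "-1", "RuleAction": "deny"},
    {"RuleNumber": 100, "Egress": False, "Protocol": "6", "RuleAction": "allow",
     "PortRange": {"From": 443, "To": 443}},
    {"RuleNumber": 32767, "Egress": False, "Protocol": "-1", "RuleAction": "deny"},
]
print(nacl_allows(entries, egress=True, protocol="udp", port=53))
print(nacl_allows(entries, egress=False, protocol="udp", port=1024))
```

In this example the query leaves the subnet but the reply is dropped, which is the silent failure described above.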


Step 5: Validate Security Group DNS Rules

Security Groups are stateful, so they automatically allow return traffic.

Just ensure outbound DNS is allowed:

Outbound: TCP 53, UDP 53 → 0.0.0.0/0

SG issues are less common, but worth confirming.
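To confirm this programmatically, you can inspect the IpPermissionsEgress list from describe-security-groups. The following is a rough sketch: dns_egress_protocols is a hypothetical helper, and it ignores the destination ranges on each rule, so treat a positive result as "probably fine" rather than proof.

```python
def dns_egress_protocols(ip_permissions_egress):
    """Return the protocols ('tcp', 'udp') allowed outbound on port 53.

    Assumption: destination CIDRs and referenced groups are NOT checked.
    """
    allowed = set()
    for perm in ip_permissions_egress:
        protocol = perm["IpProtocol"]
        if protocol == "-1":
            return {"tcp", "udp"}       # "all traffic" rule covers DNS
        if protocol in ("tcp", "udp") and perm["FromPort"] <= 53 <= perm["ToPort"]:
            allowed.add(protocol)
    return allowed

# Example: HTTPS and UDP DNS are allowed outbound, TCP DNS is not.
egress = [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "udp", "FromPort": 53, "ToPort": 53,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]
print(dns_egress_protocols(egress))
```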


Step 6: Confirm Route 53 Resolver Endpoints (Hybrid DNS)

Hybrid networks introduce complexity:

  • Outbound resolver endpoints must be reachable
  • Inbound endpoints must accept your queries
  • Subnets hosting endpoints must allow port 53
  • NACLs must not block DNS traffic
  • Peering must route DNS queries correctly

If a Resolver endpoint is misconfigured, Lambda will time out on DNS even if NAT works perfectly.
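A quick first check is whether every Resolver endpoint reports a healthy status. The sketch below filters the ResolverEndpoints list returned by the Route 53 Resolver list-resolver-endpoints call. The helper name is hypothetical, and the sample data is invented for illustration.

```python
def unhealthy_endpoints(resolver_endpoints):
    """Return (id, direction, status) for every endpoint that is not OPERATIONAL.

    resolver_endpoints is assumed to be the "ResolverEndpoints" list from
    list-resolver-endpoints (boto3: client("route53resolver")).
    """
    return [(ep["Id"], ep["Direction"], ep["Status"])
            for ep in resolver_endpoints
            if ep["Status"] != "OPERATIONAL"]

# Invented sample data, shaped like the API response.
sample = [
    {"Id": "rslvr-in-111", "Direction": "INBOUND", "Status": "OPERATIONAL"},
    {"Id": "rslvr-out-222", "Direction": "OUTBOUND", "Status": "ACTION_NEEDED"},
]
print(unhealthy_endpoints(sample))
```

An OPERATIONAL status does not prove that the subnets, NACLs, or peering routes in the list above are correct; it only rules out the endpoint itself being down.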


Step 7: Review VPC Endpoint DNS Overrides

Interface Endpoints (PrivateLink) often create private DNS entries that override public ones.

Example:
Creating an STS endpoint replaces:

sts.amazonaws.com → private IP in your VPC

For Private DNS to function:

  • enableDnsSupport = true
  • enableDnsHostnames = true

(Direct dependency on Step 1.)

If the endpoint’s Security Group blocks inbound 443 → DNS resolution succeeds, but connections fail.

If Private DNS is misapplied → all regional API calls break.
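One way to see whether a Private DNS override is in effect is to check whether a service hostname resolves to private addresses. This is a minimal sketch: the helper names are hypothetical, and the resolve parameter exists only so the logic can be tested without a network.

```python
import ipaddress
import socket

def resolved_addresses(hostname, resolve=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for hostname."""
    infos = resolve(hostname, 443, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def resolves_privately(hostname, resolve=socket.getaddrinfo):
    """True when every resolved address is private.

    For an AWS service name, that suggests an Interface Endpoint's
    Private DNS override is answering instead of public DNS.
    """
    addresses = resolved_addresses(hostname, resolve)
    return all(ipaddress.ip_address(address).is_private for address in addresses)

# Usage inside the VPC (for example from the Lambda test in Step 2):
#   print(resolves_privately("sts.amazonaws.com"))
```

If the name resolves privately but calls still fail, check the endpoint's Security Group for inbound 443, as noted above.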


Step 8: Test DNS Resolution Using EC2/Cloud9 (Not Lambda)

Because Lambda runtimes do not include tools like nslookup, launch a temporary EC2 instance (or Cloud9 environment) in the same subnet and with the same SG as the failing Lambda.
Then test:

AWS Internal DNS

nslookup s3.amazonaws.com

External DNS

nslookup example.com

Private Hosted Zones

nslookup internal.service.local

This isolates:

  • SG vs NACL issues
  • DHCP options problems
  • Private DNS override conflicts
  • VPC endpoint misconfigurations

THIS is how AWS Support debugs DNS in VPC environments.
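When a custom DNS server is involved (such as the 10.0.1.5 example in Step 3), it helps to query that server directly rather than whatever the instance is configured to use. The sketch below builds a bare-bones DNS query by hand and sends it over UDP. It is an illustration only: the function names are hypothetical, it asks only for A records, and it reads just the response code rather than parsing answers.

```python
import socket
import struct

def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in hostname.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)   # QTYPE=A, QCLASS=IN
    return header + question

def response_code(packet):
    """Extract RCODE from a DNS response (0 = NOERROR, 3 = NXDOMAIN)."""
    flags = struct.unpack("!H", packet[2:4])[0]
    return flags & 0x000F

def query_server(server_ip, hostname, timeout=3.0):
    """Send one UDP query to server_ip; raises socket.timeout if nothing answers."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server_ip, 53))
        packet, _ = sock.recvfrom(512)
    return response_code(packet)

# Usage from the test instance (server address is the example from Step 3):
#   print(query_server("10.0.1.5", "s3.amazonaws.com"))
```

A timeout here with working resolution elsewhere points at the path to that specific server (SG, NACL, or routing) rather than at DNS in general.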


Pro Tips

Pro Tip #1: DNS Failures Masquerade as NAT Failures

If NAT looks perfect but the function still hangs → suspect DNS.


Pro Tip #2: DNS Failure Looks Like IAM Failure

Any call to STS, such as AssumeRole, must first resolve the STS endpoint.
If DNS dies, those calls time out, and the symptoms are easy to mistake for a permissions problem.


Pro Tip #3: Avoid Custom DHCP Options Unless Absolutely Necessary

Custom DNS is a frequent cause of Lambda DNS outages.


Pro Tip #4: Private Hosted Zones Must Be Associated with the VPC

Otherwise only public names resolve.
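Route 53 can list the hosted zones associated with a given VPC, which makes this tip checkable. The sketch below works on the response of list-hosted-zones-by-vpc. The helper names and sample data are hypothetical, and it assumes zone names come back fully qualified with a trailing dot.

```python
def zones_visible_to_vpc(response):
    """Return the hosted zone names listed in a list-hosted-zones-by-vpc response."""
    return sorted(summary["Name"] for summary in response.get("HostedZoneSummaries", []))

def zone_is_associated(response, zone_name):
    """True when zone_name appears in the response (trailing dot optional)."""
    wanted = zone_name.rstrip(".") + "."
    return wanted in zones_visible_to_vpc(response)

# Invented sample, shaped like the API response.
response = {"HostedZoneSummaries": [
    {"HostedZoneId": "Z111EXAMPLE", "Name": "service.local.",
     "Owner": {"OwningAccount": "111122223333"}},
]}
print(zone_is_associated(response, "service.local"))

# Usage with boto3 (requires credentials; IDs and region are placeholders):
#   import boto3
#   response = boto3.client("route53").list_hosted_zones_by_vpc(
#       VPCId="vpc-123", VPCRegion="us-east-1")
```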


Pro Tip #5: Interface Endpoint DNS Overrides Are Powerful — and Dangerous

They silently replace public DNS.
If misconfigured, they break entire service categories.


Conclusion

DNS failures inside Lambda VPC configurations are one of the most deceptive networking outages. They masquerade as NAT problems, IAM failures, and random timeouts. But the real issue is simple:
Lambda cannot resolve hostnames.

By validating:

  • DNS support
  • DHCP options
  • NACL and SG rules
  • Route 53 Resolver endpoints
  • VPC endpoint DNS behavior
  • External and internal resolution paths

…you restore predictable, stable, AWS-aligned networking behavior for all Lambda workloads.

This is disciplined, infrastructure-aware cloud engineering — understanding that reliability begins with name resolution.


Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.
