AWS SQS Error: SQS Dead-Letter Queue (DLQ) Redrive Misconfigured
When messages never reach a Dead-Letter Queue, SQS isn’t broken. It’s doing exactly what it was told.
Problem
You’ve configured a Dead-Letter Queue (DLQ) for Amazon SQS.
Messages fail processing repeatedly.
But nothing ever shows up in the DLQ.
No errors.
No warnings.
Just a growing sense that SQS is ignoring your configuration.
Clarifying the Issue
This is almost never a bug in SQS.
It is almost always a redrive failure caused by one of three things:
❌ The redrive policy is incomplete or incorrect
❌ Messages are being deleted before SQS can redrive them
❌ SQS is silently unable to write to the DLQ (permissions or encryption)
SQS only moves messages to a DLQ under very strict conditions.
If any requirement is unmet, messages will retry — or be dropped — without ceremony.
Why It Matters
DLQs are meant to be your last line of defense:
- Capture poison messages
- Preserve failed data
- Stop infinite retry loops
- Enable debugging and reprocessing
When DLQs don’t work:
- Bad messages churn invisibly
- Lambda costs rise
- Downstream systems degrade
- Failures become harder to diagnose
A broken DLQ is worse than no DLQ — because it creates false confidence.
Key Terms
- Dead-Letter Queue (DLQ) – A queue that receives messages after repeated failures
- Redrive Policy – Configuration linking a source queue to a DLQ
- maxReceiveCount – Number of receives before SQS moves a message to the DLQ
- Receive Count – How many times a message has been delivered (reported as the ApproximateReceiveCount message attribute)
- Explicit Delete – Consumer deletes a message, ending its lifecycle
- Silent Drop – Message discarded because SQS cannot redrive it
Steps at a Glance
- Confirm the redrive policy, DLQ permissions, and encryption
- Verify maxReceiveCount is understood correctly
- Ensure failed messages are not deleted
- Understand what increments receive count
- Validate consumer behavior (especially Lambda)
Detailed Steps
Step 1: Confirm Redrive Policy, Permissions, and Encryption
DLQs are configured on the source queue, not the DLQ itself.
Creating a DLQ alone does nothing.
Linking it incorrectly does nothing.
Even worse: if SQS cannot write to the DLQ, messages are dropped silently after retries.
Action
- Link it: Source Queue → Redrive Policy → Select DLQ
- Permit it: Ensure the DLQ Access Policy allows sqs:SendMessage from the source queue (or account)
- Decrypt it: If the DLQ uses KMS, ensure SQS has kms:GenerateDataKey and kms:Decrypt
❌ If permissions are missing, SQS will drop messages silently
This is one of the most dangerous DLQ failure modes because nothing errors loudly.
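Reading the attributes SQS actually reports is a quick way to check Step 1. The sketch below uses a hypothetical helper, audit_redrive. It takes the attribute dictionaries for the source queue and the DLQ and flags the problems described above. The attribute names RedrivePolicy, deadLetterTargetArn, maxReceiveCount, QueueArn, FifoQueue, and KmsMasterKeyId are SQS's own. The checks, the wording of the findings, and the sample ARNs are assumptions for illustration. The helper cannot see the contents of the DLQ access policy or the KMS key policy, so for those it only returns a reminder.

```python
import json


def audit_redrive(source_attrs: dict, dlq_attrs: dict) -> list:
    """Hypothetical helper: return findings about a source queue's redrive setup.

    Both arguments are assumed to be the "Attributes" dict that
    GetQueueAttributes returns with AttributeNames=["All"].
    An empty list means nothing obvious was found, not that redrive is proven to work.
    """
    raw_policy = source_attrs.get("RedrivePolicy")
    if raw_policy is None:
        # The redrive policy lives on the SOURCE queue; creating a DLQ alone does nothing.
        return ["source queue has no RedrivePolicy"]

    findings = []
    policy = json.loads(raw_policy)  # SQS stores the policy as a JSON string

    target = policy.get("deadLetterTargetArn")
    if target != dlq_attrs.get("QueueArn"):
        findings.append(f"RedrivePolicy targets {target!r}, not this DLQ")

    try:
        count = int(policy.get("maxReceiveCount", 0))
    except (TypeError, ValueError):
        count = 0
    if not 1 <= count <= 1000:
        findings.append(f"maxReceiveCount {count} is outside 1-1000")

    # Only FIFO queues report FifoQueue="true"; source and DLQ types must match.
    if (source_attrs.get("FifoQueue") == "true") != (dlq_attrs.get("FifoQueue") == "true"):
        findings.append("queue types differ: FIFO needs FIFO, standard needs standard")

    if "KmsMasterKeyId" in dlq_attrs:
        # Key policy contents are not visible in queue attributes; this is a reminder only.
        findings.append("DLQ uses SSE-KMS: confirm kms:GenerateDataKey and kms:Decrypt are allowed")

    return findings
```

In practice you would fetch each input with boto3's get_queue_attributes(QueueUrl=..., AttributeNames=["All"])["Attributes"] and fix anything the helper reports before looking further.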
Step 2: Verify maxReceiveCount
This is the most misunderstood setting.
maxReceiveCount means:
“After this many receives without a delete, move the message to the DLQ.”
It does not mean:
- Processing attempts
- Lambda retries
- Visibility timeout expirations
❌ Setting it to 1 and expecting an instant move to the DLQ
❌ Setting it too high and assuming the DLQ is broken
✅ Typical values: 3–5
Action
- Set maxReceiveCount deliberately
- Align it with how many retries you actually want
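A small model makes the counting concrete. The sketch below is a hypothetical in-memory queue, ToyQueue, and not SQS itself. Each delivery increments the receive count. Once another delivery would push the count past maxReceiveCount, the message goes to the DLQ instead of to the consumer. With maxReceiveCount set to 3 and a consumer that never deletes, the consumer sees the message three times. The fourth receive attempt moves it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyQueue:
    """Hypothetical in-memory model of SQS redrive counting (not the real service)."""
    max_receive_count: int
    receive_counts: dict = field(default_factory=dict)
    dlq: list = field(default_factory=list)

    def send(self, message_id: str) -> None:
        self.receive_counts[message_id] = 0

    def receive(self, message_id: str) -> Optional[int]:
        """Return the receive count the consumer sees, or None if not delivered."""
        if message_id not in self.receive_counts:
            return None  # already deleted or already redriven
        if self.receive_counts[message_id] >= self.max_receive_count:
            # Another delivery would exceed maxReceiveCount: move it instead.
            del self.receive_counts[message_id]
            self.dlq.append(message_id)
            return None
        self.receive_counts[message_id] += 1
        return self.receive_counts[message_id]

    def delete(self, message_id: str) -> None:
        # Deletion is final: the message can never be redriven afterwards.
        self.receive_counts.pop(message_id, None)


q = ToyQueue(max_receive_count=3)
q.send("poison")
seen = []
while (count := q.receive("poison")) is not None:
    seen.append(count)  # the consumer fails every time and never deletes
print(seen, q.dlq)  # [1, 2, 3] ['poison']
```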
Step 3: Ensure Failed Messages Are NOT Deleted
In SQS, deletion is final.
If your consumer deletes a message — even after failure — SQS considers it successfully processed and will never redrive it.
Common traps:
- finally { deleteMessage() }
- Catching exceptions and returning success
- Lambda handlers swallowing errors
❌ Delete on failure
✅ Let the message become visible again
Action
- Audit delete logic carefully
- Delete only after true success
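Here is a sketch of a worker that deletes only after true success. The function poll_once is hypothetical. It assumes sqs is a boto3 SQS client, or any object with the same receive_message and delete_message methods, and that process is your own handler that raises on failure. A failed message is logged and skipped rather than deleted. It then reappears after the visibility timeout and its receive count keeps climbing toward maxReceiveCount.

```python
import logging

log = logging.getLogger(__name__)


def poll_once(sqs, queue_url: str, process) -> int:
    """Hypothetical worker step: receive one batch, delete only successes.

    Returns the number of messages deleted.
    """
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    deleted = 0
    for msg in resp.get("Messages", []):
        try:
            process(msg["Body"])
        except Exception:
            # Do NOT delete: let the message become visible again so SQS can
            # count the failure and eventually redrive it to the DLQ.
            log.exception("processing failed for %s", msg["MessageId"])
            continue
        # Delete only after true success, never in a finally block.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        deleted += 1
    return deleted
```

Catching a broad Exception is deliberate here. The worker should survive a poison message, but it must not treat that message as handled.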
Step 4: Understand What Increments Receive Count
A receive is counted when:
- A consumer (or the Lambda poller) receives the message
- Its visibility timeout begins
It increments even if:
- Processing never starts
- Lambda times out
- The consumer crashes
This means:
- Visibility timeout + retries drive DLQ behavior
- Fast failures can hit maxReceiveCount quickly, especially with a short visibility timeout
Action
- Ensure visibility timeout matches processing time
- Expect receive count to increase even on crashes
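The timing follows from simple arithmetic. Assume a consumer that never deletes (or crashes) and a poller that picks the message up as soon as it becomes visible again. Receives then land at 0, VT, 2·VT, and so on, where VT is the visibility timeout. The message becomes eligible for redrive after maxReceiveCount × VT seconds. The hypothetical helpers below compute that timeline. They also estimate how many extra receives a slow consumer triggers when processing outlasts the visibility timeout. These are illustrative approximations, not SQS guarantees. Real timings also include polling delays and any ChangeMessageVisibility calls.

```python
import math


def receive_times(max_receive_count: int, visibility_timeout_s: int) -> list:
    """Hypothetical: seconds at which each delivery happens if nothing is deleted."""
    return [i * visibility_timeout_s for i in range(max_receive_count)]


def earliest_redrive_s(max_receive_count: int, visibility_timeout_s: int) -> int:
    """Hypothetical: earliest redrive time, once the last receive's timeout expires."""
    return max_receive_count * visibility_timeout_s


def extra_receives(processing_s: float, visibility_timeout_s: float) -> int:
    """Hypothetical: extra deliveries while one consumer is still working,
    assuming another consumer polls the instant the message becomes visible."""
    if processing_s <= visibility_timeout_s:
        return 0
    return math.ceil(processing_s / visibility_timeout_s) - 1


print(receive_times(3, 30))       # [0, 30, 60]
print(earliest_redrive_s(3, 30))  # 90
print(extra_receives(45, 30))     # 1
```

The last line shows why the visibility timeout must cover processing time. A 45-second job under a 30-second timeout burns an extra receive even though nothing failed.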
Step 5: Validate Lambda-Specific Behavior
When using Lambda with SQS:
❌ Returning success tells SQS the entire batch succeeded
❌ One failure can requeue every message in the batch
✅ Enable partial batch failure handling (ReportBatchItemFailures)
✅ Let only failed messages retry
If Lambda never signals failure, the DLQ will never trigger.
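Below is a minimal sketch of an SQS-triggered Lambda handler that signals failure per message. It uses the partial batch response shape, a batchItemFailures list of itemIdentifier entries carrying each failed record's messageId. The process function and its "order_id" check are hypothetical stand-ins for your own logic. The sketch assumes ReportBatchItemFailures is enabled on the event source mapping, for example via FunctionResponseTypes=["ReportBatchItemFailures"]. Without that setting, Lambda treats a normal return as success for the whole batch.

```python
import json


def process(body: dict) -> None:
    """Hypothetical business logic: raises on a payload it cannot handle."""
    if "order_id" not in body:
        raise ValueError("missing order_id")


def handler(event, context):
    """SQS-triggered handler using partial batch responses.

    Only the messages listed in batchItemFailures return to the queue;
    Lambda deletes the rest of the batch.
    """
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report this message instead of swallowing the error, so its
            # receive count grows and SQS can eventually redrive it.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

The Lambda documentation also recommends setting the source queue's visibility timeout to at least six times the function timeout, so retries do not collide with invocations that are still running.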
Pro Tips
- DLQs only work if messages are allowed to fail
- Deleting a message is irreversible
- maxReceiveCount counts receives, not errors
- Test DLQs intentionally with bad payloads
- Queue types must match: FIFO → FIFO, Standard → Standard
Conclusion
When messages never reach a Dead-Letter Queue, SQS isn’t broken.
It’s doing exactly what it was told.
DLQs are precise tools. They only activate when:
- A valid redrive policy exists
- Permissions and encryption allow redrive
- Receives exceed the threshold
- Messages are not deleted prematurely
Once those conditions are aligned, DLQs work predictably — and failures become visible again.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.