AWS SQS Error: SQS Dead-Letter Queue (DLQ) Redrive Misconfigured

- January 04, 2026

#aws #sqs #devops #cloud

When messages never reach a Dead-Letter Queue, SQS isn’t broken. It’s doing exactly what it was told.

Problem

You’ve configured a Dead-Letter Queue (DLQ) for Amazon SQS.

Messages fail processing repeatedly.
But nothing ever shows up in the DLQ.

No errors.
No warnings.
Just a growing sense that SQS is ignoring your configuration.

Clarifying the Issue

This is almost never a bug in SQS.

It is almost always a redrive failure caused by one of three things:

❌ The redrive policy is incomplete or incorrect
❌ Messages are being deleted before SQS can redrive them
❌ SQS is silently unable to write to the DLQ (permissions or encryption)

SQS only moves messages to a DLQ under very strict conditions.
If any requirement is unmet, messages keep retrying until the source queue's retention period runs out, and then they are deleted without any error.

Why It Matters

DLQs are meant to be your last line of defense:

Capture poison messages
Preserve failed data
Stop infinite retry loops
Enable debugging and reprocessing

When DLQs don’t work:

Bad messages churn invisibly
Lambda costs rise
Downstream systems degrade
Failures become harder to diagnose

A broken DLQ is worse than no DLQ — because it creates false confidence.

Key Terms

Dead-Letter Queue (DLQ) – A queue that receives messages after repeated failures
Redrive Policy – Configuration linking a source queue to a DLQ
maxReceiveCount – Number of receives before SQS moves a message to the DLQ
Receive Count – How many times a message has been received; SQS exposes it as the ApproximateReceiveCount message attribute
Explicit Delete – Consumer deletes a message, ending its lifecycle
Silent Drop – Message that disappears without reaching the DLQ and without raising an error, for example by expiring from the source queue

Steps at a Glance

Confirm the redrive policy and DLQ permissions
Verify maxReceiveCount is understood correctly
Ensure failed messages are not deleted
Understand what increments receive count
Validate consumer behavior (especially Lambda)

Detailed Steps

Step 1: Confirm Redrive Policy, Permissions, and Encryption

DLQs are configured on the source queue, not the DLQ itself.

Creating a DLQ alone does nothing.
Linking it incorrectly does nothing.

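Even worse: if SQS cannot move messages to the DLQ, they never arrive there. They stay in the source queue until the retention period expires and are then deleted, with no error shown to the consumer.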

Action

Link it: Source Queue → Redrive Policy → Select DLQ
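Permit it: Keep the DLQ in the same AWS account and Region as the source queue. If the DLQ has a redrive allow policy (RedriveAllowPolicy) or a restrictive access policy, confirm it still admits the source queue, including sqs:SendMessage from the source queue (or account)
Decrypt it: If the DLQ uses KMS, ensure SQS has:
- kms:GenerateDataKey
- kms:Decrypt
❌ If permissions are missing, messages never reach the DLQ and nothing reports the failure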

This is one of the most dangerous DLQ failure modes because nothing errors loudly.
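
As a rough illustration of the "Link it" step, the sketch below builds the RedrivePolicy attribute that gets set on the source queue. The ARN, the queue URL parameter, and the helper names are placeholders invented for this example. `attach_dlq` assumes you pass in a boto3 SQS client and is not called here.

```python
import json


def build_redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Return the Attributes payload for SetQueueAttributes on the SOURCE queue."""
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    # SQS expects the policy as a JSON string, not a nested dict.
    return {"RedrivePolicy": json.dumps(policy)}


def attach_dlq(sqs_client, source_queue_url: str, dlq_arn: str,
               max_receive_count: int = 5) -> None:
    """Apply the policy with a boto3 SQS client, e.g. boto3.client("sqs")."""
    sqs_client.set_queue_attributes(
        QueueUrl=source_queue_url,
        Attributes=build_redrive_attributes(dlq_arn, max_receive_count),
    )


# Hypothetical ARN, for illustration only.
attrs = build_redrive_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq", 3)
print(attrs["RedrivePolicy"])
```

Note that the policy is attached to the source queue's URL. Setting it on the DLQ itself would not link anything.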

Step 2: Verify `maxReceiveCount`

This is the most misunderstood setting.

maxReceiveCount means:

“After this many receives, move the message to the DLQ.”

It does not mean:

Processing attempts
Lambda retries
Visibility timeout expirations

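❌ Setting it to 1 and expecting instant DLQ (the message moves only after its first receive goes undeleted and the visibility timeout expires)
❌ Setting it too high and assuming DLQ is broken (messages can expire from the source queue before they ever reach the threshold)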

✅ Typical values: 3–5

Action

Set maxReceiveCount deliberately
Align it with how many retries you actually want
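
To make the counting rule concrete, here is a toy in-memory model of a source queue with a redrive policy. It is not SQS and leaves out visibility timeouts entirely. It only mimics the documented rule that a message is redriven once its receive count exceeds maxReceiveCount. The class and message names are made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """In-memory stand-in for a source queue with a redrive policy (not real SQS)."""
    max_receive_count: int
    receive_counts: dict = field(default_factory=dict)
    dlq: list = field(default_factory=list)

    def send(self, message_id: str) -> None:
        self.receive_counts[message_id] = 0

    def receive(self, message_id: str) -> bool:
        """Try to deliver a message; return False if it went to the DLQ instead."""
        count = self.receive_counts[message_id] + 1
        if count > self.max_receive_count:
            # The receive count would exceed the threshold: redrive.
            del self.receive_counts[message_id]
            self.dlq.append(message_id)
            return False
        self.receive_counts[message_id] = count
        return True


queue = ToyQueue(max_receive_count=3)
queue.send("poison-1")
deliveries = 0
while queue.receive("poison-1"):
    deliveries += 1  # the consumer fails and never deletes the message

print(deliveries, queue.dlq)  # 3 ['poison-1']
```

With a threshold of 3, the consumer sees the message three times, and the fourth attempt is what sends it to the DLQ.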

Step 3: Ensure Failed Messages Are NOT Deleted

In SQS, deletion is final.

If your consumer deletes a message — even after failure — SQS considers it successfully processed and will never redrive it.

Common traps:

finally { deleteMessage() }
Catching exceptions and returning success
Lambda handlers swallowing errors

❌ Delete on failure
✅ Let the message become visible again

Action

Audit delete logic carefully
Delete only after true success
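
One way to structure a polling consumer so that deletes happen only on success is sketched below. It assumes a boto3-style SQS client is passed in. `drain_once` and the `process` callback are hypothetical names for this example.

```python
def drain_once(sqs_client, queue_url: str, process) -> tuple:
    """Receive one batch and delete only the messages that `process` handled.

    Returns (deleted_ids, left_for_retry_ids).
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=10,  # long polling
    )
    deleted, left = [], []
    for message in response.get("Messages", []):
        try:
            process(message["Body"])
        except Exception:
            # No delete here: the message becomes visible again after the
            # visibility timeout, and its receive count keeps climbing.
            left.append(message["MessageId"])
            continue
        # Reached only when process() returned without raising.
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        deleted.append(message["MessageId"])
    return deleted, left
```

The delete call sits after the try/except rather than in a finally block, so a failed message is left alone and can eventually be redriven.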

Step 4: Understand What Increments Receive Count

A receive happens when:

A consumer receives the message
The visibility timeout begins

It increments even if:

Processing never starts
Lambda times out
The consumer crashes

This means:

Time to reach the DLQ is roughly visibility timeout × maxReceiveCount
Fast failures still wait out each visibility timeout, but every redelivery counts toward maxReceiveCount

Action

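Set the visibility timeout longer than your worst-case processing time (for Lambda consumers, AWS recommends at least six times the function timeout)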
Expect receive count to increase even on crashes
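
The arithmetic behind this can be sketched as follows. It is a back-of-the-envelope estimate that assumes the consumer picks the message up as soon as it becomes visible and never deletes it. The function names and the example numbers are illustrative, not recommendations.

```python
def seconds_until_dlq(visibility_timeout_s: int, max_receive_count: int) -> int:
    """Rough time a never-deleted message spends in the source queue.

    Assumes each receive costs one full visibility timeout before the
    message can be received again.
    """
    return visibility_timeout_s * max_receive_count


def expires_before_dlq(visibility_timeout_s: int, max_receive_count: int,
                       retention_s: int) -> bool:
    """True if the message would hit the retention limit before being redriven."""
    return seconds_until_dlq(visibility_timeout_s, max_receive_count) >= retention_s


FOUR_DAYS = 4 * 24 * 60 * 60  # 345600 s, the default SQS retention period

print(seconds_until_dlq(30, 5))                   # 150
print(expires_before_dlq(30, 5, FOUR_DAYS))       # False
print(expires_before_dlq(43200, 10, FOUR_DAYS))   # True: 12 h x 10 = 5 days
```

The last case is the one to watch for: a long visibility timeout combined with a high maxReceiveCount can outlast the retention period, so the message expires instead of landing in the DLQ.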

Step 5: Validate Lambda-Specific Behavior

When using Lambda with SQS:

❌ Returning success tells Lambda the entire batch succeeded, so every message in it is deleted
❌ One unhandled error returns every message in the batch to the queue, including the ones that succeeded

✅ Enable partial batch failure handling (ReportBatchItemFailures)
✅ Let only failed messages retry

If Lambda never signals failure, the DLQ will never trigger.
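
A minimal sketch of a handler that reports failures per message is shown below. It assumes ReportBatchItemFailures is enabled on the event source mapping. `process_record` is a hypothetical stand-in for real business logic.

```python
import json


def process_record(body: str) -> None:
    """Hypothetical business logic: reject payloads that are not JSON objects."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")


def handler(event, context):
    """SQS-triggered Lambda handler that reports failures per message."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_record(record["body"])
        except Exception:
            # Only this message is returned to the queue for another receive.
            failures.append({"itemIdentifier": record["messageId"]})
    # An empty list means the whole batch succeeded.
    return {"batchItemFailures": failures}
```

Catching the exception here is safe only because the failure is still reported in the return value. Swallowing it and returning an empty list would bring back the problem described above.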

Pro Tips

DLQs only work if messages are allowed to fail
Deleting a message is irreversible
maxReceiveCount counts receives, not errors
Test DLQs intentionally with bad payloads
Queue types must match: FIFO → FIFO, Standard → Standard
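
A quick pre-flight check for the last tip might look like the sketch below. It compares two queue ARNs only and makes no AWS calls. The same-account and same-Region checks go beyond the tip itself and reflect documented SQS DLQ requirements. The function name and the ARNs are invented for the example.

```python
def check_dlq_pairing(source_arn: str, dlq_arn: str) -> list:
    """Return a list of problems found by comparing two SQS queue ARNs."""
    # ARN layout: arn:partition:sqs:region:account-id:queue-name
    src = source_arn.split(":")
    dlq = dlq_arn.split(":")
    if len(src) != 6 or len(dlq) != 6 or src[2] != "sqs" or dlq[2] != "sqs":
        return ["both values must be SQS queue ARNs"]
    problems = []
    if src[3] != dlq[3]:
        problems.append("queues are in different Regions")
    if src[4] != dlq[4]:
        problems.append("queues are in different accounts")
    # FIFO queue names always end in ".fifo".
    if src[5].endswith(".fifo") != dlq[5].endswith(".fifo"):
        problems.append("queue types differ (FIFO vs Standard)")
    return problems


print(check_dlq_pairing(
    "arn:aws:sqs:us-east-1:123456789012:orders.fifo",
    "arn:aws:sqs:us-east-1:123456789012:orders-dlq",
))
```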

Conclusion

When messages never reach a Dead-Letter Queue, SQS isn’t broken.

It’s doing exactly what it was told.

DLQs are precise tools. They only activate when:

A valid redrive policy exists
Permissions and encryption allow redrive
Receives exceed the threshold
Messages are not deleted prematurely

Once those conditions are aligned, DLQs work predictably — and failures become visible again.

Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.
