AWS SQS Error: SQS Message Retention Period
Why messages “disappear”
Problem
Messages are sent to an Amazon SQS queue successfully.
Later, you go looking for them—and they’re gone.
- They were not processed
- They were not deleted by consumers
- No DLQ entries exist
- No errors were logged
The messages simply vanished.
Clarifying the Issue
This is almost never a bug in SQS.
It is almost always the Message Retention Period doing exactly what it is designed to do.
Every SQS queue has a retention window:
- Messages older than this window are permanently deleted
- This happens automatically
- No consumer action is required
- No warning is issued
Once the retention period expires, the message is gone forever.
Why It Matters
Many teams implicitly assume:
“If a message is in SQS, it will stay there until something processes it.”
That assumption is wrong.
SQS is not long-term storage.
It is a time-bound buffer.
When retention is misunderstood, teams experience:
- Lost events
- Missing data
- Incomplete replays
- “Ghost” bugs that can’t be reproduced
This is especially dangerous in:
- Low-traffic queues
- Paused consumers
- Backlogged systems
- Failure-recovery scenarios
Key Terms
- Message Retention Period – How long SQS stores a message before deleting it
- Visibility Timeout – How long a message is hidden after being received
- Receive Count – Number of times a message has been delivered
- Dead-Letter Queue (DLQ) – Destination for repeatedly failed messages
- Silent Expiration – Message deletion with no error or notification
Steps at a Glance
- Check the queue’s Message Retention Period
- Understand what retention does and does not do
- Stop confusing retention with visibility timeout
- Validate consumer availability assumptions
- Check DLQ retention vs source retention
Detailed Steps
Step 1: Check the Message Retention Period (and the DLQ)
Every SQS queue has a retention period between:
- Minimum: 60 seconds
- Maximum: 14 days
The default is 4 days.
When a message exceeds this age:
- It is deleted automatically
- It is not moved to a DLQ
- It is not recoverable
Action
- Open the queue settings
- Verify the retention period explicitly
- Check the DLQ: Ensure the DLQ retention is longer than the source queue retention
- ✅ Best practice: Set DLQ retention to 14 days so you have time to debug failures
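The check above can also be scripted. This is a minimal sketch, assuming Python and a boto3-style SQS client passed in as `sqs` (for example, one created with `boto3.client("sqs")`). The function names and queue URLs are hypothetical and chosen only for illustration.

```python
def retention_seconds(sqs, queue_url):
    """Return the queue's MessageRetentionPeriod in seconds."""
    resp = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["MessageRetentionPeriod"],
    )
    # SQS returns attribute values as strings, e.g. "345600" for 4 days.
    return int(resp["Attributes"]["MessageRetentionPeriod"])


def dlq_outlives_source(sqs, source_url, dlq_url):
    """True when the DLQ keeps messages strictly longer than the source queue."""
    return retention_seconds(sqs, dlq_url) > retention_seconds(sqs, source_url)
```

If `dlq_outlives_source` returns False, the DLQ is configured with the same or a shorter window than the source queue, which is the situation Step 5 warns about.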
Step 2: Understand What Retention Actually Controls
Retention is based on message age, not activity.
Retention does not reset when:
- A message is received
- A visibility timeout expires
- A retry occurs
- A consumer crashes
The clock starts when the message is first enqueued.
❌ “Retries keep the message alive”
✅ “Age alone determines expiration”
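To make the "age alone" rule concrete, here is a small sketch in Python. It assumes you have read the message's `SentTimestamp` system attribute (epoch milliseconds) and the queue's retention period in seconds. The helper names `expires_at` and `is_expired` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expires_at(sent_timestamp_ms, retention_seconds):
    """Return the UTC time at which the retention period runs out.

    sent_timestamp_ms is the message's SentTimestamp (epoch milliseconds).
    Receives, retries, and visibility timeouts never change it.
    """
    sent = datetime.fromtimestamp(sent_timestamp_ms / 1000, tz=timezone.utc)
    return sent + timedelta(seconds=retention_seconds)


def is_expired(sent_timestamp_ms, retention_seconds, now):
    """True once the message is older than the retention period."""
    return now >= expires_at(sent_timestamp_ms, retention_seconds)
```

Note that neither function takes a receive count or a visibility timeout as input. Only the send time and the retention period matter.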
Step 3: Do Not Confuse Retention with Visibility Timeout
This is the most common misunderstanding.
- Visibility Timeout controls when a message can be seen again
- Retention Period controls how long the message exists at all
A message can:
- Retry many times
- Become visible repeatedly
- Still be deleted when retention expires
❌ Increasing visibility timeout to prevent deletion
✅ Increasing retention period if messages must live longer
They solve different problems.
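A short sketch of the ✅ option in Python, assuming a boto3-style client whose `set_queue_attributes` call accepts an `Attributes` dictionary. The helper name `retention_attributes` is hypothetical. It builds only the retention setting and deliberately leaves `VisibilityTimeout` untouched.

```python
MIN_RETENTION = 60              # 60 seconds
MAX_RETENTION = 14 * 24 * 3600  # 14 days = 1,209,600 seconds


def retention_attributes(days):
    """Build the Attributes dict that sets MessageRetentionPeriod.

    VisibilityTimeout is not included because it does not affect
    how long a message exists.
    """
    seconds = int(days * 24 * 3600)
    if not MIN_RETENTION <= seconds <= MAX_RETENTION:
        raise ValueError("retention must be between 60 seconds and 14 days")
    # SQS expects attribute values as strings.
    return {"MessageRetentionPeriod": str(seconds)}
```

With a real client this could be applied as `sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=retention_attributes(7))`, where `queue_url` is your own queue's URL.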
Step 4: Validate Consumer Availability Assumptions
Retention issues often surface when:
- Consumers are down for hours or days
- Traffic is sporadic
- Backlogs accumulate quietly
If consumers do not process messages before retention expires, SQS will clean up for you.
Action
- Ensure consumers are highly available
- Monitor queue age metrics
- Alert on oldest message age, not just queue depth
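One way to alert on oldest message age is a CloudWatch alarm on the `ApproximateAgeOfOldestMessage` metric. The sketch below, in Python, only builds the alarm parameters. The function name, the alarm name, and the choice to alert at 50% of the retention period are assumptions for illustration, not recommendations from AWS.

```python
def oldest_message_alarm(queue_name, retention_seconds, fraction=0.5):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when the oldest message's age (in seconds) passes
    the given fraction of the queue's retention period.
    """
    return {
        "AlarmName": f"{queue_name}-oldest-message-age",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,            # evaluate in 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": retention_seconds * fraction,
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

With boto3, the result could be passed as `boto3.client("cloudwatch").put_metric_alarm(**oldest_message_alarm("orders", 345600))`, where `"orders"` is a placeholder queue name. In practice you would also attach an alarm action so that someone is notified.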
Step 5: Understand the DLQ Retention Trap
This is a subtle but serious failure mode.
The Scenario
- A message enters the source queue
- It retries slowly for several days
- It finally exceeds maxReceiveCount and moves to the DLQ
- ❌ The DLQ has the same retention period as the source queue
- The message expires almost immediately after landing, because on a standard queue its age carries over from the source queue
You now have nothing to inspect.
Action
- Treat DLQs as forensic tools
- ✅ Set DLQ retention to 14 days (maximum)
- Never match DLQ retention to the source queue
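The arithmetic behind this trap is simple enough to sketch in Python. It assumes a standard queue, where a message keeps its original enqueue timestamp when it moves to the DLQ. The function name `seconds_left_in_dlq` and the example ages are hypothetical.

```python
HOUR = 3600
DAY = 24 * HOUR


def seconds_left_in_dlq(age_at_move_seconds, dlq_retention_seconds):
    """Time a message survives in the DLQ after being moved there.

    Assumes a standard queue: the message's age carries over, so the
    DLQ retention period is measured from the original send time.
    """
    return max(0, dlq_retention_seconds - age_at_move_seconds)


# A message that fails for 3 days and 23 hours before moving to the DLQ:
age = 3 * DAY + 23 * HOUR
same_as_source = seconds_left_in_dlq(age, 4 * DAY)    # 1 hour left
maximum = seconds_left_in_dlq(age, 14 * DAY)          # 10 days and 1 hour left
```

With a 4-day DLQ the message has one hour left to be inspected. With a 14-day DLQ it has just over ten days.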
Pro Tips
- Messages can expire even if never processed
- Retention expiration is silent by design
- ❌ DLQs do not protect against retention loss
- ✅ Golden Rule: Set DLQ retention to 14 days
- Monitor ApproximateAgeOfOldestMessage
Conclusion
When messages “disappear” from SQS, nothing is broken.
The queue is enforcing its Message Retention Period.
Once you clearly separate:
- Retention (how long messages live)
- Visibility (when messages can be seen)
- DLQs (what happens after repeated failures)
…the behavior becomes predictable and controllable.
This Fix-It isn’t about preventing expiration.
It’s about designing systems that respect time-bound queues.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.

