AWS SQS Error: Visibility Timeout vs. Lambda Timeout Misalignment (Duplicate Processing & Phantom Retries)
Problem
Your Amazon SQS–triggered Lambda function appears to “work,” but you start seeing:
- Messages processed more than once
- Duplicate records written to databases
- Side effects repeated (emails sent twice, files processed twice)
- Dead-letter queues filling unexpectedly
No errors are logged. Nothing looks obviously broken. Yet the system behaves unpredictably.
Clarifying the Issue
This is almost never a bug in Lambda or Amazon SQS.
It is a timeout mismatch.
Specifically:
❌ Your Lambda timeout is longer than the SQS visibility timeout.
When this happens, SQS assumes the message was not processed successfully and makes it visible again—while your Lambda is still running.
The result: the same message is processed multiple times, often concurrently.
Why It Matters
This misconfiguration sits at the intersection of reasonable defaults that do not work well together:
- Lambda default timeout: 3 seconds
- SQS default visibility timeout: 30 seconds
Teams often increase the Lambda timeout to handle heavier workloads—but forget to update the queue.
The consequences are serious:
- Duplicate writes
- Data corruption
- Inconsistent state
- Hard-to-debug “ghost retries”
- Escalating costs from repeated invocations
Batch amplification makes this worse:
Because SQS delivers messages to Lambda in batches (up to 10 by default), one slow message timing out can cause the entire batch to be retried, even if the other messages were processed successfully.
Key Terms
- Visibility Timeout – The period during which an SQS message is hidden after being received
- Lambda Timeout – The maximum time a Lambda function is allowed to run
- Event Source Mapping – The configuration that connects SQS to Lambda
- At-Least-Once Delivery – SQS’s delivery model; duplicates are possible by design
- Idempotency – The ability to safely process the same message more than once
Steps at a Glance
- Compare the Lambda timeout and SQS visibility timeout
- Understand what happens when Lambda runs longer
- Fix the timeout misalignment
- Check the event source mapping configuration
- Add idempotency as a safety net
Detailed Steps
Step 1: Compare the Two Timeouts
Start by checking the actual values—not what you think they are.
- Lambda timeout: Function configuration
- SQS visibility timeout: Queue settings
If:
Lambda timeout > SQS visibility timeout
You have a problem.
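One way to read the real values is with boto3. The sketch below is a minimal example rather than a definitive tool: `check_alignment` and its parameter names are hypothetical, and the clients are passed in so the function can be exercised without AWS credentials. It assumes the standard `get_function_configuration` call (which reports `Timeout` in seconds) and `get_queue_attributes` (which returns `VisibilityTimeout` as a string).

```python
def check_alignment(lambda_client, sqs_client, function_name, queue_url):
    """Return (lambda_timeout, visibility_timeout, is_safe), all in seconds.

    Hypothetical helper: the clients are expected to behave like
    boto3.client("lambda") and boto3.client("sqs").
    """
    # Lambda reports the function timeout as a number of seconds.
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    lambda_timeout = int(config["Timeout"])

    # SQS returns attribute values as strings, so convert before comparing.
    attrs = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["VisibilityTimeout"]
    )
    visibility_timeout = int(attrs["Attributes"]["VisibilityTimeout"])

    # Equal values count as unsafe: the queue needs strictly more time.
    is_safe = visibility_timeout > lambda_timeout
    return lambda_timeout, visibility_timeout, is_safe
```

With real clients this might be called as `check_alignment(boto3.client("lambda"), boto3.client("sqs"), "my-function", queue_url)`, where the function name and queue URL are placeholders for your own resources.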
Step 2: Understand the Failure Mode
Here is what actually happens:
- SQS delivers a batch of messages to Lambda
- SQS hides those messages for the visibility timeout
- Lambda continues processing
- The visibility timeout expires
- SQS makes the entire batch visible again
- Lambda receives the same messages again
- Multiple invocations now process the same data independently
From AWS’s perspective, everything is working as designed.
From your perspective, the system is duplicating work and corrupting state.
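The sequence above can be approximated with a toy model. The function below is a simplified sketch, not a description of how SQS is implemented: `delivery_times` is a hypothetical name, and the model assumes the message is picked up again the instant it becomes visible, that it is deleted only when the first invocation finishes, and that no redrive policy intervenes.

```python
def delivery_times(processing_seconds, visibility_timeout):
    """List the times (in seconds) at which one message is delivered.

    Toy model: every visibility expiry that happens before the first
    invocation finishes causes one more delivery of the same message.
    """
    if processing_seconds <= 0 or visibility_timeout <= 0:
        raise ValueError("both durations must be positive")

    times = []
    t = 0
    # The message stays in the queue until processing completes, so it
    # reappears every `visibility_timeout` seconds until then.
    while t < processing_seconds:
        times.append(t)
        t += visibility_timeout
    return times
```

Under these assumptions, `delivery_times(45, 30)` returns `[0, 30]`: a 45-second job against a 30-second visibility timeout is delivered twice, with the second copy starting while the first is still running. `delivery_times(20, 30)` returns `[0]`, a single delivery.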
Step 3: Fix the Timeout Misalignment
The rule is simple:
✅ The SQS visibility timeout must be greater than the Lambda timeout.
A minimum safe baseline:
Visibility Timeout ≥ (Lambda Timeout × 2) + buffer
This buffer accounts for:
- Cold starts
- Retry jitter
- Temporary slowdowns
Note: AWS documentation recommends setting the visibility timeout to at least 6× the Lambda timeout, which leaves room for Lambda to retry a batch if the function is throttled. Treat 2× as the bare minimum for avoiding race conditions, not as the target.
Action
- Increase the SQS visibility timeout
- Or reduce the Lambda timeout
- Do not leave them equal
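The baseline formula can be turned into a small helper. This is a sketch under stated assumptions: the function names, the default multiplier of 2, and the 30-second default buffer are illustrative choices, not values prescribed by AWS. The 43,200-second cap is the SQS maximum visibility timeout (12 hours), and `set_queue_attributes` is the standard SQS call for changing it.

```python
SQS_MAX_VISIBILITY_TIMEOUT = 43200  # seconds (12 hours), the SQS upper limit


def recommended_visibility_timeout(lambda_timeout, multiplier=2, buffer_seconds=30):
    """Apply: visibility timeout >= (Lambda timeout x multiplier) + buffer."""
    if lambda_timeout <= 0:
        raise ValueError("lambda_timeout must be positive")
    if multiplier < 1 or buffer_seconds < 0:
        raise ValueError("multiplier must be >= 1 and buffer_seconds >= 0")
    value = lambda_timeout * multiplier + buffer_seconds
    # Never ask SQS for more than it allows.
    return min(value, SQS_MAX_VISIBILITY_TIMEOUT)


def apply_visibility_timeout(sqs_client, queue_url, seconds):
    """Update the queue; sqs_client is expected to act like boto3.client("sqs")."""
    # SQS expects attribute values as strings.
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url, Attributes={"VisibilityTimeout": str(seconds)}
    )
```

For a 60-second function, `recommended_visibility_timeout(60)` gives 150 seconds; passing `multiplier=6` gives 390 seconds.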
Step 4: Check the Event Source Mapping Configuration
When SQS triggers Lambda, AWS manages polling, batching, and retries automatically.
Key behaviors to remember:
- Your function code does not delete messages itself
- The Lambda service deletes messages from the queue only after the invocation completes successfully
- A timeout is treated as a failure
Action
- Enable partial batch responses by turning on ReportBatchItemFailures
- This prevents successful messages in a batch from being retried when only one message fails
- Use this as damage control—not as a substitute for correct timeout configuration
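Turning the setting on is only half of it: the handler also has to report which messages failed. Here is a minimal sketch of a handler that does so, assuming message bodies are JSON objects. `process_message` and its `fail` flag are hypothetical stand-ins for real business logic; the `batchItemFailures` and `itemIdentifier` keys are the response shape Lambda expects for SQS partial batch responses.

```python
import json


def process_message(body):
    """Hypothetical business logic; raises for a message it cannot handle."""
    if body.get("fail"):
        raise ValueError("simulated processing failure")


def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # Report only this message for retry; messages not listed
            # here are treated as successful and removed from the queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

This only helps when the function returns. If the invocation times out, no response is produced and the whole batch becomes visible again, which is why the timeouts still have to be aligned first.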
Step 5: Add Idempotency (Defense in Depth)
Even with perfect timeouts, SQS standard queues guarantee only at-least-once delivery.
Duplicates will happen eventually.
Best practices
- Use idempotency keys
- Deduplicate writes
- Track processed message IDs
- Make downstream operations safe to repeat
Timeout alignment prevents most duplicates.
Idempotency prevents all damage.
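Tracking processed message IDs might look like the sketch below. It is illustrative only: `InMemoryIdempotencyStore` and `process_once` are hypothetical names, and an in-memory set does not survive across Lambda invocations or protect against concurrent ones. A real system would need a shared store with an atomic "claim if absent" operation, for example a conditional write to a database table.

```python
class InMemoryIdempotencyStore:
    """Stand-in for a shared store with an atomic claim-if-absent operation."""

    def __init__(self):
        self._seen = set()

    def claim(self, key):
        """Return True if the key was newly claimed, False if already seen."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

    def release(self, key):
        """Forget a key so the message can be retried after a failure."""
        self._seen.discard(key)


def process_once(store, record, side_effect):
    """Run side_effect(record) at most once per messageId.

    Returns True if the side effect ran, False for a duplicate delivery.
    """
    key = record["messageId"]
    if not store.claim(key):
        return False  # duplicate delivery: skip the side effect
    try:
        side_effect(record)
    except Exception:
        # Give the claim back so a later retry is not mistaken for a duplicate.
        store.release(key)
        raise
    return True
```

The SQS `messageId` is used as the key here for simplicity. If a producer can send the same logical event twice, each copy gets its own `messageId`, so a business-level key carried in the message body may be the safer choice.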
Pro Tips
- Never rely on defaults for SQS + Lambda
- Log message IDs during processing
- Treat timeouts as configuration, not tuning
- Test failure paths explicitly
- If duplicates appear “random,” suspect timeouts first
Conclusion
Visibility timeout vs Lambda timeout misalignment is a configuration trap, not a coding error.
When Lambda runs longer than SQS expects, SQS does exactly what it is designed to do: retry.
Once the timeouts are aligned—and partial batch responses and idempotency are in place—the system becomes predictable, stable, and boring again.
That is the goal.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.