AWS SQS Error: Visibility Timeout vs. Lambda Timeout Misalignment (Duplicate Processing & Phantom Retries)

Problem

Your Amazon SQS–triggered Lambda function appears to “work,” but you start seeing:

  • Messages processed more than once
  • Duplicate records written to databases
  • Side effects repeated (emails sent twice, files processed twice)
  • Dead-letter queues filling unexpectedly

No errors are logged. Nothing looks obviously broken. Yet the system behaves unpredictably.


Clarifying the Issue

This is almost never a bug in Lambda or Amazon SQS.

It is a timeout mismatch.

Specifically:

❌ Your Lambda timeout is longer than the SQS visibility timeout.

When an invocation runs longer than the visibility timeout, SQS assumes the message was not processed successfully and makes it visible again, while your Lambda is still running.

The result: the same message is processed multiple times, often concurrently.


Why It Matters

This misconfiguration sits at the intersection of two reasonable defaults that stop fitting together as soon as one of them is changed:

  • Lambda default timeout: 3 seconds
  • SQS default visibility timeout: 30 seconds

Teams often increase the Lambda timeout to handle heavier workloads—but forget to update the queue.

The consequences are serious:

  • Duplicate writes
  • Data corruption
  • Inconsistent state
  • Hard-to-debug “ghost retries”
  • Escalating costs from repeated invocations

Batch amplification makes this worse:
Because SQS delivers messages to Lambda in batches (10 messages per batch by default), one slow message can push the whole invocation past its deadline, and the entire batch is retried, including messages that were already processed successfully.


Key Terms

  • Visibility Timeout – The period during which an SQS message is hidden after being received
  • Lambda Timeout – The maximum time a Lambda function is allowed to run
  • Event Source Mapping – The configuration that connects SQS to Lambda
  • At-Least-Once Delivery – SQS’s delivery model; duplicates are possible by design
  • Idempotency – The ability to safely process the same message more than once

Steps at a Glance

  1. Compare the Lambda timeout and SQS visibility timeout
  2. Understand what happens when Lambda runs longer
  3. Fix the timeout misalignment
  4. Check the event source mapping configuration
  5. Add idempotency as a safety net

Detailed Steps

Step 1: Compare the Two Timeouts

Start by checking the actual values—not what you think they are.

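  • Lambda timeout: the function's Timeout setting (General configuration in the Lambda console)
  • SQS visibility timeout: the queue's VisibilityTimeout attribute (Visibility timeout in the SQS console)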

If:

Lambda timeout ≥ SQS visibility timeout

You have a problem.
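A short Python sketch can help you check the actual values instead of relying on memory. It assumes boto3 is installed and that AWS credentials are configured when fetch_timeouts() runs. check_alignment() and its labels are illustrative helpers, not part of any AWS API. The 2x threshold is this article's baseline from Step 3.

```python
"""Sketch: compare a Lambda function's timeout with its SQS queue's visibility timeout.

Assumption: boto3 and AWS credentials are only needed when fetch_timeouts()
is actually called; check_alignment() is pure logic.
"""
from typing import Tuple


def check_alignment(lambda_timeout_s: int, visibility_timeout_s: int) -> str:
    """Classify the relationship between the two timeouts (labels are illustrative)."""
    if lambda_timeout_s >= visibility_timeout_s:
        return "MISALIGNED: messages can reappear while the function is still running"
    if visibility_timeout_s < 2 * lambda_timeout_s:
        return "TIGHT: below the 2x baseline, little headroom for throttling or retries"
    return "OK"


def fetch_timeouts(function_name: str, queue_url: str) -> Tuple[int, int]:
    """Read both values from AWS (requires boto3 and credentials)."""
    import boto3  # imported lazily so check_alignment() works without boto3

    fn = boto3.client("lambda").get_function_configuration(FunctionName=function_name)
    attrs = boto3.client("sqs").get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["VisibilityTimeout"]
    )["Attributes"]
    # SQS returns attribute values as strings; Lambda returns Timeout as an int.
    return fn["Timeout"], int(attrs["VisibilityTimeout"])


if __name__ == "__main__":
    # Illustrative values, not read from a real account.
    print(check_alignment(lambda_timeout_s=60, visibility_timeout_s=30))
```

For example, check_alignment(*fetch_timeouts("my-function", queue_url)) would flag a mismatch. Here "my-function" and queue_url are placeholders for your own function name and queue URL.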


Step 2: Understand the Failure Mode

Here is what actually happens:

  1. SQS delivers a batch of messages to Lambda
  2. SQS hides those messages for the visibility timeout
  3. Lambda continues processing
  4. The visibility timeout expires
  5. SQS makes the entire batch visible again
  6. Lambda receives the same messages again
  7. Multiple invocations now process the same data independently

From AWS’s perspective, everything is working as designed.

From your perspective, the system is duplicating work and corrupting state.
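A toy model can reproduce the failure sequence above. This is a hypothetical single-message simulation of visibility semantics, not the real SQS service. It assumes Lambda polls once per simulated second and that every invocation succeeds and deletes the message after a fixed processing time.

```python
"""Toy, single-message model of SQS visibility semantics (not the real service).

Assumptions: one message, one poll per simulated second, and every invocation
succeeds after `processing_s` seconds and then deletes the message.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyQueue:
    visibility_timeout_s: int
    messages: List[str] = field(default_factory=list)
    invisible_until: Dict[str, int] = field(default_factory=dict)
    deleted: Set[str] = field(default_factory=set)

    def receive(self, now: int) -> List[str]:
        """Return every undeleted message whose visibility timeout has expired."""
        batch = [
            m for m in self.messages
            if m not in self.deleted and self.invisible_until.get(m, 0) <= now
        ]
        for m in batch:
            # Receiving a message hides it for the visibility timeout.
            self.invisible_until[m] = now + self.visibility_timeout_s
        return batch

    def delete(self, message_id: str) -> None:
        self.deleted.add(message_id)


def count_deliveries(processing_s: int, visibility_timeout_s: int, horizon_s: int = 3600) -> int:
    """Count how many times the single message is handed to an invocation."""
    queue = ToyQueue(visibility_timeout_s, messages=["msg-1"])
    running: List[int] = []  # finish times of in-flight invocations
    deliveries = 0
    for now in range(horizon_s + 1):
        # Invocations finishing this second delete the message first.
        for finish in [t for t in running if t <= now]:
            running.remove(finish)
            queue.delete("msg-1")
        # Then the poller picks up anything that has become visible again.
        for _ in queue.receive(now):
            deliveries += 1
            running.append(now + processing_s)
        if not running and "msg-1" in queue.deleted:
            break
    return deliveries


if __name__ == "__main__":
    print(count_deliveries(processing_s=10, visibility_timeout_s=30))   # 1
    print(count_deliveries(processing_s=60, visibility_timeout_s=30))   # 2
    print(count_deliveries(processing_s=100, visibility_timeout_s=30))  # 4
```

With a 30-second visibility timeout, a 60-second job is delivered twice and a 100-second job four times. This happens even though every invocation in the model succeeds.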


Step 3: Fix the Timeout Misalignment

The rule is simple:

✅ The SQS visibility timeout must be greater than the Lambda timeout.

A minimum safe baseline:

Visibility Timeout ≥ (Lambda Timeout × 2) + buffer

This buffer accounts for:

  • Cold starts
  • Retry jitter
  • Temporary slowdowns

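Note: AWS documentation recommends setting the visibility timeout to at least six times the Lambda timeout, plus the batching window (MaximumBatchingWindowInSeconds) if you use one. The extra time lets Lambda retry when the function is throttled. Treat 2× as the floor, and move toward 6× for high-throughput or heavily throttled systems.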

Action

  • Increase the SQS visibility timeout
  • Or reduce the Lambda timeout
  • Do not leave them equal
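The baseline formula and the AWS recommendation can both be written as one calculation. In this sketch, the 30-second default buffer is an illustrative assumption. Setting multiplier=6 and adding the batching window follows AWS's guidance. The 43,200-second ceiling is the SQS maximum visibility timeout (12 hours). apply_visibility_timeout() assumes boto3 and credentials.

```python
"""Sketch: derive a visibility timeout from the Lambda timeout.

multiplier=2 plus a buffer is this article's baseline; multiplier=6 plus the
batching window follows AWS guidance. The 30 s default buffer is an assumption.
"""

SQS_MAX_VISIBILITY_TIMEOUT_S = 43_200  # SQS caps visibility timeout at 12 hours


def recommended_visibility_timeout(
    lambda_timeout_s: int,
    multiplier: int = 2,
    buffer_s: int = 30,
    batching_window_s: int = 0,
) -> int:
    """Return lambda_timeout * multiplier + buffer + batching window, validated."""
    if multiplier < 2:
        raise ValueError("multiplier below 2 leaves no headroom for retries")
    value = lambda_timeout_s * multiplier + buffer_s + batching_window_s
    if value > SQS_MAX_VISIBILITY_TIMEOUT_S:
        raise ValueError(f"{value}s exceeds the SQS maximum of {SQS_MAX_VISIBILITY_TIMEOUT_S}s")
    return value


def apply_visibility_timeout(queue_url: str, seconds: int) -> None:
    """Write the value to the queue (requires boto3 and credentials)."""
    import boto3  # imported lazily so the calculation works without boto3

    boto3.client("sqs").set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"VisibilityTimeout": str(seconds)},  # SQS expects string values
    )


if __name__ == "__main__":
    print(recommended_visibility_timeout(60))                             # 150
    print(recommended_visibility_timeout(60, multiplier=6, buffer_s=0))   # 360
```

Raising the visibility timeout is usually the less disruptive of the two actions. Shortening the Lambda timeout can cut off legitimate long-running work.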

Step 4: Check the Event Source Mapping Configuration

When SQS triggers Lambda, the Lambda service polls the queue for you and manages batching and retries automatically.

Key behaviors to remember:

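  • Your function code does not delete messages; the Lambda service deletes them on your behalf
  • Messages are deleted only after the invocation completes successfully
  • A timeout is treated as a failure, so every message in the batch becomes visible again once its visibility timeout expires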
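You can pull an existing trigger's mapping from the Lambda API and check it. The keys read here (BatchSize, MaximumBatchingWindowInSeconds, FunctionResponseTypes) appear in ListEventSourceMappings responses. audit_mapping() and its findings are a hypothetical helper that encodes this article's guidance, not AWS rules. fetch_sqs_mappings() assumes boto3 and credentials.

```python
"""Sketch: audit an SQS event source mapping against the queue's settings.

Key names match Lambda's ListEventSourceMappings output; the checks themselves
are illustrative guidance, not AWS validation rules.
"""
from typing import Any, Dict, List


def audit_mapping(
    mapping: Dict[str, Any], lambda_timeout_s: int, visibility_timeout_s: int
) -> List[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings: List[str] = []
    window_s = mapping.get("MaximumBatchingWindowInSeconds", 0)
    batch_size = mapping.get("BatchSize", 10)
    # AWS guidance adds the batching window on top of the function timeout.
    if visibility_timeout_s <= lambda_timeout_s + window_s:
        findings.append(
            f"visibility timeout {visibility_timeout_s}s does not exceed function timeout "
            f"{lambda_timeout_s}s plus batching window {window_s}s"
        )
    if "ReportBatchItemFailures" not in mapping.get("FunctionResponseTypes", []):
        findings.append(
            f"partial batch responses are off: one failure retries up to {batch_size} messages"
        )
    return findings


def fetch_sqs_mappings(function_name: str, queue_arn: str) -> List[Dict[str, Any]]:
    """List mappings between one function and one queue (requires boto3)."""
    import boto3  # imported lazily so audit_mapping() works without boto3

    response = boto3.client("lambda").list_event_source_mappings(
        FunctionName=function_name, EventSourceArn=queue_arn
    )
    return response["EventSourceMappings"]
```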

Action

  • Enable partial batch responses by turning on ReportBatchItemFailures, and have your handler return the IDs of failed messages in a batchItemFailures list
  • This prevents successful messages in a batch from being retried when only one message fails
  • Use this as damage control—not as a substitute for correct timeout configuration
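A handler that supports partial batch responses might look like the sketch below. Once ReportBatchItemFailures is enabled, Lambda expects a batchItemFailures list of {"itemIdentifier": messageId} entries as the return value. process_message() is a hypothetical placeholder for your own logic. enable_partial_batch_responses() assumes boto3 and an existing mapping UUID.

```python
"""Sketch: an SQS-triggered handler that reports partial batch failures.

process_message() is a hypothetical placeholder; the return shape is the one
Lambda reads when ReportBatchItemFailures is enabled on the mapping.
"""
import json
from typing import Any, Dict, List


def process_message(body: Dict[str, Any]) -> None:
    """Hypothetical business logic; raises on failure."""
    if body.get("fail"):
        raise ValueError("simulated processing failure")


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
    failures: List[Dict[str, str]] = []
    for record in event["Records"]:
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # With ReportBatchItemFailures on, only this message becomes visible
            # again; successfully processed messages are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


def enable_partial_batch_responses(mapping_uuid: str) -> None:
    """Turn on ReportBatchItemFailures for an existing mapping (requires boto3)."""
    import boto3  # imported lazily so handler() has no boto3 dependency

    boto3.client("lambda").update_event_source_mapping(
        UUID=mapping_uuid, FunctionResponseTypes=["ReportBatchItemFailures"]
    )
```

Partial batch responses only take effect when the handler returns. If the invocation times out, there is no response to read, so Lambda treats the whole batch as failed. That is why this is damage control rather than a fix.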

Step 5: Add Idempotency (Defense in Depth)

Even with perfect timeouts, SQS delivers messages at least once, not exactly once.

Duplicates will happen eventually.

Best practices

  • Use idempotency keys
  • Deduplicate writes
  • Track processed message IDs
  • Make downstream operations safe to repeat

Timeout alignment prevents most duplicates.
Idempotency makes the ones that slip through harmless.
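The best practices above can be combined into a claim-then-complete pattern. InMemoryIdempotencyStore is hypothetical and only works within one process. A real deployment would need a shared, durable store, for example a DynamoDB table written with a conditional expression. It would also need an expiry on in-progress entries so that a timed-out invocation cannot block a key forever.

```python
"""Sketch: idempotent processing with an in-progress/completed marker.

InMemoryIdempotencyStore is a hypothetical stand-in. Lambda invocations do not
share memory, so production code needs a shared durable store and an expiry on
IN_PROGRESS entries (a timed-out invocation never reaches release()).
"""
import threading
from typing import Callable, Dict


class DuplicateInProgress(Exception):
    """Another invocation is still working on this key; retry later."""


class InMemoryIdempotencyStore:
    def __init__(self) -> None:
        self._status: Dict[str, str] = {}  # key -> "IN_PROGRESS" | "COMPLETED"
        self._lock = threading.Lock()

    def try_claim(self, key: str) -> str:
        """Atomically claim a key; returns 'claimed', 'in_progress', or 'completed'."""
        with self._lock:
            status = self._status.get(key)
            if status is None:
                self._status[key] = "IN_PROGRESS"
                return "claimed"
            return "in_progress" if status == "IN_PROGRESS" else "completed"

    def complete(self, key: str) -> None:
        with self._lock:
            self._status[key] = "COMPLETED"

    def release(self, key: str) -> None:
        with self._lock:
            self._status.pop(key, None)


def process_once(store: InMemoryIdempotencyStore, key: str, action: Callable[[], None]) -> bool:
    """Run action at most once per key. Returns True if it ran, False if skipped."""
    outcome = store.try_claim(key)
    if outcome == "completed":
        return False  # duplicate of finished work: safe to acknowledge
    if outcome == "in_progress":
        # A concurrent duplicate: fail this copy so SQS redelivers it later.
        raise DuplicateInProgress(key)
    try:
        action()
    except Exception:
        store.release(key)  # let a later retry run the action again
        raise
    store.complete(key)
    return True
```

For the key, record["messageId"] catches SQS redeliveries. A business identifier from the message body, such as an order ID, also catches a producer sending the same logical message twice, because SQS assigns a new message ID to every send.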


Pro Tips

  • Never rely on defaults for SQS + Lambda
  • Log message IDs during processing
  • Treat timeouts as configuration, not tuning
  • Test failure paths explicitly
  • If duplicates appear “random,” suspect timeouts first
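Message IDs are most useful in logs when paired with the receive count. Each record in an SQS event carries attributes.ApproximateReceiveCount as a string. A value above 1 means SQS has handed the message out before, either after a failure or because the visibility timeout expired mid-processing. The helper names and the logger name below are illustrative.

```python
"""Sketch: log each SQS message ID with its approximate receive count."""
import json
import logging
from typing import Any, Dict

logger = logging.getLogger("sqs-consumer")  # logger name is arbitrary


def describe_delivery(record: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize one SQS event record for structured logging."""
    # SQS sends ApproximateReceiveCount as a string; default to "1" if absent.
    receive_count = int(record.get("attributes", {}).get("ApproximateReceiveCount", "1"))
    return {
        "message_id": record["messageId"],
        "receive_count": receive_count,
        "redelivery": receive_count > 1,
    }


def log_delivery(record: Dict[str, Any]) -> None:
    info = describe_delivery(record)
    # One JSON line per message makes duplicates easy to search for by message_id.
    level = logging.WARNING if info["redelivery"] else logging.INFO
    logger.log(level, json.dumps(info))
```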

Conclusion

Visibility timeout vs Lambda timeout misalignment is a configuration trap, not a coding error.

When Lambda runs longer than SQS expects, SQS does exactly what it is designed to do: retry.

Once the timeouts are aligned—and partial batch responses and idempotency are in place—the system becomes predictable, stable, and boring again.

That is the goal.


Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.
