S3 Event Duplication: When Lambda Fires Twice for the Same Object (S3 → Lambda → DynamoDB)

When your event-driven pipeline starts “helping too much,” one upload can quietly double your workload — and your bill.





Problem

You’ve built a lean, event-driven pipeline:

S3 uploads → Lambda processing → DynamoDB storage.

It’s fast, automatic, and usually rock-solid.

Then one day, you check DynamoDB and see two identical rows for the same file.

You didn’t upload twice.
Lambda didn’t error out.
S3 just… fired twice.
(S3 notifications are delivered at least once, and a failed or timed-out invocation is automatically retried, so the same event can reach your function more than once.)

This post uncovers why — and how to build in a fix that keeps your data clean.


Clarifying the Issue

This isn’t a Lambda misfire.

It’s S3’s at-least-once delivery promise doing its job.

When you upload a file, S3 publishes a notification event and invokes your Lambda asynchronously.

If that invocation fails, times out, or is throttled, the event is delivered again; and because S3's notification system only promises at-least-once delivery, an occasional duplicate can arrive even when nothing went wrong.

Each delivery looks identical to your code, but it's a brand-new invocation.

The result: duplicate writes in DynamoDB, doubled analytics, inflated metrics.

In short:

  • S3 guarantees delivery, not uniqueness.
  • Lambda treats every retry as a new event.
  • Without idempotency, your data layer drifts.
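
For reference, this is roughly what lands in your handler for each delivery (trimmed; the timestamp is illustrative). A redelivered event carries the same record, so nothing in the payload tells your code it has already seen this one:

{
  "Records": [
    {
      "eventSource": "aws:s3",
      "eventName": "ObjectCreated:Put",
      "eventTime": "2025-01-01T12:00:00.000Z",
      "s3": {
        "bucket": { "name": "s3-event-demo" },
        "object": { "key": "invoice.txt" }
      }
    }
  ]
}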

Why It Matters

At scale, this kind of drift is poison.

Your analytics show inflated counts, billing summaries go off, and dashboards lie.

More subtly, these duplicates can break downstream processes that expect one-and-only-one record per object.

The cloud isn’t wrong — it’s just doing what you told it to.

Your job is to make sure retries don’t hurt you.


Key Terms

  • At-least-once delivery — S3 guarantees each event will be delivered, possibly multiple times.
  • Idempotency — Designing so repeated operations don’t change the final state.
  • Event retry — Automatic resend when a function doesn’t return an acknowledgment.
  • Deduplication logic — Guard code that skips already-processed events.
  • Pipeline integrity — Ensuring one upload equals one record, every time.

Steps at a Glance

  1. Create the pipeline infrastructure — S3 bucket, Lambda function, and DynamoDB table.
  2. Upload a test file and confirm single event processing.
  3. Simulate a retry to reproduce duplication.
  4. Apply an idempotency fix to prevent duplicate inserts.
  5. Re-test the pipeline and confirm one consistent record.

Step 1 – Create the Pipeline Infrastructure

Create the bucket and table
(Bucket names are globally unique, so if s3-event-demo is taken, substitute your own name throughout.)

aws s3api create-bucket --bucket s3-event-demo --region us-east-1
aws dynamodb create-table \
  --table-name s3-event-table \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

✅ Verify table status

aws dynamodb describe-table --table-name s3-event-table --query "Table.TableStatus"

✅ Output:

"ACTIVE"

❌ If you see "CREATING", wait a few seconds and re-run, or block until it's ready with aws dynamodb wait table-exists --table-name s3-event-table.
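
Step 2 deploys the function with an execution role (shown there as the placeholder arn:aws:iam::123456789012:role/LambdaExecutionRole). Whatever role you use needs the usual CloudWatch Logs access plus read/write on the table; here's a sketch of attaching a minimal inline table policy, assuming the role already exists and is named LambdaExecutionRole:

aws iam put-role-policy \
  --role-name LambdaExecutionRole \
  --policy-name s3-event-table-access \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem", "dynamodb:GetItem"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/s3-event-table"
    }]
  }'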


Step 2 – Upload a Test File and Confirm Single Event Processing

Create a simple Lambda handler that writes each incoming event to DynamoDB, minting a fresh id per event (Step 3 shows why that's a problem).

cat > lambda_handler.py <<'EOF'
import uuid
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('s3-event-table')

def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        # Naive approach: a fresh id for every delivery,
        # so every redelivery becomes a brand-new row
        event_id = str(uuid.uuid4())
        table.put_item(Item={'id': event_id, 'filename': key})
EOF

Zip and deploy the function

zip function.zip lambda_handler.py
aws lambda create-function \
  --function-name s3EventHandler \
  --zip-file fileb://function.zip \
  --handler lambda_handler.lambda_handler \
  --runtime python3.11 \
  --role arn:aws:iam::123456789012:role/LambdaExecutionRole

✅ Confirm deployment

aws lambda get-function --function-name s3EventHandler

✅ Output (abbreviated):

{"Configuration":{"FunctionName":"s3EventHandler","Runtime":"python3.11"}}

Connect S3 to Lambda

aws s3api put-bucket-notification-configuration \
  --bucket s3-event-demo \
  --notification-configuration '{
    "LambdaFunctionConfigurations":[
      {
        "LambdaFunctionArn":"arn:aws:lambda:us-east-1:123456789012:function:s3EventHandler",
        "Events":["s3:ObjectCreated:*"]
      }
    ]
  }'

✅ Verify setup

aws s3api get-bucket-notification-configuration --bucket s3-event-demo

✅ Output (abbreviated):

{"LambdaFunctionConfigurations":[{"Events":["s3:ObjectCreated:*"]}]}

Upload a test file

echo "Invoice A" > invoice.txt
aws s3 cp invoice.txt s3://s3-event-demo/

✅ Verify DynamoDB received one record

aws dynamodb scan --table-name s3-event-table

✅ Output:

{"Count":1,"Items":[{"id":{"S":"<hash>"},"filename":{"S":"invoice.txt"}}]}

Step 3 – Simulate a Retry to Reproduce Duplication

Modify the Lambda to simulate a failed invocation
Add a block that does the write and then raises an error for .txt files, mimicking a function that finishes its work but crashes before returning, so Lambda's asynchronous retry re-delivers the event.

cat > lambda_handler.py <<'EOF'
import uuid
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('s3-event-table')

def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        event_id = str(uuid.uuid4())
        table.put_item(Item={'id': event_id, 'filename': key})

        # 👇 Added for this demo: fail after the write so the
        # async retry re-delivers the event and writes again
        if key.endswith(".txt"):
            raise Exception("Simulated timeout")
EOF

Redeploy and re-upload the same file

zip function.zip lambda_handler.py
aws lambda update-function-code \
  --function-name s3EventHandler \
  --zip-file fileb://function.zip
aws lambda wait function-updated --function-name s3EventHandler
aws s3 cp invoice.txt s3://s3-event-demo/

✅ Check DynamoDB again

aws dynamodb scan --table-name s3-event-table

✅ Output (duplication confirmed)

{"Count":2,"Items":[
 {"id":{"S":"<uuid-1>"},"filename":{"S":"invoice.txt"}},
 {"id":{"S":"<uuid-2>"},"filename":{"S":"invoice.txt"}}
]}

Scan again a few minutes later and the count climbs further: Lambda's asynchronous retries (two by default) re-deliver the failed event, and each re-delivery writes another row.
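
If you'd rather not wait on those automatic retries, you can model a duplicate delivery by hand: invoke the function twice with the same minimal S3-style payload (only the object key matters to this handler) and watch a new row appear each time.

aws lambda invoke \
  --function-name s3EventHandler \
  --cli-binary-format raw-in-base64-out \
  --payload '{"Records":[{"s3":{"object":{"key":"invoice.txt"}}}]}' \
  out.json

Each call reports a FunctionError (the simulated timeout), but the write has already happened by then.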

Step 4 – Apply an Idempotency Fix to Prevent Duplicate Inserts

Add idempotency logic to skip duplicate events
Two changes: derive the id from the object key itself, so every delivery of the same object maps to the same item, and check whether that item already exists before inserting.

cat > lambda_handler.py <<'EOF'
import boto3, hashlib

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('s3-event-table')

def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        # Deterministic id: the same object always maps to the same item
        event_id = hashlib.md5(key.encode()).hexdigest()

        # Prevent duplicates by checking DynamoDB first
        existing = table.get_item(Key={'id': event_id})
        if 'Item' in existing:
            print(f"Duplicate event ignored for {key}")
            continue

        table.put_item(Item={'id': event_id, 'filename': key})
EOF

For high-volume pipelines:
The read-then-write check above leaves a small race window under heavy concurrency, so you can replace it with a single atomic call:

table.put_item(
    Item={'id': event_id, 'filename': key},
    ConditionExpression="attribute_not_exists(id)"
)

This ensures duplicates are rejected automatically at the database layer — ideal for heavy concurrency.
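
One caveat with the conditional write: when the condition fails, boto3 raises a ClientError (ConditionalCheckFailedException), and if that propagates out of the handler the invocation fails and gets retried yet again. A minimal sketch of the loop body with that handled:

from botocore.exceptions import ClientError

try:
    table.put_item(
        Item={'id': event_id, 'filename': key},
        ConditionExpression="attribute_not_exists(id)"
    )
except ClientError as e:
    # A failed condition just means the item already exists:
    # treat it as a duplicate delivery, not an error
    if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
        print(f"Duplicate event ignored for {key}")
    else:
        raise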

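Before re-testing, clear out the duplicate rows left over from Step 3 so the final scan starts clean. For a throwaway demo table, the quickest reset is to drop and recreate it with the same definition as Step 1:

aws dynamodb delete-table --table-name s3-event-table
aws dynamodb wait table-not-exists --table-name s3-event-table
aws dynamodb create-table \
  --table-name s3-event-table \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
aws dynamodb wait table-exists --table-name s3-event-table
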
Redeploy and re-upload

zip function.zip lambda_handler.py
aws lambda update-function-code \
  --function-name s3EventHandler \
  --zip-file fileb://function.zip
aws lambda wait function-updated --function-name s3EventHandler
aws s3 cp invoice.txt s3://s3-event-demo/

Step 5 – Re-Test the Pipeline and Confirm One Consistent Record

✅ Verify only one record remains

aws dynamodb scan --table-name s3-event-table

✅ Output:

{"Count":1,"Items":[{"filename":{"S":"invoice.txt"}}]}

Duplicates are now ignored — clean, consistent, and repeatable.
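
As a final sanity check, upload the same file once more; the idempotency logic swallows the repeat and the count stays put.

aws s3 cp invoice.txt s3://s3-event-demo/
aws dynamodb scan --table-name s3-event-table --query "Count"

✅ Output:

1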


Pro Tips

  • Hash the bucket name + key (+ event time) to create robust idempotency keys, especially if the same file name can be uploaded repeatedly and legitimately needs re-processing.
  • Enable Lambda destinations for success/failure tracking.
  • Monitor DynamoDB item counts and S3 event metrics with CloudWatch.
  • Use a TTL attribute to expire processed keys and prevent unbounded growth (see the example below).
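
Enabling TTL is a one-time table setting plus a numeric expiry attribute on each item; a minimal sketch, assuming the attribute is named ttl and the handler adds something like 'ttl': int(time.time()) + 7 * 24 * 3600 to the put_item call (with import time):

aws dynamodb update-time-to-live \
  --table-name s3-event-table \
  --time-to-live-specification "Enabled=true,AttributeName=ttl"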

Conclusion

S3 never guaranteed exactly-once delivery — it only promised to keep trying until it got through.

That reliability is a double-edged sword.

When you add a simple idempotency check, retries become harmless.

Your pipeline stays consistent, DynamoDB stays clean, and costs stay predictable.

In AWS, resilience isn’t about avoiding retries — it’s about handling them gracefully.

The exactly-once guarantee isn’t something AWS gives you — it’s something you build yourself.


Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.
