S3 Event Duplication: When Lambda Fires Twice for the Same Object (S3 → Lambda → DynamoDB)
When your event-driven pipeline starts “helping too much,” one upload can quietly double your workload — and your bill.
Problem
You’ve built a lean, event-driven pipeline:
S3 uploads → Lambda processing → DynamoDB storage.
It’s fast, automatic, and usually rock-solid.
Then one day, you check DynamoDB and see two identical rows for the same file.
You didn’t upload twice.
Lambda didn’t error out.
S3 just… fired twice.
(This happens because delivery is at least once: S3 can occasionally re-send a notification, and Lambda automatically retries asynchronous invocations that fail or time out.)
This post uncovers why — and how to build in a fix that keeps your data clean.
Clarifying the Issue
This isn’t a Lambda misfire.
It’s S3’s at-least-once delivery promise doing its job.
When you upload a file, S3 triggers a notification event.
If that event isn’t processed successfully the first time (a timeout, a thrown error, a network hiccup, a transient AWS issue), it gets delivered again: S3 may re-send the notification, and Lambda retries the failed invocation.
Each retry looks identical to your code, but it’s a new invocation.
The result: duplicate writes in DynamoDB, doubled analytics, inflated metrics.
In short:
- S3 guarantees delivery, not uniqueness.
- Lambda treats every retry as a new event.
- Without idempotency, your data layer drifts.
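For a concrete picture, here’s roughly what each delivery hands your function: an abbreviated ObjectCreated event, with fields trimmed for readability. A retried delivery carries the same payload, which is why your code can’t tell a first attempt from a retry just by looking at the event.
{
  "Records": [
    {
      "eventSource": "aws:s3",
      "eventName": "ObjectCreated:Put",
      "eventTime": "2025-01-01T12:00:00.000Z",
      "s3": {
        "bucket": { "name": "s3-event-demo" },
        "object": { "key": "invoice.txt", "size": 10 }
      }
    }
  ]
}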
Why It Matters
At scale, this kind of drift is poison.
Your analytics show inflated counts, billing summaries skew, and dashboards lie.
More subtly, these duplicates can break downstream processes that expect one-and-only-one record per object.
The cloud isn’t wrong — it’s just doing what you told it to.
Your job is to make sure retries don’t hurt you.
Key Terms
- At-least-once delivery — S3 guarantees each event will be delivered, possibly multiple times.
- Idempotency — Designing so repeated operations don’t change the final state.
- Event retry — Automatic re-delivery when a function doesn’t complete successfully.
- Deduplication logic — Guard code that skips already-processed events.
- Pipeline integrity — Ensuring one upload equals one record, every time.
Steps at a Glance
- Create the pipeline infrastructure — S3 bucket, Lambda function, and DynamoDB table.
- Upload a test file and confirm single event processing.
- Simulate a retry to reproduce duplication.
- Apply an idempotency fix to prevent duplicate inserts.
- Re-test the pipeline and confirm one consistent record.
Step 1 – Create the Pipeline Infrastructure
Create the bucket and table
aws s3api create-bucket --bucket s3-event-demo --region us-east-1
aws dynamodb create-table \
  --table-name s3-event-table \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
✅ Verify table status
aws dynamodb describe-table --table-name s3-event-table --query "Table.TableStatus"
✅ Output:
"ACTIVE"
❌ If you see "CREATING", wait a few seconds and re-run.
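You can also let the CLI block until the table is ready instead of polling by hand:
aws dynamodb wait table-exists --table-name s3-event-table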
Step 2 – Upload a Test File and Confirm Single Event Processing
Create a simple Lambda handler that writes one row per event to DynamoDB, using a freshly generated UUID as the item ID (a common first pass, and exactly the choice that lets duplicates slip in later).
cat > lambda_handler.py <<'EOF'
import boto3, uuid
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('s3-event-table')
def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        # Naive: mint a fresh ID for every event, so every delivery becomes a new item
        event_id = str(uuid.uuid4())
        table.put_item(Item={'id': event_id, 'filename': key})
EOF
Zip and deploy the function
zip function.zip lambda_handler.py
aws lambda create-function \
  --function-name s3EventHandler \
  --zip-file fileb://function.zip \
  --handler lambda_handler.lambda_handler \
  --runtime python3.11 \
  --role arn:aws:iam::123456789012:role/LambdaExecutionRole
✅ Confirm deployment
aws lambda get-function --function-name s3EventHandler
✅ Output (abbreviated):
{"Configuration":{"FunctionName":"s3EventHandler","Runtime":"python3.11"}}
Connect S3 to Lambda
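Before pointing the bucket at the function, grant S3 permission to invoke it. Without this resource policy, the notification configuration call below is typically rejected. The statement ID is arbitrary; the account ID matches the placeholder used above.
aws lambda add-permission \
  --function-name s3EventHandler \
  --statement-id s3-invoke-demo \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::s3-event-demo \
  --source-account 123456789012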
aws s3api put-bucket-notification-configuration \
  --bucket s3-event-demo \
  --notification-configuration '{
    "LambdaFunctionConfigurations":[
      {
        "LambdaFunctionArn":"arn:aws:lambda:us-east-1:123456789012:function:s3EventHandler",
        "Events":["s3:ObjectCreated:*"]
      }
    ]
  }'
✅ Verify setup
aws s3api get-bucket-notification-configuration --bucket s3-event-demo
✅ Output (abbreviated):
{"LambdaFunctionConfigurations":[{"Events":["s3:ObjectCreated:*"]}]}
Upload a test file
echo "Invoice A" > invoice.txt
aws s3 cp invoice.txt s3://s3-event-demo/
✅ Verify DynamoDB received one record
aws dynamodb scan --table-name s3-event-table
✅ Output:
{"Count":1,"Items":[{"id":{"S":"<hash>"},"filename":{"S":"invoice.txt"}}]}
Step 3 – Simulate a Retry to Reproduce Duplication
Modify the Lambda to simulate a timeout
Keep the naive handler, but raise an error after the write whenever a .txt file is processed. That mimics a function that did its work but never reported success, so the invocation is marked failed and the event is re-delivered.
cat > lambda_handler.py <<'EOF'
import boto3, uuid
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('s3-event-table')
def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        event_id = str(uuid.uuid4())
        table.put_item(Item={'id': event_id, 'filename': key})
        # 👇 Added for this demo: fail *after* the write so the event is never
        # acknowledged and the whole handler gets re-run on retry
        if key.endswith(".txt"):
            raise Exception("Simulated timeout")
EOF
Redeploy and re-upload the same file
zip function.zip lambda_handler.py
aws lambda update-function-code \
  --function-name s3EventHandler \
  --zip-file fileb://function.zip
aws s3 cp invoice.txt s3://s3-event-demo/
✅ Check DynamoDB again
aws dynamodb scan --table-name s3-event-table
✅ Output (duplication confirmed):
{"Count":2,"Items":[
 {"id":{"S":"<uuid-1>"},"filename":{"S":"invoice.txt"}},
 {"id":{"S":"<uuid-2>"},"filename":{"S":"invoice.txt"}}
]}
And because the function raised an error after writing, Lambda retries the event automatically (twice by default for asynchronous invocations), so the count climbs further over the next few minutes.
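That retry behavior is configurable per function. If you want the demo to settle sooner, or want a production function to give up earlier and route failures elsewhere, you can dial it down; the values here are just illustrative:
aws lambda put-function-event-invoke-config \
  --function-name s3EventHandler \
  --maximum-retry-attempts 1 \
  --maximum-event-age-in-seconds 3600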
Step 4 – Apply an Idempotency Fix to Prevent Duplicate Inserts
Add idempotency logic to skip duplicate events
Two changes: derive the item ID from the object key itself (so every delivery of the same object maps to the same item), and check whether that item already exists before inserting.
cat > lambda_handler.py <<'EOF'
import boto3, hashlib
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('s3-event-table')
def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        # Deterministic ID: the same object key always maps to the same item
        event_id = hashlib.md5(key.encode()).hexdigest()
        # Prevent duplicates by checking DynamoDB first
        existing = table.get_item(Key={'id': event_id})
        if 'Item' in existing:
            print(f"Duplicate event ignored for {key}")
            continue
        table.put_item(Item={'id': event_id, 'filename': key})
EOF
For high-volume pipelines:
You can replace the get-then-put pattern with a single atomic call and let DynamoDB reject the duplicate itself:
try:
    table.put_item(
        Item={'id': event_id, 'filename': key},
        ConditionExpression="attribute_not_exists(id)"
    )
except dynamodb.meta.client.exceptions.ConditionalCheckFailedException:
    print(f"Duplicate event ignored for {key}")
This closes the small race window the read-then-write check leaves under heavy concurrency, and catching the conditional-check failure keeps the function from erroring (and being retried yet again) when a duplicate arrives.
Reset the table, redeploy, and re-upload
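Clear out the rows left over from Step 3 first so the re-test starts from a clean slate. The quickest way in this demo is to drop and recreate the table (the wait subcommands block until each state change completes):
aws dynamodb delete-table --table-name s3-event-table
aws dynamodb wait table-not-exists --table-name s3-event-table
aws dynamodb create-table \
  --table-name s3-event-table \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
aws dynamodb wait table-exists --table-name s3-event-table
Now redeploy the fixed handler and upload the file again: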
zip function.zip lambda_handler.py
aws lambda update-function-code \
  --function-name s3EventHandler \
  --zip-file fileb://function.zip
aws s3 cp invoice.txt s3://s3-event-demo/
Step 5 – Re-Test the Pipeline and Confirm One Consistent Record
✅ Verify DynamoDB holds exactly one record
aws dynamodb scan --table-name s3-event-table
✅ Output:
{"Count":1,"Items":[{"filename":{"S":"invoice.txt"}}]}
Duplicates are now ignored — clean, consistent, and repeatable.
Pro Tips
- Hash the bucket name + key (+ event time) to create robust idempotency keys, especially if the same file name can be uploaded repeatedly and legitimately needs re-processing.
- Enable Lambda destinations for success/failure tracking.
- Monitor DynamoDB item counts and S3 event metrics with CloudWatch.
- Use a TTL attribute to expire processed keys and prevent unbounded growth (a minimal sketch follows this list).
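If the table is only a ledger of processed events (rather than your system of record), a TTL keeps it from growing forever. A minimal sketch, assuming the handler also writes an epoch-seconds expiry in an attribute named ttl (the attribute name is up to you):
aws dynamodb update-time-to-live \
  --table-name s3-event-table \
  --time-to-live-specification "Enabled=true,AttributeName=ttl"
In the handler, stamp each item with an expiry, for example:
table.put_item(Item={'id': event_id, 'filename': key, 'ttl': int(time.time()) + 7 * 24 * 3600})  # requires: import time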
Conclusion
S3 never guaranteed exactly-once delivery — it only promised to keep trying until it got through.
That reliability is a double-edged sword.
When you add a simple idempotency check, retries become harmless.
Your pipeline stays consistent, DynamoDB stays clean, and costs stay predictable.
In AWS, resilience isn’t about avoiding retries — it’s about handling them gracefully.
The exactly-once guarantee isn’t something AWS gives you — it’s something you build yourself.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.


 
 
 