When Lambda Writes Twice: Idempotency Keys in DynamoDB (S3 → Lambda → DynamoDB)
By adding a one-line guard to your DynamoDB writes, you transform a best-effort pipeline into a predictable, idempotent architecture.
When a system that promises “at least once” keeps its word, your data might double.
You didn’t break anything — your architecture did exactly what it was designed to do.
That’s the problem.
Problem
You’ve built a clean, event-driven pipeline:
S3 → Lambda → DynamoDB.
Each time a new file lands in S3, your Lambda function runs and logs the event to DynamoDB.
It works beautifully — until one day you see this:
{
  "Count": 2,
  "Items": [
    {"id": {"S": "<hash-1>"}, "filename": {"S": "invoice.txt"}, "status": {"S": "processed"}},
    {"id": {"S": "<hash-2>"}, "filename": {"S": "invoice.txt"}, "status": {"S": "processed"}}
  ]
}
Same event. Same object. Two entries.
Nothing’s wrong with your code — the pipeline just fired twice.
Clarifying the Issue
AWS services like S3, SNS, and EventBridge all provide at-least-once delivery guarantees.
This model is intentional — it favors durability and reliability over uniqueness, ensuring that no event is ever silently lost even if it means delivering it twice.
When a Lambda invocation fails or isn’t acknowledged cleanly (for example, the function times out or the internal acknowledgement never arrives), the event is retried. And because S3’s notification delivery is itself at-least-once, the same event can occasionally arrive twice even without an error.
You now have two invocations of the same event.
Both Lambdas see a valid S3 record.
Both write to DynamoDB.
And unless you enforce idempotency, both succeed.
That’s how “exactly once” turns into “twice for sure.”
Why It Matters
Duplicates quietly poison your data lake:
- Analytics inflate metrics. Your billing report says 200 invoices, not 100.
- Event chains misfire. Downstream systems see phantom updates.
- Integrity erodes. Once duplication creeps in, you can’t easily trust your tables again.
This isn’t a bug. It’s a guarantee working as designed.
That means the fix isn’t patching — it’s architecture.
Key Terms
- At-Least-Once Delivery — AWS guarantees delivery but may retry events.
- Idempotency — The principle that repeating the same operation has no additional effect.
- ConditionExpression — A DynamoDB write guard that only succeeds if a condition is true.
- Duplicate Event — A retried message identical to a previously processed one.
- Idempotency Key — A deterministic fingerprint derived from event context, so every retry of the same event maps to the same key.
Steps at a Glance
- Create the baseline pipeline — S3 bucket, Lambda, and DynamoDB table.
- Upload a file to trigger the event.
- Simulate a duplicate event manually.
- Observe the duplicate rows in DynamoDB.
- Apply the idempotency fix.
- Re-test to confirm exactly-once behavior.
Step 1 – Create the Baseline Pipeline
Provision the resources
aws s3api create-bucket --bucket lambda-idempotency-demo --region us-east-1
aws dynamodb create-table \
--table-name lambda-idempotency-table \
--attribute-definitions AttributeName=id,AttributeType=S \
--key-schema AttributeName=id,KeyType=HASH \
--billing-mode PAY_PER_REQUEST
✅ Verify table readiness:
aws dynamodb describe-table --table-name lambda-idempotency-table --query "Table.TableStatus"
✅ Output:
"ACTIVE"
Create the initial Lambda
cat > lambda_handler.py <<'EOF'
import boto3, hashlib

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('lambda-idempotency-table')

def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        # Naive id: it changes on every invocation, so a retried event
        # lands in the table as a brand-new row instead of being recognized.
        event_id = hashlib.md5(f"{key}-{context.aws_request_id}".encode()).hexdigest()
        table.put_item(Item={'id': event_id, 'filename': key, 'status': 'processed'})
EOF
✅ Deploy it:
zip function.zip lambda_handler.py
aws lambda create-function \
--function-name lambdaIdempotencyDemo \
--zip-file fileb://function.zip \
--handler lambda_handler.lambda_handler \
--runtime python3.11 \
--role arn:aws:iam::123456789012:role/LambdaExecutionRole
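One wiring detail that is easy to miss: S3 can only invoke the function if you grant it permission first, and the notification call below is typically rejected without it. A minimal grant (s3invoke is just an arbitrary statement label):
aws lambda add-permission \
--function-name lambdaIdempotencyDemo \
--statement-id s3invoke \
--action lambda:InvokeFunction \
--principal s3.amazonaws.com \
--source-arn arn:aws:s3:::lambda-idempotency-demo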
✅ Connect S3 to Lambda:
aws s3api put-bucket-notification-configuration \
--bucket lambda-idempotency-demo \
--notification-configuration '{
"LambdaFunctionConfigurations":[
{
"LambdaFunctionArn":"arn:aws:lambda:us-east-1:123456789012:function:lambdaIdempotencyDemo",
"Events":["s3:ObjectCreated:*"]
}
]
}'
Step 2 – Upload a File
echo "Invoice A" > invoice.txt
aws s3 cp invoice.txt s3://lambda-idempotency-demo/
✅ Check DynamoDB:
aws dynamodb scan --table-name lambda-idempotency-table
✅ Output:
{"Count":1,"Items":[{"id":{"S":"<hash>"},"filename":{"S":"invoice.txt"},"status":{"S":"processed"}}]}
Pipeline works perfectly.
Step 3 – Simulate a Duplicate Event
Manually invoke the Lambda again with the same S3 event payload (on AWS CLI v2, also pass --cli-binary-format raw-in-base64-out so the inline JSON isn’t treated as base64):
aws lambda invoke \
--function-name lambdaIdempotencyDemo \
--payload '{
"Records":[
{"s3":{"bucket":{"name":"lambda-idempotency-demo"},"object":{"key":"invoice.txt"}}}
]
}' \
output.json
✅ Check DynamoDB again:
aws dynamodb scan --table-name lambda-idempotency-table
❌ Output:
{"Count":2}
Same event, two rows: duplication confirmed.
Step 4 – Apply the Idempotency Fix
We’ll now make two changes: derive the id deterministically from the object key (so every retry of the same event computes the same id) and make the DynamoDB write conditional, so it only succeeds if that record doesn’t already exist.
cat > lambda_handler.py <<'EOF'
import boto3, hashlib
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('lambda-idempotency-table')

def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        event_id = hashlib.md5(key.encode()).hexdigest()
        # ✅ Idempotency fix: only write if this record does not already exist
        try:
            table.put_item(
                Item={'id': event_id, 'filename': key, 'status': 'processed'},
                ConditionExpression="attribute_not_exists(id)"
            )
            print(f"Inserted {key}")
        except ClientError as e:
            if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
                print(f"Duplicate skipped for {key}")
            else:
                raise
EOF
✅ Redeploy:
zip function.zip lambda_handler.py
aws lambda update-function-code \
--function-name lambdaIdempotencyDemo \
--zip-file fileb://function.zip
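Code updates are not instantaneous; if you invoke again immediately you can still hit the old code. The function-updated waiter removes that race:
aws lambda wait function-updated --function-name lambdaIdempotencyDemo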
Step 5 – Re-Test
First, reset the table so the counts below are easy to read: drop and recreate it with the create-table command from Step 1 (or delete the two rows), then upload invoice.txt again as in Step 2 so the fixed code inserts it once. Now re-invoke the same duplicate payload (the CLI v2 payload flag from Step 3 applies here too):
aws lambda invoke \
--function-name lambdaIdempotencyDemo \
--payload '{
"Records":[
{"s3":{"bucket":{"name":"lambda-idempotency-demo"},"object":{"key":"invoice.txt"}}}
]
}' \
output.json
✅ CloudWatch log output:
Duplicate skipped for invoice.txt
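If you would rather pull that line from the terminal than the console, the CLI v2 log tailing command works here (the log group follows Lambda’s default /aws/lambda/<function-name> naming):
aws logs tail /aws/lambda/lambdaIdempotencyDemo --since 10m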
✅ DynamoDB scan:
{"Count":1}
Exactly once.
Guaranteed.
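Because the id is now nothing more than the MD5 of the object key, you can reproduce it by hand and fetch the row directly. A quick check (md5sum is the GNU tool; on macOS use md5; substitute the printed digest for <hash>):
printf 'invoice.txt' | md5sum
aws dynamodb get-item \
--table-name lambda-idempotency-table \
--key '{"id":{"S":"<hash>"}}'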
Step 6 – Verify Real-World Behavior
Trigger a few uploads and watch the pattern:
- First upload → “Inserted invoice.txt”
- Same upload retried → “Duplicate skipped for invoice.txt”
- New file → new ID, new record.
Your pipeline now behaves deterministically, even under retries or latency storms.
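When you are done experimenting, tear the demo down so nothing lingers (the bucket has to be emptied before it can be deleted):
aws s3 rm s3://lambda-idempotency-demo --recursive
aws s3api delete-bucket --bucket lambda-idempotency-demo
aws lambda delete-function --function-name lambdaIdempotencyDemo
aws dynamodb delete-table --table-name lambda-idempotency-table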
Pro Tips
- Use ConditionExpression="attribute_not_exists(id)" for quick, clean idempotency (a hands-on check follows this list).
- For multi-field checks, combine conditions with AND (e.g., attribute_not_exists(id) AND attribute_not_exists(filename)).
- Hash bucket + key + eventTime for a globally unique idempotency key when reprocessing is legitimate.
- Always log “Duplicate skipped” messages — they’re your quiet proof of consistency and invaluable when auditing system reliability or debugging retry storms.
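To see the guard working outside Lambda, you can issue the same conditional write by hand; DynamoDB rejects it because the item already exists. A sketch (substitute the id already stored in your table for <hash>):
aws dynamodb put-item \
--table-name lambda-idempotency-table \
--item '{"id":{"S":"<hash>"},"filename":{"S":"invoice.txt"},"status":{"S":"processed"}}' \
--condition-expression "attribute_not_exists(id)"
The call fails with ConditionalCheckFailedException, which is exactly the signal the handler catches and logs as a skipped duplicate.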
Conclusion
AWS never promised exactly once. It promised at least once.
That’s not a bug — it’s your invitation to design resilient systems.
By adding a one-line guard to your DynamoDB writes, you transform a best-effort pipeline into a predictable, idempotent architecture.
In cloud systems, reliability isn’t about stopping retries — it’s about making them harmless.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.