When Atomicity Meets Idempotency: DynamoDB Transactions + Powertools (S3 → Lambda → DynamoDB)

In distributed systems, reliability isn’t luck — it’s architecture.

Sometimes, “helpful retries” can turn your database into a mess.

You didn’t do anything wrong — your system just delivered the event twice, exactly as promised.

Now let’s make it bulletproof.


Problem

Your event-driven pipeline looks clean on paper:

S3 → Lambda → DynamoDB

Each time a file lands in S3, Lambda logs it in a DynamoDB table and writes a short audit entry to another table.

Everything works beautifully — until one day you notice this:

{
  "main_records": 2,
  "audit_log": 1
}

Same file. Same event. Two main entries. One audit entry.

Lambda retried in the middle of your two writes.

Now your data is out of sync.


Clarifying the Issue

AWS services like S3 and Lambda guarantee at-least-once delivery.

That means you’ll never lose an event — but you might get it twice.

And since a Lambda function’s two put_item calls aren’t atomic, a retry can partially succeed.

One DynamoDB write lands, the other is skipped.

This is how a single retry can create split-brain writes across tables.

To fix it, we need two ingredients:

  • Atomicity → both writes succeed or fail together.
  • Idempotency → duplicate events don’t re-process at all.

Why It Matters

Data consistency is the backbone of trust.

Without atomic and idempotent design:

  • Your analytics drift. Some records show as processed twice, others only once.
  • Your audit trails fracture. Regulators can’t trust your event lineage.
  • Your teams burn hours debugging ghosts.

We’re going to fix this with one transaction and one decorator.


Key Terms

  • Atomicity — The principle that multiple operations either all succeed or all fail together.
  • Idempotency — Repeating the same operation produces no additional effect.
  • TransactWriteItems — DynamoDB API for performing multiple table writes atomically.
  • AWS Lambda Powertools Idempotency — A utility that prevents duplicate event processing automatically.
  • Persistence Layer — DynamoDB table that stores event IDs for deduplication.

Steps at a Glance

  1. Create the base pipeline (S3, Lambda, DynamoDB).
  2. Add a second table to store audit events.
  3. Observe partial writes under retry conditions.
  4. Apply DynamoDB Transactions for atomic updates.
  5. Add Powertools Idempotency for retry suppression.
  6. Re-test and verify perfectly consistent results.

Step 1 – Create the Base Pipeline

✅ Create the bucket (S3 bucket names are globally unique, so choose your own name if this one is taken):

aws s3api create-bucket --bucket atomicity-demo-bucket --region us-east-1

✅ Check that the bucket exists:

aws s3api list-buckets --query "Buckets[].Name"

✅ Output (if successful):

[
  "atomicity-demo-bucket"
]

✅ (Shows the bucket was successfully created and visible to the AWS CLI.)

❌ If you don’t see your bucket listed, wait a few seconds and re-run the command.

aws s3api list-buckets --query "Buckets[].Name"

✅ Create the DynamoDB tables:

aws dynamodb create-table \
  --table-name lambda-transactions-table \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

aws dynamodb create-table \
  --table-name lambda-audit-log \
  --attribute-definitions AttributeName=event_id,AttributeType=S \
  --key-schema AttributeName=event_id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
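
Table creation takes a few seconds. You can block until both tables are ACTIVE before moving on:

aws dynamodb wait table-exists --table-name lambda-transactions-table
aws dynamodb wait table-exists --table-name lambda-audit-log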

✅ Verify both tables:

aws dynamodb list-tables

✅ Output:

{"TableNames": ["lambda-transactions-table", "lambda-audit-log"]}

Step 2 – Simulate the Simple (Non-Atomic) Write

cat > lambda_handler.py <<'EOF'
import boto3, hashlib

dynamodb = boto3.resource('dynamodb')
main_table = dynamodb.Table('lambda-transactions-table')
audit_table = dynamodb.Table('lambda-audit-log')

def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        # Deterministic id derived from the object key
        event_id = hashlib.md5(key.encode()).hexdigest()

        # ❌ Two separate writes: not atomic. A retry between them
        # leaves one table updated and the other untouched.
        main_table.put_item(Item={'id': event_id, 'filename': key, 'status': 'processed'})
        audit_table.put_item(Item={'event_id': event_id, 'timestamp': context.aws_request_id})
EOF

✅ Deploy:

zip function.zip lambda_handler.py
aws lambda create-function \
  --function-name atomicityDemo \
  --zip-file fileb://function.zip \
  --handler lambda_handler.lambda_handler \
  --runtime python3.11 \
  --role arn:aws:iam::123456789012:role/LambdaExecutionRole
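
The role ARN above uses a placeholder account ID. If you don’t already have an execution role, here’s a minimal sketch (the role name and the broad AmazonDynamoDBFullAccess policy are illustrative; scope permissions down for real workloads):

aws iam create-role \
  --role-name LambdaExecutionRole \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "lambda.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

aws iam attach-role-policy \
  --role-name LambdaExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

aws iam attach-role-policy \
  --role-name LambdaExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess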

✅ Connect it to S3 events.
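
One way to wire that up from the CLI (the account ID and region match the placeholders used above):

aws lambda add-permission \
  --function-name atomicityDemo \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::atomicity-demo-bucket

aws s3api put-bucket-notification-configuration \
  --bucket atomicity-demo-bucket \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:atomicityDemo",
      "Events": ["s3:ObjectCreated:*"]
    }]
  }'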

Then upload a file and observe that duplicates sometimes appear in lambda-transactions-table,
but the audit table may only log once — classic partial consistency drift.
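
For example, upload a small test file:

echo "test invoice" > invoice.txt
aws s3 cp invoice.txt s3://atomicity-demo-bucket/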

   +---------+        +---------+        +----------------------+
   |   S3    | -----> | Lambda  | -----> | DynamoDB: main table |
   +---------+        +---------+        +----------------------+
                               \         +----------------------+
                                \------> | DynamoDB: audit log  |
                                         +----------------------+

Step 3 – Apply DynamoDB Transactions

The ConditionExpression effectively enforces uniqueness, making sure each record is inserted only once — a kind of optimistic locking for DynamoDB.

cat > lambda_handler.py <<'EOF'
import boto3, hashlib

dynamodb = boto3.client('dynamodb')

def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        event_id = hashlib.md5(key.encode()).hexdigest()

        # ✅ Transaction: both writes succeed or fail together
        # Ensures each item is inserted only once — acts as a unique constraint for id
        dynamodb.transact_write_items(
            TransactItems=[
                {
                    'Put': {
                        'TableName': 'lambda-transactions-table',
                        'Item': {'id': {'S': event_id}, 'filename': {'S': key}, 'status': {'S': 'processed'}},
                        'ConditionExpression': 'attribute_not_exists(id)'
                    }
                },
                {
                    'Put': {
                        'TableName': 'lambda-audit-log',
                        'Item': {'event_id': {'S': event_id}, 'timestamp': {'S': context.aws_request_id}}
                    }
                }
            ]
        )
EOF

✅ Redeploy:

zip function.zip lambda_handler.py
aws lambda update-function-code \
  --function-name atomicityDemo \
  --zip-file fileb://function.zip

(Optional Note) The lambda-audit-log doesn’t need its own ConditionExpression. Both writes occur within one atomic transaction. The Powertools Idempotency layer will prevent re-processing entirely, ensuring the audit log cannot be duplicated.
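
One practical wrinkle: if the ConditionExpression fails on a replayed event, transact_write_items raises TransactionCanceledException, which would otherwise bubble up as a Lambda error and trigger yet another retry. A minimal sketch of treating that case as “already processed” (write_once is a hypothetical helper wrapping the same call shown above):

import boto3

dynamodb = boto3.client('dynamodb')

def write_once(transact_items):
    try:
        dynamodb.transact_write_items(TransactItems=transact_items)
    except dynamodb.exceptions.TransactionCanceledException as err:
        reasons = err.response.get('CancellationReasons', [])
        if any(r.get('Code') == 'ConditionalCheckFailed' for r in reasons):
            # The id already exists: this event was written before, so skip it
            return
        raise  # throttling or other cancellation reasons should still fail loudly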

Now your writes are atomic — both tables succeed or fail together.

We need one more guardrail.

   +---------+        +---------+        +-------------------------------------+
   |   S3    | -----> | Lambda  | -----> | DynamoDB Transaction (Atomic)       |
   +---------+        +---------+        +-------------------------------------+
                                   |--> lambda-transactions-table (unique write)
                                   |--> lambda-audit-log (event record)
                                [Powertools Idempotency Guard]

Step 4 – Add Powertools Idempotency
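
Powertools needs somewhere to remember which events it has already processed. Create its persistence table first (the name matches the code below; the id partition key and expiration TTL attribute are the Powertools defaults):

aws dynamodb create-table \
  --table-name lambda-idempotency-store \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

aws dynamodb update-time-to-live \
  --table-name lambda-idempotency-store \
  --time-to-live-specification "Enabled=true, AttributeName=expiration"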

Create a small requirements file:

cat > requirements.txt <<'EOF'
aws-lambda-powertools
boto3
EOF

Install locally:

pip install -r requirements.txt -t .

Then replace the Lambda code:

cat > lambda_handler.py <<'EOF'
from aws_lambda_powertools.utilities.idempotency import (
    idempotent, DynamoDBPersistenceLayer
)
from aws_lambda_powertools import Logger
import boto3, hashlib

logger = Logger()
dynamodb = boto3.client('dynamodb')
persistence = DynamoDBPersistenceLayer(table_name="lambda-idempotency-store")

@logger.inject_lambda_context
# @idempotent hashes the full Lambda event (the S3 notification payload) as the idempotency key
@idempotent(persistence_store=persistence)
def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        event_id = hashlib.md5(key.encode()).hexdigest()
        logger.info(f"Processing {key}")

        # ✅ Atomic + Idempotent: one decorator, one transaction
        dynamodb.transact_write_items(
            TransactItems=[
                {
                    'Put': {
                        'TableName': 'lambda-transactions-table',
                        'Item': {'id': {'S': event_id}, 'filename': {'S': key}, 'status': {'S': 'processed'}},
                        'ConditionExpression': 'attribute_not_exists(id)'
                    }
                },
                {
                    'Put': {
                        'TableName': 'lambda-audit-log',
                        'Item': {'event_id': {'S': event_id}, 'timestamp': {'S': context.aws_request_id}}
                    }
                }
            ]
        )
        logger.info(f"Transaction complete for {key}")
EOF

✅ Deploy again:

zip -r function.zip .
aws lambda update-function-code \
  --function-name atomicityDemo \
  --zip-file fileb://function.zip

Step 5 – Verify Consistency

Trigger multiple uploads and manual retries.
In CloudWatch logs, the first delivery of a file processes normally:

INFO    Processing invoice.txt
INFO    Transaction complete for invoice.txt

Duplicate deliveries of the same event return the stored idempotency result instead of re-running the handler, so you won’t see a second pair of log lines for that file.
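
To force a duplicate delivery yourself, capture one of the real S3 event payloads from the logs (saved here as event.json, a hypothetical file) and invoke the function twice with it:

aws lambda invoke --function-name atomicityDemo \
  --cli-binary-format raw-in-base64-out \
  --payload file://event.json out.json

aws lambda invoke --function-name atomicityDemo \
  --cli-binary-format raw-in-base64-out \
  --payload file://event.json out.json

The second call returns the stored idempotency result without re-running the transaction.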

✅ Check both tables — they’ll always have matching records.

aws dynamodb scan --table-name lambda-transactions-table
aws dynamodb scan --table-name lambda-audit-log

✅ Expected output (abbreviated for clarity; the id and event_id values are the MD5 hash of the object key):

{
  "Count": 1,
  "Items": [
    {
      "id": {"S": "<md5 hash of invoice.txt>"},
      "filename": {"S": "invoice.txt"},
      "status": {"S": "processed"}
    }
  ]
}
{
  "Count": 1,
  "Items": [
    {
      "event_id": {"S": "<md5 hash of invoice.txt>"},
      "timestamp": {"S": "abcd-1234"}
    }
  ]
}

Pro Tips

  • Keep your idempotency persistence table (lambda-idempotency-store) small and TTL-managed.
  • Use Powertools’ built-in metrics to track deduplication rates.
  • Transactions cost slightly more, but they eliminate partial writes completely.
  • Pair this design with CloudWatch alarms on duplicate skip counts for proactive reliability.
  • Handle the IdempotencyAlreadyInProgressError gracefully. This occurs if the same event is retried while an in-flight execution is still running. In S3-triggered Lambdas it’s rare, but worth catching and surfacing in CloudWatch metrics (see the sketch after this list).
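
A sketch of that last tip, assuming the per-record work moves into its own function (process_record is a hypothetical helper; its body would hold the transact_write_items call from Step 4):

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer, IdempotencyConfig, idempotent_function
)
from aws_lambda_powertools.utilities.idempotency.exceptions import (
    IdempotencyAlreadyInProgressError
)

logger = Logger()
persistence = DynamoDBPersistenceLayer(table_name="lambda-idempotency-store")
config = IdempotencyConfig()

# Idempotency is tracked per S3 record rather than per Lambda event
@idempotent_function(data_keyword_argument="record", config=config, persistence_store=persistence)
def process_record(record: dict):
    ...  # the transact_write_items call from Step 4 goes here

def lambda_handler(event, context):
    config.register_lambda_context(context)  # let Powertools respect the remaining invocation time
    for record in event["Records"]:
        try:
            process_record(record=record)
        except IdempotencyAlreadyInProgressError:
            # The same record is being handled by another in-flight invocation; skip instead of failing
            logger.warning("Duplicate in-flight record, skipping")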

With both atomic writes and idempotency now working hand-in-hand, we can step back and appreciate what’s been built.

Conclusion

AWS gives you durability — not perfection.

Retries are a promise, not a problem.

By combining DynamoDB Transactions (atomicity) with Powertools Idempotency (memory), you achieve effectively exactly-once behavior across multiple writes.

In distributed systems, reliability isn’t luck — it’s architecture.


Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.
