S3 Bucket Versioning Drift: When Deletes Don’t Really Delete (S3 → Lambda → DynamoDB)
When versioning goes sideways, “deleted” objects keep living rent-free — and sometimes, they bring their friends along for another round of processing.
Problem
You’re running a compact, event-driven data pipeline.
Files land in Amazon S3, a Lambda function processes them, and DynamoDB stores the results for downstream analytics.
It’s simple, automated, and usually bulletproof.
One afternoon, you clean up a few stale test files:
aws s3 rm s3://versioned-demo-bucket/report.csv
✅ The CLI responds:
delete: s3://versioned-demo-bucket/report.csv
You confirm with a quick list — the file’s gone.
Everything looks green in the console, no alarms, no failures.
A week later, storage charges haven’t dropped and your DynamoDB table shows duplicate entries.
Somewhere, that file still exists — and it’s quietly re-triggering your pipeline.
Clarifying the Issue
This isn’t a permissions error or a bug in Lambda — it’s a design behavior of the S3 versioning system.
When a bucket has versioning enabled, deleting an object doesn’t erase its data.
Instead, S3 writes a small placeholder called a delete marker — a new version that hides the original file without actually removing it.
From the console’s point of view, the object is gone.
From S3’s internal perspective, it’s still stored, billable, and potentially re-replicable.
That gap between appearance and reality is where drift begins:
- Storage keeps growing though nothing seems to change.
- Lambda can re-fire if an old version resurfaces through sync or replication.
- DynamoDB accumulates duplicates from those ghost replays.
On paper, everything succeeded — in practice, your system quietly diverged from reality.
This post shows how to surface that drift, prove it, and eliminate it for good.
Why It Matters
Zombie versions cost more than storage — they corrupt data pipelines.
Your application may see duplicates, resurrected payloads, or outdated analytics.
And since S3 treats this as normal behavior, AWS won’t flag it as an error.
In cloud pipelines, idempotency isn’t just a buzzword — it’s the difference between consistency and chaos.
Key Terms
- Delete Marker — Zero-byte version that hides an object but doesn’t erase older data.
- Version ID — Unique identifier for each stored object revision.
- Lifecycle Rule — Policy that transitions or expires objects and versions automatically.
- Replication — Cross-region copying that, depending on rule configuration, can propagate delete markers and old versions.
- Idempotency — Guarantee that reprocessing an event doesn’t change the final result.
Steps at a Glance
- Create versioned infrastructure — S3, DynamoDB, and a small Lambda.
- Upload and process a test file end-to-end.
- Delete the file and observe how the old version persists.
- Resurface an old version and watch the Lambda fire again.
- Apply lifecycle rules and permissions to prevent ghost triggers.
Step 1 – Create Versioned Infrastructure
We’ll build a minimal pipeline: an S3 bucket with versioning, a Lambda that writes to DynamoDB, and permissions to connect them.
aws s3api create-bucket --bucket versioned-demo-bucket --region us-east-1
✅ Output:
{"Location":"/versioned-demo-bucket"}
Enable versioning:
aws s3api put-bucket-versioning \
  --bucket versioned-demo-bucket \
  --versioning-configuration Status=Enabled
aws s3api get-bucket-versioning --bucket versioned-demo-bucket
✅ Output:
{"Status":"Enabled"}
Create the table:
aws dynamodb create-table \
  --table-name versioned-demo-table \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
✅ Confirm "TableStatus":"ACTIVE" before proceeding.
Run this command to verify the table’s state:
aws dynamodb describe-table \
  --table-name versioned-demo-table \
  --query "Table.TableStatus"
✅ Output:
"ACTIVE"
Why: This concise query filters the response to show only the table's operational status: quick, readable, and perfect for pipeline validation.
To see the full table description, drop the --query flag:
aws dynamodb describe-table --table-name versioned-demo-table
Expected output:
{
  "Table": {
    "AttributeDefinitions": [
      {
        "AttributeName": "id",
        "AttributeType": "S"
      }
    ],
    "TableName": "versioned-demo-table",
    "KeySchema": [
      {
        "AttributeName": "id",
        "KeyType": "HASH"
      }
    ],
    "TableStatus": "ACTIVE",
    "CreationDateTime": "2025-10-28T15:42:19.123000-05:00",
    "ProvisionedThroughput": {
      "NumberOfDecreasesToday": 0,
      "ReadCapacityUnits": 0,
      "WriteCapacityUnits": 0
    },
    "TableSizeBytes": 0,
    "ItemCount": 0,
    "TableArn": "arn:aws:dynamodb:us-east-1:123456789012:table/versioned-demo-table",
    "BillingModeSummary": {
      "BillingMode": "PAY_PER_REQUEST",
      "LastUpdateToPayPerRequestDateTime": "2025-10-28T15:42:19.123000-05:00"
    }
  }
}
✅ Success indicator: "TableStatus":"ACTIVE"
❌ If you see: "TableStatus":"CREATING" — wait a few seconds and rerun the command.
Why: A clean, versioned baseline makes drift easy to detect later.
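The walkthrough doesn't show the Lambda function itself, so here is a minimal handler sketch. Everything in it is an assumption for illustration: it presumes the function is subscribed to the bucket's s3:ObjectCreated:* notifications, has write access to versioned-demo-table, and keys each record on object key plus version ID (which is what makes replays visible as duplicates rather than silent overwrites):

```python
# Hypothetical Lambda handler for this pipeline. Names (table, id format)
# are assumptions matching the walkthrough, not a prescribed implementation.

def item_from_record(record):
    """Build a DynamoDB item from one S3 event record.

    The id combines object key and version ID, so every new version
    (including a restored old one) produces a distinct row.
    """
    obj = record["s3"]["object"]
    version_id = obj.get("versionId", "null")
    return {"id": f'{obj["key"]}#{version_id}', "status": "processed"}

def lambda_handler(event, context):
    import boto3  # imported lazily so item_from_record stays testable offline
    table = boto3.resource("dynamodb").Table("versioned-demo-table")
    records = event.get("Records", [])
    for record in records:
        table.put_item(Item=item_from_record(record))
    return {"processed": len(records)}
```

Note the handler is deliberately naive: it writes unconditionally, which is exactly the behavior Step 4 exploits.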
Step 2 – Upload and Verify Processing
Upload a test file and confirm the pipeline runs.
echo "Invoice A" > invoice.txt
aws s3 cp invoice.txt s3://versioned-demo-bucket/
✅ Output:
upload: ./invoice.txt to s3://versioned-demo-bucket/invoice.txt
Check DynamoDB:
aws dynamodb scan --table-name versioned-demo-table
✅ Output:
{
  "Count":1,
  "Items":[{"id":{"S":"invoice.txt#v1"},"status":{"S":"processed"}}]
}
Why: This confirms Lambda is firing and DynamoDB is updating as expected.
Step 3 – Delete and Observe Version Drift
Delete the file, then verify what’s really happening behind the scenes.
aws s3 rm s3://versioned-demo-bucket/invoice.txt
aws s3api list-object-versions --bucket versioned-demo-bucket
Expected drift output (version IDs shortened for readability; real ones are long opaque strings):
{
  "Versions":[{"Key":"invoice.txt","VersionId":"v1","Size":10}],
  "DeleteMarkers":[{"Key":"invoice.txt","VersionId":"v2"}]
}
Why: Seeing both a version and a delete marker proves the delete only hid the object. The data still exists.
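The same drift can be quantified in code. This sketch summarizes a ListObjectVersions response like the one above (the boto3 call is omitted; the function just inspects the JSON, and the field names are the real API shape):

```python
def drift_report(listing):
    """Summarize a ListObjectVersions response: how much 'deleted'
    data is still stored, hidden behind delete markers."""
    versions = listing.get("Versions", [])
    markers = listing.get("DeleteMarkers", [])
    hidden_keys = {m["Key"] for m in markers}
    # Bytes belonging to versions of keys that currently look deleted.
    ghost_bytes = sum(v["Size"] for v in versions if v["Key"] in hidden_keys)
    return {
        "versions": len(versions),
        "delete_markers": len(markers),
        "ghost_bytes": ghost_bytes,
    }
```

Feed it the output of list-object-versions and a nonzero ghost_bytes is your storage-bill smoking gun.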
Step 4 – Trigger a Drift Scenario
Resurface the old version by copying it back over itself. (A bucket-to-itself sync won't do this, since sync skips objects hidden by delete markers.) S3 records the copy as a brand-new version and fires the object-created event again:
aws s3api copy-object \
  --bucket versioned-demo-bucket \
  --key invoice.txt \
  --copy-source "versioned-demo-bucket/invoice.txt?versionId=v1"
Then rescan DynamoDB:
aws dynamodb scan --table-name versioned-demo-table
✅ Duplicate evidence:
{
  "Count":2,
  "Items":[
    {"id":{"S":"invoice.txt#v1"},"status":{"S":"processed"}},
    {"id":{"S":"invoice.txt#v3"},"status":{"S":"processed"}}
  ]
}
Why: The restored copy arrived with a new version ID, re-triggered Lambda, and produced a second record for the same file. (If Lambda keyed items on the bare object key, the replay would silently overwrite the first record instead — still drift, just harder to see.)
Step 5 – Apply Lifecycle and Hardening Fixes
Add a lifecycle rule so S3 automatically deletes non-current versions.
aws s3api put-bucket-lifecycle-configuration \
  --bucket versioned-demo-bucket \
  --lifecycle-configuration '{
    "Rules":[
      {
        "ID":"ExpireOldVersions",
        "Status":"Enabled",
        "Filter":{"Prefix":""},
        "NoncurrentVersionExpiration":{"NoncurrentDays":1},
        "Expiration":{"ExpiredObjectDeleteMarker":true}
      }
    ]
  }'
The Filter element is required by the API (an empty prefix matches every object), and ExpiredObjectDeleteMarker removes the marker itself once no older versions remain behind it.
Verify configuration:
aws s3api get-bucket-lifecycle-configuration --bucket versioned-demo-bucket
✅ Output:
{"Rules":[{"ID":"ExpireOldVersions","Status":"Enabled","Filter":{"Prefix":""},"NoncurrentVersionExpiration":{"NoncurrentDays":1},"Expiration":{"ExpiredObjectDeleteMarker":true}}]}
Why: Lifecycle enforcement removes ghost versions before they can trigger downstream events.
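If you manage many buckets, the lifecycle check is worth automating. A small sketch that audits a bucket's rules for the two gaps this post cares about (the rules list is whatever get-bucket-lifecycle-configuration returns; the function name is ours):

```python
def lifecycle_gaps(rules):
    """Flag versioned-bucket lifecycle gaps: noncurrent versions that
    never expire, and delete markers that are never cleaned up."""
    gaps = []
    if not any(
        "NoncurrentVersionExpiration" in r
        for r in rules
        if r.get("Status") == "Enabled"
    ):
        gaps.append("no noncurrent version expiration")
    if not any(
        r.get("Expiration", {}).get("ExpiredObjectDeleteMarker")
        for r in rules
    ):
        gaps.append("delete markers never expire")
    return gaps
```

An empty list means the bucket matches the policy from Step 5; anything else is a candidate for drift.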
Pro Tips
- Use lifecycle rules, not manual deletes, for versioned buckets.
- Keep replication cleanup policies symmetrical across regions.
- Add idempotency keys to DynamoDB writes to prevent duplicates.
- Track storage growth with CloudWatch metrics.
- Filter non-latest versions in EventBridge rules.
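The idempotency tip above deserves a sketch. The standard DynamoDB pattern is a conditional put: derive a deterministic key from the event, then write only if no item with that key exists. Helper names here are hypothetical; the ConditionExpression is the real DynamoDB syntax:

```python
import hashlib

def idempotency_key(record):
    """Deterministic id for one S3 event record: the same event replayed
    (Lambda retry, replication echo) always maps to the same key."""
    s3 = record["s3"]
    raw = f'{s3["bucket"]["name"]}/{s3["object"]["key"]}#{s3["object"].get("versionId", "null")}'
    return hashlib.sha256(raw.encode()).hexdigest()

def put_once(table, item):
    """Write only if the id is unseen; a ConditionalCheckFailedException
    means this event was already processed, so we swallow it."""
    try:
        table.put_item(
            Item=item,
            ConditionExpression="attribute_not_exists(id)",
        )
        return True
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        return False
```

With this in the Lambda, the Step 4 replay becomes a no-op instead of a duplicate row.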
Conclusion
Versioning protects your data right up to the point where it starts haunting you.
Those invisible versions can quietly re-ignite your pipelines, duplicating work and inflating costs.
S3 isn’t lying when it says “delete complete” — it just means “I hid it for you.”
Lifecycle policies, idempotent processing, and vigilant monitoring are your cleanup crew.
In AWS, deletion is an intent, not an action, and every ghost version you leave behind will eventually come back for another pass through your pipeline.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.