S3 Bucket Versioning Drift: When Deletes Don’t Really Delete (S3 → Lambda → DynamoDB)
When versioning goes sideways, “deleted” objects keep living rent-free — and sometimes, they bring their friends along for another round of processing.
Problem
You’re running a compact, event-driven data pipeline.
Files land in Amazon S3, a Lambda function processes them, and DynamoDB stores the results for downstream analytics.
It’s simple, automated, and usually bulletproof.
One afternoon, you clean up a few stale test files:
aws s3 rm s3://versioned-demo-bucket/report.csv
✅ The CLI responds:
delete: s3://versioned-demo-bucket/report.csv
You confirm with a quick list — the file’s gone.
Everything looks green in the console, no alarms, no failures.
A week later, storage charges haven’t dropped and your DynamoDB table shows duplicate entries.
Somewhere, that file still exists — and it’s quietly re-triggering your pipeline.
Clarifying the Issue
This isn’t a permissions error or a bug in Lambda — it’s a design behavior of the S3 versioning system.
When a bucket has versioning enabled, deleting an object doesn’t erase its data.
Instead, S3 writes a small placeholder called a delete marker — a new version that hides the original file without actually removing it.
From the console’s point of view, the object is gone.
From S3’s internal perspective, it’s still stored, billable, and potentially re-replicable.
That gap between appearance and reality is where drift begins:
- Storage keeps growing though nothing seems to change.
- Lambda can re-fire if an old version resurfaces through sync or replication.
- DynamoDB accumulates duplicates from those ghost replays.
On paper, everything succeeded — in practice, your system quietly diverged from reality.
This post shows how to surface that drift, prove it, and eliminate it for good.
Why It Matters
Zombie versions cost more than storage — they corrupt data pipelines.
Your application may see duplicates, resurrected payloads, or outdated analytics.
And since S3 treats this as normal behavior, AWS won’t flag it as an error.
In cloud pipelines, idempotency isn’t just a buzzword — it’s the difference between consistency and chaos.
Key Terms
- Delete Marker — Zero-byte version that hides an object but doesn’t erase older data.
- Version ID — Unique identifier for each stored object revision.
- Lifecycle Rule — Policy that transitions or expires objects and versions automatically.
- Replication — Cross-region copying that, depending on rule configuration, can propagate delete markers and old versions.
- Idempotency — Guarantee that reprocessing an event doesn’t change the final result.
Steps at a Glance
- Create versioned infrastructure — S3, DynamoDB, and a small Lambda.
- Upload and process a test file end-to-end.
- Delete the file and observe how the old version persists.
- Resurface an old version and watch the Lambda fire again.
- Apply lifecycle rules and permissions to prevent ghost triggers.
Step 1 – Create Versioned Infrastructure
We’ll build a minimal pipeline: an S3 bucket with versioning, a Lambda that writes to DynamoDB, and permissions to connect them.
aws s3api create-bucket --bucket versioned-demo-bucket --region us-east-1
✅ Output:
{"Location":"/versioned-demo-bucket"}
Enable versioning:
aws s3api put-bucket-versioning \
  --bucket versioned-demo-bucket \
  --versioning-configuration Status=Enabled
aws s3api get-bucket-versioning --bucket versioned-demo-bucket
✅ Output:
{"Status":"Enabled"}
Create the table:
aws dynamodb create-table \
  --table-name versioned-demo-table \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
✅ Confirm "TableStatus":"ACTIVE" before proceeding.
Run this command to verify the table’s state:
aws dynamodb describe-table \
  --table-name versioned-demo-table \
  --query "Table.TableStatus"
✅ Output:
"ACTIVE"
Why: This concise query filters the response to show only the table's operational status: quick, readable, and perfect for pipeline validation.
To see the full table description, drop the --query flag:
aws dynamodb describe-table --table-name versioned-demo-table
Expected output:
{
  "Table": {
    "AttributeDefinitions": [
      {
        "AttributeName": "id",
        "AttributeType": "S"
      }
    ],
    "TableName": "versioned-demo-table",
    "KeySchema": [
      {
        "AttributeName": "id",
        "KeyType": "HASH"
      }
    ],
    "TableStatus": "ACTIVE",
    "CreationDateTime": "2025-10-28T15:42:19.123000-05:00",
    "ProvisionedThroughput": {
      "NumberOfDecreasesToday": 0,
      "ReadCapacityUnits": 0,
      "WriteCapacityUnits": 0
    },
    "TableSizeBytes": 0,
    "ItemCount": 0,
    "TableArn": "arn:aws:dynamodb:us-east-1:123456789012:table/versioned-demo-table",
    "BillingModeSummary": {
      "BillingMode": "PAY_PER_REQUEST",
      "LastUpdateToPayPerRequestDateTime": "2025-10-28T15:42:19.123000-05:00"
    }
  }
}
✅ Success indicator: "TableStatus":"ACTIVE"
❌ If you see: "TableStatus":"CREATING" — wait a few seconds and rerun the command.
Why: A clean, versioned baseline makes drift easy to detect later.
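The walkthrough doesn't show the Lambda function itself, so here is a minimal handler sketch. Everything in it is an assumption for illustration: it presumes the function is subscribed to the bucket's s3:ObjectCreated:* notifications, has write access to versioned-demo-table, and keys each record on object key plus version ID (which is what makes replays visible as duplicates rather than silent overwrites):

```python
# Hypothetical Lambda handler for this pipeline. Names (table, id format)
# are assumptions matching the walkthrough, not a prescribed implementation.

def item_from_record(record):
    """Build a DynamoDB item from one S3 event record.

    The id combines object key and version ID, so every new version
    (including a restored old one) produces a distinct row.
    """
    obj = record["s3"]["object"]
    version_id = obj.get("versionId", "null")
    return {"id": f'{obj["key"]}#{version_id}', "status": "processed"}

def lambda_handler(event, context):
    import boto3  # imported lazily so item_from_record stays testable offline
    table = boto3.resource("dynamodb").Table("versioned-demo-table")
    records = event.get("Records", [])
    for record in records:
        table.put_item(Item=item_from_record(record))
    return {"processed": len(records)}
```

Note the handler is deliberately naive: it writes unconditionally, which is exactly the behavior Step 4 exploits.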
Step 2 – Upload and Verify Processing
Upload a test file and confirm the pipeline runs.
echo "Invoice A" > invoice.txt
aws s3 cp invoice.txt s3://versioned-demo-bucket/
✅ Output:
upload: ./invoice.txt to s3://versioned-demo-bucket/invoice.txt
Check DynamoDB:
aws dynamodb scan --table-name versioned-demo-table
✅ Output:
{
  "Count":1,
  "Items":[{"id":{"S":"invoice.txt#v1"},"status":{"S":"processed"}}]
}
Why: This confirms Lambda is firing and DynamoDB is updating as expected.
Step 3 – Delete and Observe Version Drift
Delete the file, then verify what’s really happening behind the scenes.
aws s3 rm s3://versioned-demo-bucket/invoice.txt
aws s3api list-object-versions --bucket versioned-demo-bucket
Expected drift output (version IDs shortened for readability; real ones are long opaque strings):
{
  "Versions":[{"Key":"invoice.txt","VersionId":"v1","Size":10}],
  "DeleteMarkers":[{"Key":"invoice.txt","VersionId":"v2"}]
}
Why: Seeing both a version and a delete marker proves the delete only hid the object. The data still exists.
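The same drift can be quantified in code. This sketch summarizes a ListObjectVersions response like the one above (the boto3 call is omitted; the function just inspects the JSON, and the field names are the real API shape):

```python
def drift_report(listing):
    """Summarize a ListObjectVersions response: how much 'deleted'
    data is still stored, hidden behind delete markers."""
    versions = listing.get("Versions", [])
    markers = listing.get("DeleteMarkers", [])
    hidden_keys = {m["Key"] for m in markers}
    # Bytes belonging to versions of keys that currently look deleted.
    ghost_bytes = sum(v["Size"] for v in versions if v["Key"] in hidden_keys)
    return {
        "versions": len(versions),
        "delete_markers": len(markers),
        "ghost_bytes": ghost_bytes,
    }
```

Feed it the output of list-object-versions and a nonzero ghost_bytes is your storage-bill smoking gun.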
Step 4 – Trigger a Drift Scenario
Resurface the old version by copying it back over itself. (A bucket-to-itself sync won't do this, since sync skips objects hidden by delete markers.) S3 records the copy as a brand-new version and fires the object-created event again:
aws s3api copy-object \
  --bucket versioned-demo-bucket \
  --key invoice.txt \
  --copy-source "versioned-demo-bucket/invoice.txt?versionId=v1"
Then rescan DynamoDB:
aws dynamodb scan --table-name versioned-demo-table
✅ Duplicate evidence:
{
  "Count":2,
  "Items":[
    {"id":{"S":"invoice.txt#v1"},"status":{"S":"processed"}},
    {"id":{"S":"invoice.txt#v3"},"status":{"S":"processed"}}
  ]
}
Why: The restored copy arrived with a new version ID, re-triggered Lambda, and produced a second record for the same file. (If Lambda keyed items on the bare object key, the replay would silently overwrite the first record instead — still drift, just harder to see.)
Step 5 – Apply Lifecycle and Hardening Fixes
Add a lifecycle rule so S3 automatically deletes non-current versions.
aws s3api put-bucket-lifecycle-configuration \
  --bucket versioned-demo-bucket \
  --lifecycle-configuration '{
    "Rules":[
      {
        "ID":"ExpireOldVersions",
        "Status":"Enabled",
        "Filter":{"Prefix":""},
        "NoncurrentVersionExpiration":{"NoncurrentDays":1},
        "Expiration":{"ExpiredObjectDeleteMarker":true}
      }
    ]
  }'
The Filter element is required by the API (an empty prefix matches every object), and ExpiredObjectDeleteMarker removes the marker itself once no older versions remain behind it.
Verify configuration:
aws s3api get-bucket-lifecycle-configuration --bucket versioned-demo-bucket
✅ Output:
{"Rules":[{"ID":"ExpireOldVersions","Status":"Enabled","Filter":{"Prefix":""},"NoncurrentVersionExpiration":{"NoncurrentDays":1},"Expiration":{"ExpiredObjectDeleteMarker":true}}]}
Why: Lifecycle enforcement removes ghost versions before they can trigger downstream events.
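If you manage many buckets, the lifecycle check is worth automating. A small sketch that audits a bucket's rules for the two gaps this post cares about (the rules list is whatever get-bucket-lifecycle-configuration returns; the function name is ours):

```python
def lifecycle_gaps(rules):
    """Flag versioned-bucket lifecycle gaps: noncurrent versions that
    never expire, and delete markers that are never cleaned up."""
    gaps = []
    if not any(
        "NoncurrentVersionExpiration" in r
        for r in rules
        if r.get("Status") == "Enabled"
    ):
        gaps.append("no noncurrent version expiration")
    if not any(
        r.get("Expiration", {}).get("ExpiredObjectDeleteMarker")
        for r in rules
    ):
        gaps.append("delete markers never expire")
    return gaps
```

An empty list means the bucket matches the policy from Step 5; anything else is a candidate for drift.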
Pro Tips
- Use lifecycle rules, not manual deletes, for versioned buckets.
- Keep replication cleanup policies symmetrical across regions.
- Add idempotency keys to DynamoDB writes to prevent duplicates.
- Track storage growth with CloudWatch metrics.
- Filter non-latest versions in EventBridge rules.
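The idempotency tip above deserves a sketch. The standard DynamoDB pattern is a conditional put: derive a deterministic key from the event, then write only if no item with that key exists. Helper names here are hypothetical; the ConditionExpression is the real DynamoDB syntax:

```python
import hashlib

def idempotency_key(record):
    """Deterministic id for one S3 event record: the same event replayed
    (Lambda retry, replication echo) always maps to the same key."""
    s3 = record["s3"]
    raw = f'{s3["bucket"]["name"]}/{s3["object"]["key"]}#{s3["object"].get("versionId", "null")}'
    return hashlib.sha256(raw.encode()).hexdigest()

def put_once(table, item):
    """Write only if the id is unseen; a ConditionalCheckFailedException
    means this event was already processed, so we swallow it."""
    try:
        table.put_item(
            Item=item,
            ConditionExpression="attribute_not_exists(id)",
        )
        return True
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        return False
```

With this in the Lambda, the Step 4 replay becomes a no-op instead of a duplicate row.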
Conclusion
Versioning protects your data right up to the point where it starts haunting you.
Those invisible versions can quietly re-ignite your pipelines, duplicating work and inflating costs.
S3 isn’t lying when it says “delete complete” — it just means “I hid it for you.”
Lifecycle policies, idempotent processing, and vigilant monitoring are your cleanup crew.
In AWS, deletion is an intent, not an action, and every ghost version you leave behind will eventually come back for another pass through your pipeline.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.