The Secret Life of AWS: S3 Is More Than a Bucket

Versioning, storage classes, lifecycle policies, and the access control you didn't know you needed

#AmazonS3 #CloudStorage #S3Versioning #AWSS3




Margaret is a senior software engineer. Timothy is her junior colleague. They meet in a grand Victorian library in London — and in every episode, they work through the tools, ideas, and infrastructure that power modern software. Today, Timothy returns to a service he thinks he knows.

Episode 8

Timothy settled into his chair with the particular ease of someone revisiting familiar ground.

"S3," he said. "I know this one."

Margaret looked up from her book. The faintest suggestion of something — not quite a smile — crossed her face.

"Tell me what you know," she said.

"Object store, not a file system. Flat namespace — the slash in a key name is just a character, not a folder separator. Buckets as containers, objects as the things inside them. Extraordinarily durable. Effectively limitless storage." He paused. "Individual objects up to five terabytes. The bucket itself has no practical size limit."

"That is all correct," Margaret said.

He waited for the qualifier. It came.

"It is also approximately the first ten percent of what there is to know."

Timothy looked at her. "Ten percent."

"Perhaps fifteen," she said generously. "You understand what S3 is. You do not yet understand what S3 does — or what it can be made to do. Those are different things." She set down her book. "Let's start with the question you haven't thought to ask yet."

"Which is?"

"What happens when you overwrite an object?"

Timothy considered that. "The new version replaces the old one."

"By default," Margaret said. "Yes. The old version is gone. Not archived. Not recoverable. Gone." She looked at him steadily. "Now tell me — when did you last want that behavior?"


Versioning — The History You Didn't Know You Needed

"Versioning," Timothy said. He had the tone of someone who had heard a word without fully examining it.

"Versioning," Margaret confirmed. "An S3 bucket feature that, when enabled, preserves every version of every object. Every upload, every overwrite, every change — kept. Nothing is ever truly deleted. A delete operation places a delete marker on the current version, making the object appear absent — but every previous version remains, retrievable, intact."

"So deletion is reversible."

"Deletion is reversible. Overwrites are reversible. The accidental upload that corrupts a production configuration file — reversible. The script that runs in the wrong environment and overwrites a month of data — reversible, if versioning was enabled before the script ran." She paused on that last phrase. "That condition is everything. Versioning must be enabled before the event that makes you wish you had it. Enabling it afterward recovers nothing that was lost before."

Timothy absorbed that. "It's the kind of thing you set up once and never think about again — until the day you desperately need it."

"Precisely. And on that day, the question is not whether versioning is a good idea. The question is whether you were the kind of engineer who turned it on before they needed it." She looked at him directly. "Versioning has a cost — every version of every object is stored and billed. For buckets with high write volume, the costs compound. Lifecycle policies manage that, and we will come to them." She paused. "But the habit to build is this — when you create a bucket that holds anything you care about, versioning is one of the first things you enable. Not when something goes wrong. Before."

"Default off," Timothy said.

"Default off," Margaret agreed. "As with most things in AWS that protect you."


Storage Classes — The Temperature of Data

"Here is something most developers do not consider when they first use S3," Margaret said. "All data is not equally valuable at all times."

Timothy looked up.

"When you upload an object to S3 without specifying otherwise, it goes into S3 Standard — the default storage class. High availability, high durability, optimized for frequent access. Also the most expensive storage class per gigabyte." She folded her hands. "That is the right choice for data you access regularly. It is the wrong choice for data you access once a month, or once a year, or never — but must retain for compliance reasons."

"There are other storage classes."

"Several. Think of them as a temperature spectrum." She reached for her pen. "S3 Standard at one end — hot data, frequent access, highest cost. S3 Intelligent-Tiering in the middle — AWS monitors access patterns and moves objects between tiers automatically, useful when you genuinely don't know how often data will be accessed." She continued. "S3 Standard-IA — Infrequent Access. Lower storage cost, but a retrieval fee applied each time you access the data. For backups, for disaster recovery files, for data you hope never to need but must have when you do."

"And colder still?"

"S3 Glacier. Long-term archival storage at a fraction of the cost of Standard — but retrieval is not instant. Depending on the retrieval tier you choose, you may wait minutes or hours for your data." She paused. "And S3 Glacier Deep Archive — the coldest tier. The lowest cost per gigabyte in the entire S3 spectrum. Retrieval measured in hours. Designed for data that must be retained for regulatory reasons but may never actually be needed — audit logs, compliance records, historical archives."

Timothy was writing quickly. "So the choice of storage class is really a decision about how you expect to use the data."

"It is a decision about the economics of your data over time," Margaret said. "Data that starts hot often goes cold. The application logs from this morning are something you might query today, probably won't query next month, and almost certainly won't query next year — but may need to retain for two years for compliance. Storing two years of logs in S3 Standard because that was the default is an expensive oversight."

"Which is where lifecycle policies come in."

"Which is exactly where lifecycle policies come in."


Lifecycle Policies — Automating the Journey

"A lifecycle policy," Margaret said, "is a set of rules that automatically transition objects between storage classes, or expire them entirely, based on age."

"So you define the journey once and S3 handles it."

"You define the journey once and S3 handles it. A rule might say — after thirty days, move this object from Standard to Standard-IA. After ninety days, move it to Glacier. After seven years, delete it." She looked at him. "That rule runs automatically, on every object in the bucket that matches the criteria, without any intervention from you. The data starts expensive and hot, moves to cheaper and cooler as it ages, and is eventually removed when retention requirements are satisfied."

"This is the kind of thing that saves significant money at scale."

"At scale it is the difference between a reasonable storage bill and an extraordinary one," Margaret said. "But even at small scale the habit matters — because small scale becomes large scale, and the lifecycle policies you defined early continue to work correctly as the data grows. The ones you didn't define become technical debt measured in storage costs."

He looked at his notes. "Standard, Standard-IA, Intelligent-Tiering, Glacier, Glacier Deep Archive. Lifecycle policies to move between them. Versioning to preserve history." He paused. "And I've been putting everything in Standard and leaving it there."

"As most people do," Margaret said. "Until someone shows them the bill."


Access Control — Three Layers and Their Interactions

"S3 access control," Margaret said, "is an area of genuine complexity. Not artificial complexity — real complexity that exists because the problem itself is complex. Who should be able to read this object? Who should be able to write to this bucket? Should objects be publicly accessible? Should only specific AWS services be permitted?"

"IAM handles this," Timothy said. He was confident here — they had covered IAM in Episode 5.

"IAM is one layer," Margaret said. "But S3 has its own access control mechanisms that interact with IAM in ways that require careful understanding."

Timothy set down his pen.

IAM and Bucket Policies

"The first layer is IAM policies — which you know. An IAM user or role with the appropriate S3 permissions can access the bucket. This is identity-based control — it starts with the identity and asks what it is allowed to do." She paused. "The second layer is bucket policies. A bucket policy is attached to the bucket itself rather than to an identity. It defines who can access this specific bucket, from where, under what conditions — regardless of what their IAM policy says. This is resource-based control — it starts with the resource and asks who is allowed to reach it."

"And the two interact."

"Both must permit the access for it to succeed, in most cases. An IAM policy that grants S3 access and a bucket policy that denies access — the denial wins. A bucket policy that grants access but no IAM policy permits S3 — access is denied." She looked at him. "This is where S3 permission debugging becomes methodical rather than intuitive. When access is denied, you must check both layers."

"You mentioned a third."

ACLs and Block Public Access

"Access Control Lists — ACLs. The oldest S3 access control mechanism, predating bucket policies as a resource-based control. They operate at the object level and the bucket level, granting basic read and write permissions to AWS accounts or predefined groups." She paused. "ACLs are largely superseded by bucket policies for most use cases. AWS has been progressively recommending against them for new implementations. You will encounter them in older setups and must understand them — but for anything you build from scratch, bucket policies and IAM policies are the correct tools."

"So for a new bucket — IAM for identity control, bucket policy for resource control, ACLs avoided unless necessary."

"That is the current best practice, yes." She folded her hands. "And one more thing — Block Public Access. A bucket-level and account-level setting that prevents objects from becoming publicly accessible, regardless of what any policy or ACL says. It is an override, a safety net." She looked at him steadily. "It is on by default for new buckets. Do not turn it off without a specific, considered reason. The history of inadvertent S3 data exposure is long and, for the companies involved, deeply instructive — customer data made publicly accessible because someone misconfigured a bucket, with consequences measured not just in fines but in trust."

Timothy winced. "I've read about those incidents."

"Most developers have," Margaret said. "Block Public Access exists, in part, because of them."


Before Next Time

He gathered his notes — more pages than usual, filled with the particular density of information that kept surprising him by continuing to exist.

"I thought I knew S3," he said.

"You knew the shape of it," Margaret said. "You are beginning to know the substance."

"Versioning. Storage classes. Lifecycle policies. Three layers of access control." He looked at the pages. "And this is only half of it."

"The mechanics," Margaret said. "Next time — what S3 connects to. How it fits into a larger architecture. Why the decisions you make about S3 early determine what your system can do later." She picked up her book. "S3 is not a place you put files. It is a design decision. Next time, you will understand what that means in practice."

He left with more to think about than he had arrived with, which was, he had come to understand, the reliable signature of a session with Margaret.

She turned back to her book, quietly pleased — not with the volume of what had been covered, but with the particular quality of Timothy's final silence. The students who arrived certain and left uncertain in the right way were the ones who would keep learning long after they left the room.


Next episode: S3 — The Architectural Decision. Timothy discovers how S3 connects to the rest of AWS. Event notifications, static hosting, data pipeline foundations, and why S3 is often the most load-bearing service in an architecture that wasn't designed to depend on it.


Aaron Rose is a software engineer and technology writer at tech-reader.blog
