The Secret Life of AWS: The Workflow (AWS Step Functions)
Orchestration vs. Choreography. How to use AWS Step Functions to manage complex workflows.
Part 21 of The Secret Life of AWS
Timothy was staring at a Python script that was scrolling endlessly on his screen.
"What are you building?" Margaret asked, pausing by his desk.
"I am writing the Return Process," Timothy said. "It is complicated. First, I have to credit the user's wallet. Then, I have to update the inventory. Then, I need to wait 24 hours to send a 'We're Sorry' email."
Margaret leaned in closer to read the code. She pointed to line 45.
```python
time.sleep(86400)  # Wait 24 hours
```
"Timothy," she asked gently. "Where is this code running?"
"On AWS Lambda," he replied.
"Lambda functions have a maximum execution time of 15 minutes," she reminded him. "If you tell this function to sleep for 24 hours, Lambda will stop it when it hits the timeout. The email will never be sent. Worse, you won't know whether the inventory update finished."
Timothy slumped. "So I have to manage the state in a database? I have to write a cron job to check every hour if 24 hours have passed?"
"You could," Margaret said. "Or you could stop writing boilerplate code to manage state."
"We need an Orchestrator," she said. "We need AWS Step Functions."
The State Machine
Margaret navigated to the Step Functions console.
"In Part 20, we used EventBridge for Choreography, where services react to events independently," she explained. "But for a Return Process, the order matters. Step 1 must succeed before Step 2 begins. We need strict Orchestration."
"Step Functions allows us to build a State Machine," she continued. "It is a visual workflow that coordinates your services."
She opened the Workflow Studio, a drag-and-drop interface.
The States
"Instead of writing one giant Lambda function," Margaret explained, "we break the logic into small, discrete States."
She dragged a green box onto the canvas. Task State.
"First, we call the 'Credit Wallet' Lambda."
She dragged an orange box next. Choice State.
"Here we add logic. If the credit fails, go to a 'Fail' state. If it succeeds, move forward."
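In Amazon States Language (ASL), the three boxes placed so far might look roughly like the sketch below, written as Python dictionaries so they can be dumped to JSON. The state names, the placeholder Lambda ARN, and the assumption that the Credit Wallet function returns a `status` field are all hypothetical. The `next_state` helper is a toy evaluator that handles only this one kind of rule; it exists to show how a Choice state picks its next step, not to reproduce the service.

```python
import json

# Hypothetical placeholder ARN for the "Credit Wallet" Lambda function.
CREDIT_WALLET_ARN = "arn:aws:lambda:us-east-1:123456789012:function:CreditWallet"

states = {
    # Task state: invoke the Lambda; by default its return value becomes this state's output.
    "CreditWallet": {"Type": "Task", "Resource": CREDIT_WALLET_ARN, "Next": "CheckCredit"},
    # Choice state: branch on a (hypothetical) "status" field in that output.
    "CheckCredit": {
        "Type": "Choice",
        "Choices": [
            {"Variable": "$.status", "StringEquals": "SUCCEEDED", "Next": "UpdateInventory"}
        ],
        "Default": "ReturnFailed",
    },
    # Fail state: stop the execution and mark it as failed.
    "ReturnFailed": {"Type": "Fail", "Error": "CreditFailed", "Cause": "Wallet credit did not succeed."},
}


def next_state(choice_state: dict, state_input: dict) -> str:
    """Toy Choice evaluator: supports only top-level "$.field" + StringEquals rules."""
    for rule in choice_state["Choices"]:
        field = rule["Variable"].removeprefix("$.")
        # This toy raises KeyError if the field is missing from the input.
        if state_input[field] == rule["StringEquals"]:
            return rule["Next"]
    # No rule matched, so fall through to the Default branch.
    return choice_state["Default"]


print(json.dumps(states["CheckCredit"], indent=2))
print(next_state(states["CheckCredit"], {"status": "SUCCEEDED"}))  # UpdateInventory
print(next_state(states["CheckCredit"], {"status": "DECLINED"}))   # ReturnFailed
```

The `Default` branch is what makes "if the credit fails, go to Fail" explicit: any status other than `SUCCEEDED` ends the execution in the `ReturnFailed` state.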
She dragged a blue box. Wait State.
"This is what you were trying to do in Python," she said. "We configure this state to wait for 86,400 seconds."
"Does that cost money?" Timothy asked.
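"No," Margaret said. "A sleeping Lambda function is billed for every second it runs. A Standard workflow is billed per state transition, so the Wait state costs nothing while it waits. It pauses the execution and resumes it when the timer expires, and a Standard workflow can run for up to a year."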
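The Wait state itself is only a few lines of ASL. This sketch assumes the following state is named `SendApologyEmail` (a hypothetical name) and adds a small helper that makes one design constraint explicit: Express workflows are limited to five minutes per execution, so a 24-hour pause only fits in a Standard workflow.

```python
import json

ONE_DAY_SECONDS = 24 * 60 * 60  # 86,400 seconds

# Wait state: pause the execution for one day, then move on.
wait_state = {"Type": "Wait", "Seconds": ONE_DAY_SECONDS, "Next": "SendApologyEmail"}

# Express workflows are capped at five minutes per execution;
# Standard workflows can run for up to one year.
EXPRESS_MAX_SECONDS = 5 * 60


def needs_standard_workflow(wait_seconds: int) -> bool:
    """True if the wait alone exceeds the Express workflow limit."""
    return wait_seconds > EXPRESS_MAX_SECONDS


print(json.dumps(wait_state))
print(needs_standard_workflow(wait_state["Seconds"]))  # True
```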
Visualizing the Logic
Timothy looked at the diagram on the screen. It looked like a flowchart, but it was actually executable code.
- Start
- Task: Credit Wallet
- Choice: Success? (if not, go to Fail)
- Task: Update Inventory
- Wait: 24 Hours
- Task: Send Email
- End
"This visual diagram isn't just documentation," Margaret noted. "It generates Amazon States Language (ASL), a JSON-based definition. This diagram is your code."
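Written out by hand, the flowchart above corresponds to an ASL document along these lines. This is a sketch, not what Workflow Studio would export: the state names and placeholder ARNs are hypothetical, and the Choice rule assumes the Credit Wallet function returns a `status` field. The `check_definition` helper is a local sanity check that every `Next`, `Default`, and `Catch` target exists; Step Functions performs its own, stricter validation when you create the state machine.

```python
import json

# Hypothetical placeholder ARN pattern; substitute your own functions.
ARN = "arn:aws:lambda:us-east-1:123456789012:function:{}"

definition = {
    "Comment": "Return Process (sketch)",
    "StartAt": "CreditWallet",
    "States": {
        "CreditWallet": {"Type": "Task", "Resource": ARN.format("CreditWallet"), "Next": "CheckCredit"},
        "CheckCredit": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.status", "StringEquals": "SUCCEEDED", "Next": "UpdateInventory"}],
            "Default": "ReturnFailed",
        },
        "UpdateInventory": {"Type": "Task", "Resource": ARN.format("UpdateInventory"), "Next": "WaitOneDay"},
        "WaitOneDay": {"Type": "Wait", "Seconds": 86400, "Next": "SendApologyEmail"},
        "SendApologyEmail": {"Type": "Task", "Resource": ARN.format("SendApologyEmail"), "End": True},
        "ReturnFailed": {"Type": "Fail", "Error": "CreditFailed", "Cause": "Wallet credit did not succeed."},
    },
}


def transition_targets(state: dict) -> list:
    """Collect every state name this state can transition to."""
    targets = [state["Next"]] if "Next" in state else []
    targets += [rule["Next"] for rule in state.get("Choices", [])]
    targets += [catcher["Next"] for catcher in state.get("Catch", [])]
    if "Default" in state:
        targets.append(state["Default"])
    return targets


def check_definition(defn: dict) -> None:
    """Raise ValueError if StartAt or any transition points at a missing state."""
    states = defn["States"]
    if defn["StartAt"] not in states:
        raise ValueError(f"StartAt refers to unknown state {defn['StartAt']!r}")
    for name, state in states.items():
        for target in transition_targets(state):
            if target not in states:
                raise ValueError(f"{name!r} transitions to unknown state {target!r}")


check_definition(definition)
asl_json = json.dumps(definition, indent=2)  # the JSON document Step Functions would receive
print(asl_json)
```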
"And if we needed to do two things at once?" Timothy asked.
"We would use a Parallel State," she said. "We could update the inventory and notify the warehouse simultaneously, then wait for both to finish before moving on."
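A Parallel state holds a list of branches, each of which is a small state machine with its own `StartAt` and `States`. In this sketch the `NotifyWarehouse` function, the placeholder ARNs, and the `single_task_branch` helper are hypothetical.

```python
import json

# Hypothetical placeholder ARN pattern.
ARN = "arn:aws:lambda:us-east-1:123456789012:function:{}"


def single_task_branch(name: str) -> dict:
    """A branch is a mini state machine; this one runs a single Task state."""
    return {
        "StartAt": name,
        "States": {name: {"Type": "Task", "Resource": ARN.format(name), "End": True}},
    }


# Both branches start together; the Parallel state finishes only when both do.
# Its output is a list holding each branch's output, in branch order.
parallel_state = {
    "Type": "Parallel",
    "Branches": [single_task_branch("UpdateInventory"), single_task_branch("NotifyWarehouse")],
    "Next": "WaitOneDay",
}

print(json.dumps(parallel_state, indent=2))
```

If any branch fails, the whole Parallel state fails unless a `Catch` rule handles it, so both updates are treated as one unit of work.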
Error Handling (Retries)
"What if the Inventory database is offline?" Timothy asked. "My Python script had a try/except block to retry the connection."
Margaret clicked on the 'Update Inventory' task in the visual editor. She opened the Error Handling tab.
"We can define Retries here, in the workflow definition," she said. "We can tell it: 'If you get a ConnectionError, wait 2 seconds and try again. Then back off to 4 seconds. Retry up to 3 times.'"
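Margaret's rule maps onto the `Retry` field of a Task state: `IntervalSeconds` is the first delay, `BackoffRate` multiplies it after each retry, and `MaxAttempts` caps the number of retries. The sketch assumes the Lambda function lets Python's built-in `ConnectionError` propagate, so that is the error name Step Functions matches. The `retry_delays` helper is a hypothetical illustration of the resulting schedule, not part of any AWS SDK.

```python
import json

# Retry on ConnectionError: wait 2 s, then 4 s, then 8 s (up to 3 retries).
retry_policy = [
    {
        "ErrorEquals": ["ConnectionError"],
        "IntervalSeconds": 2,
        "BackoffRate": 2.0,
        "MaxAttempts": 3,
    }
]

update_inventory = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:UpdateInventory",  # placeholder
    "Retry": retry_policy,
    "Next": "WaitOneDay",
}


def retry_delays(interval_seconds: float, backoff_rate: float, max_attempts: int) -> list:
    """Delay before each retry: interval * backoff_rate ** n, for n = 0 .. max_attempts - 1."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


rule = retry_policy[0]
print(json.dumps(update_inventory["Retry"]))
print(retry_delays(rule["IntervalSeconds"], rule["BackoffRate"], rule["MaxAttempts"]))  # [2.0, 4.0, 8.0]
```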
"You are removing code from my application," Timothy realized.
"I am removing infrastructure code," Margaret corrected. "Your Lambda function should only care about updating inventory. It should not care about retries, waiting, or what happens next. The State Machine handles the flow."
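Here is roughly what the slimmed-down inventory function could look like. The event shape (`sku`, `quantity`) and the in-memory `INVENTORY` dictionary are hypothetical stand-ins for a real database; the point is what the handler leaves out.

```python
# Hypothetical in-memory stand-in for the inventory database.
INVENTORY = {"SKU-123": 4}


def lambda_handler(event, context):
    """Add the returned items back to stock. Nothing else.

    There is deliberately no retry loop, no sleep, and no call to the next step.
    If the database write raises (for example, ConnectionError), the exception
    propagates, the Task fails, and the state machine's Retry rule takes over.
    """
    sku = event["sku"]
    quantity = event.get("quantity", 1)
    INVENTORY[sku] = INVENTORY.get(sku, 0) + quantity
    # By default, whatever we return becomes the Task state's output.
    return {"sku": sku, "in_stock": INVENTORY[sku]}


print(lambda_handler({"sku": "SKU-123", "quantity": 2}, None))
```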
The Lesson
Timothy deleted his massive Python script. He replaced it with three tiny, simple Lambda functions: one for credit, one for inventory, one for email. He wired them together in the Workflow Studio.
"It works," Timothy said, watching the green path light up on the screen as a test execution ran.
"And look at the Execution History," Margaret pointed out. "Step Functions keeps a record of every run for 90 days. If a return fails next week, you can come here and see exactly which step broke and why."
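The same history is available through the `GetExecutionHistory` API, so the "which step broke?" question can also be answered in code. This sketch assumes boto3 is installed and AWS credentials are configured for `fetch_events`; `find_failure` is a hypothetical helper that works on plain event dictionaries and reports the last state entered before the `ExecutionFailed` event.

```python
from typing import Iterable, Optional


def find_failure(events: Iterable[dict]) -> Optional[dict]:
    """Return the last state entered before ExecutionFailed, plus error and cause.

    `events` follows the shape of the "events" list from GetExecutionHistory,
    oldest first. With Parallel or Map states, "last entered" is simply the
    most recent StateEntered event in the stream.
    """
    last_state = None
    for event in events:
        entered = event.get("stateEnteredEventDetails")
        if entered:
            last_state = entered["name"]
        if event["type"] == "ExecutionFailed":
            details = event.get("executionFailedEventDetails", {})
            return {"state": last_state, "error": details.get("error"), "cause": details.get("cause")}
    return None  # the execution did not fail


def fetch_events(execution_arn: str) -> list:
    """Download the full execution history with boto3 (requires AWS credentials)."""
    import boto3  # imported here so find_failure works without boto3 installed

    client = boto3.client("stepfunctions")
    paginator = client.get_paginator("get_execution_history")
    events = []
    for page in paginator.paginate(executionArn=execution_arn):
        events.extend(page["events"])
    return events
```

Usage would be `find_failure(fetch_events(arn))`, where `arn` is the ARN of the failed execution.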
"This is Serverless Orchestration," she smiled. "We write less code, but we build more complex systems."
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.