CloudFormation Stuck in CREATE_FAILED: Taming the WaitCondition Beast





Have you ever found yourself staring at a CloudFormation stack stuck in the CREATE_FAILED state, with a pesky AWS::CloudFormation::WaitCondition as the resource that failed? You're not alone. Many developers hit this roadblock, especially when they first provision EC2 instances through CloudFormation. In this post, we'll dissect the common causes and walk through practical fixes, inspired by a real-world scenario.


The Scenario: A CloudFormation Template Gone Awry

Imagine deploying a CloudFormation stack for a Spark History Server, as outlined in the AWS Glue documentation. The template includes an EC2 instance, security groups, IAM roles, and a WaitCondition meant to hold stack creation until the instance's setup completes. However, the stack consistently fails: the WaitCondition ends in CREATE_FAILED, either because it timed out before receiving a signal or because it received a failure signal.


The Root of the Problem: WaitCondition Misuse

The primary culprit is often how the WaitCondition is used. A WaitCondition pauses stack creation until it receives a signal, but for EC2 instances it is generally not the best practice. For Amazon EC2 and Auto Scaling resources, AWS recommends using the CreationPolicy attribute instead.
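For comparison, the WaitCondition approach usually looks something like the sketch below. This is a generic illustration of the pattern, not the Glue template itself; the logical IDs HistoryServerWaitHandle and HistoryServerWaitCondition are hypothetical, and the instance is assumed to be named HistoryServerInstance, as in the example later in this post.

```yaml
# Legacy pattern (hypothetical logical IDs): a separate handle plus a wait
# condition, which the instance signals through a presigned URL.
HistoryServerWaitHandle:
  Type: AWS::CloudFormation::WaitConditionHandle

HistoryServerWaitCondition:
  Type: AWS::CloudFormation::WaitCondition
  DependsOn: HistoryServerInstance
  Properties:
    # Ref on the handle returns the presigned URL the instance must call
    Handle: !Ref HistoryServerWaitHandle
    Timeout: '1200'   # seconds, not an ISO 8601 duration
    Count: 1

# Inside the instance's UserData (Fn::Base64: !Sub |), the signal targets
# the handle URL instead of a resource:
#   /opt/aws/bin/cfn-signal -e $? '${HistoryServerWaitHandle}'
```

Note that a WaitCondition's Timeout is a number of seconds (at most 43200), whereas a CreationPolicy's Timeout is an ISO 8601 duration such as PT20M.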


Why CreationPolicy over WaitCondition for EC2?
  • Simplicity and Clarity: CreationPolicy directly attaches to the EC2 instance resource, streamlining the process.
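  • Built-in Handling: The instance resource itself stays in CREATE_IN_PROGRESS until it receives the required number of success signals (the ResourceSignal Count, which defaults to 1) within the Timeout; a failure signal or a timeout marks the instance CREATE_FAILED.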
  • Reduced Complexity: It eliminates the need for a separate WaitHandle and WaitCondition resource, simplifying your template.


The Solution: Using CreationPolicy

Here's how to revamp your template:


1. Modify the EC2 Instance Resource:
  • Add a CreationPolicy attribute to your AWS::EC2::Instance resource.
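  • Specify a ResourceSignal with a Timeout in ISO 8601 duration format (e.g., PT20M for 20 minutes; the maximum is PT12H).
  • Change the cfn-signal command in your UserData script to identify the instance resource itself (--stack, --resource with the instance's logical ID, and --region) instead of a WaitConditionHandle URL.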

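```yaml
HistoryServerInstance:
  Type: AWS::EC2::Instance
  CreationPolicy:
    ResourceSignal:
      Timeout: PT20M
  Properties:
    # ... other properties ...
    UserData:
      Fn::Base64: !Sub |
        #!/bin/bash -xe
        # ... other commands ...
        # Report the previous command's exit status to CloudFormation.
        # Each continued line must end with a backslash; otherwise bash runs
        # the options as separate commands and no signal is sent.
        # Because of -e, a failing command above exits the script before this
        # line runs, so the stack waits for the full Timeout before failing.
        /opt/aws/bin/cfn-signal -e $? \
            --stack ${AWS::StackName} \
            --resource HistoryServerInstance \
            --region ${AWS::Region}
```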
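One refinement worth considering, offered as a sketch rather than part of the original template: with `#!/bin/bash -xe`, a failing setup command ends the script before cfn-signal runs, so CloudFormation sits idle until the Timeout expires. A bash ERR trap can send a failure signal first, so the stack fails fast. This assumes a flat script (an ERR trap isn't inherited by shell functions or subshells unless you also `set -E`), and the setup commands are left elided.

```yaml
HistoryServerInstance:
  Type: AWS::EC2::Instance
  CreationPolicy:
    ResourceSignal:
      Timeout: PT20M
  Properties:
    # ... other properties ...
    UserData:
      Fn::Base64: !Sub |
        #!/bin/bash -xe
        # If any later command fails, send a failure signal before -e ends
        # the script, so the stack fails immediately instead of at PT20M.
        trap '/opt/aws/bin/cfn-signal -e 1 --stack ${AWS::StackName} --resource HistoryServerInstance --region ${AWS::Region}' ERR
        # ... other commands ...
        # Every command succeeded: report success.
        /opt/aws/bin/cfn-signal -e 0 \
            --stack ${AWS::StackName} \
            --resource HistoryServerInstance \
            --region ${AWS::Region}
```

A non-zero `-e` value is reported as a failure signal, which is what lets CloudFormation mark the instance CREATE_FAILED right away.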


2. Remove WaitHandle and WaitCondition:

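Completely remove the AWS::CloudFormation::WaitConditionHandle and AWS::CloudFormation::WaitCondition resources from your template, along with anything that still references them (DependsOn entries, Ref or Fn::GetAtt calls, Outputs); otherwise the template fails validation with unresolved resource dependencies.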



3. Ensure Dependencies:

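Verify that everything the EC2 instance relies on is created before it. CloudFormation infers ordering from Ref and Fn::GetAtt, so add a DependsOn attribute only for dependencies it can't see, such as a route or NAT gateway the instance needs in order to reach the CloudFormation endpoint and send its signal.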
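As an illustration, suppose the instance sits in a public subnet whose internet route is defined as a separate resource. The logical ID PublicRoute below is hypothetical (assumed to be an AWS::EC2::Route to an internet gateway); nothing in the instance's properties references it, so the ordering has to be declared explicitly.

```yaml
HistoryServerInstance:
  Type: AWS::EC2::Instance
  # Hypothetical: PublicRoute is an AWS::EC2::Route to an internet gateway.
  # No property of the instance refers to it, so CloudFormation can't infer
  # the ordering. Without DependsOn, UserData could run before the instance
  # has a route out, and cfn-signal would never reach CloudFormation.
  DependsOn: PublicRoute
  CreationPolicy:
    ResourceSignal:
      Timeout: PT20M
  Properties:
    # ... other properties ...
```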



4. Troubleshooting:
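  • Inspect the instance's logs to pinpoint errors during setup: /var/log/cloud-init-output.log captures UserData output, and /var/log/cfn-init.log records cfn-init runs if you use it. These files live on the instance and only appear in CloudWatch Logs if you ship them there, so consider creating the stack with rollback disabled (for example, --disable-rollback in the AWS CLI) so the instance isn't terminated before you can read them.
  • Check the CloudFormation events for detailed error messages, starting with the earliest failed event; later failures are often just cascading cancellations.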

Key Takeaways
  • For EC2 instance creation, embrace CreationPolicy over WaitCondition.
  • Ensure your cfn-signal command correctly targets the EC2 resource.
  • Thoroughly review the instance's setup logs and the CloudFormation events when debugging.

By implementing these changes, you'll make your CloudFormation templates simpler and their failures much faster to diagnose, taking the mystery out of the dreaded CREATE_FAILED state. Remember, mastering CloudFormation is a journey, and understanding these nuances will empower you to build robust and scalable infrastructure on AWS.


Need AWS Expertise?

If you're looking for guidance on AWS or any cloud challenges, feel free to reach out! We'd love to help you tackle your projects. 🚀


Email us at: info@pacificw.com


Image: Gemini
