[Bug]: ASG Instance Refresh Rollback Impaired behavior #34189

Open
adeeb-ma opened this issue Oct 31, 2023 · 3 comments
Labels
bug Addresses a defect in current functionality. service/autoscaling Issues and PRs that pertain to the autoscaling service.

Comments

@adeeb-ma
Terraform Core Version

1.0.5

AWS Provider Version

5.32.1

Affected Resource(s)

aws_autoscaling_group

Expected Behavior

Applying a change to an Auto Scaling group with Instance Refresh enabled (for example, changing the launch template version) should trigger an instance refresh that deploys the new configuration, but it should not update the group's configuration beforehand.

In the AWS console, the "Details" tab would still show the old launch template version, while the "Instance Refresh" tab would show the refresh running with the newer launch template version as its target configuration.

This would mean that if the new launch template version (2) keeps failing health checks, the refresh would roll back to the previous version (1).

Actual Behavior

Running apply updates the launch template version on the ASG first, and only then triggers an instance refresh whose target configuration is the latest launch template version.

As a result, if the newer launch template version (2) keeps failing health checks, the refresh "rolls back" to that same version (2), leaving the ASG stuck in a perpetual rollback that relaunches instances from the unhealthy launch template over and over.
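The difference between the expected and actual behavior can be illustrated with a toy model. This is only a sketch of the reasoning in this report, not AWS or provider code: the `AutoScalingGroup` class and the `refresh_with_rollback` function are hypothetical names, and the model assumes the rollback target is whatever version the group holds when the refresh starts.

```python
from dataclasses import dataclass


@dataclass
class AutoScalingGroup:
    """Toy stand-in for an ASG: only tracks the launch template version."""
    launch_template_version: int


def refresh_with_rollback(asg, target_version, healthy, update_asg_first):
    """Return the version the group ends up running after a refresh attempt.

    update_asg_first=True models the reported behavior: the group is updated
    to the target version before the refresh starts.
    """
    if update_asg_first:
        asg.launch_template_version = target_version
    # Assumption: the rollback target is the version on the group at start.
    rollback_version = asg.launch_template_version
    if healthy:
        asg.launch_template_version = target_version
        return target_version
    return rollback_version


# Reported behavior: the rollback lands on the broken version 2.
print(refresh_with_rollback(AutoScalingGroup(1), 2, healthy=False, update_asg_first=True))
# Expected behavior: the rollback returns to version 1.
print(refresh_with_rollback(AutoScalingGroup(1), 2, healthy=False, update_asg_first=False))
```

In this model the only thing that changes the outcome is whether the group is modified before the refresh begins.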

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

data "aws_ami" "ami" {
  most_recent = true
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
  owners = ["099720109477"] # Canonical
}

resource "aws_launch_template" "sample" {
  name = "sample-lt"

  image_id                             = data.aws_ami.ami.id
  instance_initiated_shutdown_behavior = "terminate"
  instance_type                        = "t3.small"
  # user_data = base64encode("#!/bin/bash\nsudo ifconfig eth0 down") # Purposefully fails the EC2 health check, to test out the Instance Refresh rollback
}

resource "aws_autoscaling_group" "sample" {
  name  = "sample-asg"

  launch_template {
    id      = aws_launch_template.sample.id
    version = aws_launch_template.sample.latest_version
  }
  min_size                  = 1
  max_size                  = 1
  desired_capacity          = 1
  health_check_type         = "EC2" # Can be ELB as well
  health_check_grace_period = 20
  wait_for_capacity_timeout = "5m"

  instance_refresh {
    strategy = "Rolling"
    # Preferences can be anything, as long as auto_rollback = true
    preferences {
      auto_rollback          = true
      min_healthy_percentage = 75
      skip_matching          = false
    }
  }
}

Steps to Reproduce

  1. Run terraform apply. It should complete successfully, creating the ASG and LT
  2. Confirm in the AWS console that the ASG is created and its instance is launched
  3. Change the launch template: either change the instance type to something else, or make a change that purposefully fails the EC2 health check (such as the commented-out user_data above)
  4. Run terraform apply again. It should complete successfully again
  5. Confirm in the AWS console that:
    a. The ASG's "Details" tab has changed from Launch Template Version 1 -> 2
    b. The ASG's "Instance Refresh" tab shows a refresh in progress

If the new template is unhealthy, you should also see the instance refresh fail after an hour, after which the rollback keeps launching the newer Launch Template Version (2) indefinitely.

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

No

@adeeb-ma adeeb-ma added the bug Addresses a defect in current functionality. label Oct 31, 2023
@github-actions github-actions bot added service/autoscaling Issues and PRs that pertain to the autoscaling service. service/ec2 Issues and PRs that pertain to the ec2 service. labels Oct 31, 2023
@terraform-aws-provider terraform-aws-provider bot added the needs-triage Waiting for first response or review from a maintainer. label Oct 31, 2023
@justinretzolk justinretzolk removed service/ec2 Issues and PRs that pertain to the ec2 service. needs-triage Waiting for first response or review from a maintainer. labels Nov 2, 2023
@arianvp
Contributor

arianvp commented Aug 23, 2024

Yeah, this is a bug in the provider.

When triggering an instance refresh, Terraform should not modify the Auto Scaling Group before calling StartInstanceRefresh. Instead it should pass

DesiredConfiguration:
  LaunchTemplate:
    LaunchTemplateName: <name>
    Version: <version>

to the StartInstanceRefresh call.

Then the instance refresh will update the ASG to point to the new Launch Template Version if and only if the instance refresh succeeds. If it fails, it will use the Launch Template Version already set on the ASG for the rollback.

If you update the launch template version manually beforehand, this confuses the rollback logic.

In short: the Terraform provider should never manually update the launch template parameters on the ASG. Instead, it should pass DesiredConfiguration to StartInstanceRefresh.
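For illustration, the request described above might be assembled as follows. This is a minimal sketch, not the provider's code: `build_refresh_request` is a hypothetical helper, the names `sample-asg` and `sample-lt` come from the configuration in the report, and the preferences simply mirror that configuration. The parameter names follow the StartInstanceRefresh API as exposed by boto3.

```python
def build_refresh_request(asg_name, launch_template_name, version):
    """Build StartInstanceRefresh parameters that carry the new version in
    DesiredConfiguration instead of updating the group beforehand."""
    return {
        "AutoScalingGroupName": asg_name,
        "Strategy": "Rolling",
        "DesiredConfiguration": {
            "LaunchTemplate": {
                "LaunchTemplateName": launch_template_name,
                "Version": str(version),
            }
        },
        # Mirrors the preferences block from the Terraform configuration.
        "Preferences": {
            "AutoRollback": True,
            "MinHealthyPercentage": 75,
            "SkipMatching": False,
        },
    }


request = build_refresh_request("sample-asg", "sample-lt", 2)
print(request["DesiredConfiguration"])

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("autoscaling").start_instance_refresh(**request)
```

The group itself is left untouched here, so the version already set on it remains available as the rollback target.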

@arianvp
Contributor

arianvp commented Aug 23, 2024

The same has to happen for changing mixed_instances_policy.

In that case DesiredConfiguration.MixedInstancesPolicy should be set with the fields that need updating.
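A rough sketch of what that DesiredConfiguration could look like for a mixed instances policy is shown below. The helper `build_mixed_policy_configuration` and the instance types are hypothetical, chosen only for illustration; the nested key names follow the StartInstanceRefresh request shape.

```python
def build_mixed_policy_configuration(launch_template_name, version, instance_types):
    """Build a DesiredConfiguration that updates the mixed instances policy
    through the instance refresh rather than on the group directly."""
    return {
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template_name,
                    "Version": str(version),
                },
                # One override entry per instance type to allow.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            }
        }
    }


desired = build_mixed_policy_configuration("sample-lt", 2, ["t3.small", "t3.medium"])
print(desired["MixedInstancesPolicy"]["LaunchTemplate"]["Overrides"])
```

The resulting dictionary would be passed as the DesiredConfiguration argument of StartInstanceRefresh in place of a plain LaunchTemplate entry.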
