Fix wait for ASG capacity when increasing min size #12018

Merged
merged 2 commits into from
Nov 24, 2021

Conversation

lachlancooper
Contributor

When updating an ASG there are currently two ways capacitySatisfiedUpdate() might ultimately be called from resourceAwsAutoscalingGroupUpdate():

  • if d.HasChange("min_size"), and/or
  • if d.HasChange("desired_capacity")

The existing approach here, only checking against desired_capacity, is flawed because that argument is optional while min_size is mandatory. If desired_capacity is not specified, AWS will automatically update it to match either min_size or max_size depending on whether the group is being increased or decreased.

However, the Terraform AWS provider does not predict this behaviour in advance. The new auto-updated value for desired_capacity is only detected at the very end of resourceAwsAutoscalingGroupUpdate() (after waiting for capacity) when the configuration of the updated ASG is read. Before then d.Get("desired_capacity") returns the existing value.

All of this presents a problem when desired_capacity is not specified and min_size is increased. We detect a change to min_size and thus trigger a waitForASGCapacity(), but then wait for exactly the existing desired_capacity healthy instances. This number is less than both min_size and the new desired_capacity, so even if the update is otherwise successful, the group settles at more healthy instances than the exact count we are waiting for, and the wait can never be satisfied.

The fix is to wait for whichever is the higher of min_size and desired_capacity instances.

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for pull request followers and do not help prioritize the request

Closes #5241

Release note for CHANGELOG:

resource/aws_autoscaling_group: Prevent infinite wait for capacity when increasing `min_size` and not specifying `desired_capacity`

Note that I have not run acceptance tests as I do not have a suitable AWS test account configured at the moment.

@lachlancooper requested a review from a team February 13, 2020 02:54
@ghost added the needs-triage, size/S, service/autoscaling, and tests labels Feb 13, 2020
@lachlancooper requested a review from a team as a code owner January 21, 2021 05:47
Base automatically changed from master to main January 23, 2021 00:57
@breathingdust added the bug label and removed the needs-triage label Oct 6, 2021
@zhelding
Contributor

Pull request #21306 has significantly refactored the AWS Provider codebase. As a result, most PRs opened prior to the refactor now have merge conflicts that must be resolved before proceeding.

Specifically, PR #21306 relocated the code for all AWS resources and data sources from a single aws directory to a large number of separate directories in internal/service, each corresponding to a particular AWS service. This separation of code has also allowed for us to simplify the names of underlying functions -- while still avoiding namespace collisions.

We recognize that many pull requests have been open for some time without yet being addressed by our maintainers. Therefore, we want to make it clear that resolving these conflicts in no way affects the prioritization of a particular pull request. Once a pull request has been prioritized for review, the necessary changes will be made by a maintainer -- either directly or in collaboration with the pull request author.

For a more complete description of this refactor, including examples of how old filepaths and function names correspond to their new counterparts: please refer to issue #20000.

For a quick guide on how to amend your pull request to resolve the merge conflicts resulting from this refactor and bring it in line with our new code patterns: please refer to our Service Package Refactor Pull Request Guide.

lachlancooper and others added 2 commits November 24, 2021 12:17
Fixes #5241.

When updating an ASG there are currently two ways
`capacitySatisfiedUpdate()` might ultimately be called from
`resourceAwsAutoscalingGroupUpdate()`:

 - `if d.HasChange("min_size")`, and/or
 - `if d.HasChange("desired_capacity")`

The existing approach here, only checking against `desired_capacity`,
is flawed because that argument is optional while `min_size` is
mandatory. If `desired_capacity` is not specified, AWS will
automatically [update it](https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_UpdateAutoScalingGroup.html)
to match either `min_size` or `max_size` depending on whether the group
is being increased or decreased.

However, the Terraform AWS provider does not predict this behaviour in
advance. The new auto-updated value for `desired_capacity` is only
detected at the very end of `resourceAwsAutoscalingGroupUpdate()`
(after waiting for capacity) when the configuration of the updated ASG
is read. Before then `d.Get("desired_capacity")` returns the existing
value.

All of this presents a problem when `desired_capacity` is not specified
and `min_size` is increased. We detect a change to `min_size` and thus
trigger a `waitForASGCapacity()`, but then wait for exactly (existing)
`desired_capacity` healthy instances. This number is less than both
`min_size` and the new `desired_capacity`, meaning that even if the
update is successful, we get stuck forever waiting for a smaller number
of instances.

The fix is to wait for whichever is the higher of `min_size` and
`desired_capacity` instances.
@YakDriver self-assigned this Nov 24, 2021
Member

@YakDriver left a comment

Looks great! 🎉

Output from acceptance tests (us-west-2):

% make testacc TESTS=TestAccAutoScalingGroup PKG=autoscaling
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./internal/service/autoscaling/... -v -count 1 -parallel 20 -run='TestAccAutoScalingGroup' -timeout 180m
--- PASS: TestAccAutoScalingGroupsDataSource_basic (71.16s)
--- PASS: TestAccAutoScalingGroup_InstanceRefresh_start (284.93s)
--- PASS: TestAccAutoScalingGroup_ALB_targetGroups (162.70s)
--- PASS: TestAccAutoScalingGroup_ALBTargetGroups_elbCapacity (322.19s)
--- PASS: TestAccAutoScalingGroup_basic (234.58s)
--- PASS: TestAccAutoScalingGroup_classicVPCZoneIdentifier (83.46s)
--- PASS: TestAccAutoScalingGroup_enablingMetrics (202.32s)
--- PASS: TestAccAutoScalingGroup_initialLifecycleHook (196.54s)
--- PASS: TestAccAutoScalingGroup_InstanceRefresh_basic (238.44s)
--- PASS: TestAccAutoScalingGroup_InstanceRefresh_triggers (206.31s)
--- PASS: TestAccAutoScalingGroup_launchTemplate (72.24s)
--- PASS: TestAccAutoScalingGroup_LaunchTemplate_iamInstanceProfile (127.48s)
--- PASS: TestAccAutoScalingGroup_LaunchTemplate_update (194.36s)
--- PASS: TestAccAutoScalingGroup_launchTempPartitionNum (82.09s)
--- PASS: TestAccAutoScalingGroup_loadBalancers (400.22s)
--- PASS: TestAccAutoScalingGroup_maxInstanceLifetime (85.33s)
--- PASS: TestAccAutoScalingGroup_mixedInstancesPolicy (56.05s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicy_capacityRebalance (85.24s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_onDemandAllocationStrategy (60.08s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_onDemandBaseCapacity (131.11s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_onDemandPercentageAboveBaseCapacity (67.82s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_spotAllocationStrategy (60.82s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_spotInstancePools (84.57s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_spotMaxPrice (114.92s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_updateToZeroOnDemandBaseCapacity (113.89s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyLaunchTemplateLaunchTemplateSpecification_launchTemplateName (46.59s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyLaunchTemplateLaunchTemplateSpecification_version (98.87s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyLaunchTemplateOverride_instanceType (102.43s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyLaunchTemplateOverride_instanceTypeWithLaunchTemplateSpecification (64.65s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyLaunchTemplateOverride_weightedCapacity (146.10s)
--- PASS: TestAccAutoScalingGroup_Name_generated (50.43s)
--- PASS: TestAccAutoScalingGroup_namePrefix (52.38s)
--- PASS: TestAccAutoScalingGroup_serviceLinkedRoleARN (50.94s)
--- PASS: TestAccAutoScalingGroup_suspendingProcesses (238.25s)
--- PASS: TestAccAutoScalingGroup_tags (227.64s)
--- PASS: TestAccAutoScalingGroup_targetGroupARNs (190.20s)
--- PASS: TestAccAutoScalingGroup_terminationPolicies (106.45s)
--- PASS: TestAccAutoScalingGroup_vpcUpdates (84.47s)
--- PASS: TestAccAutoScalingGroup_warmPool (525.27s)
--- PASS: TestAccAutoScalingGroup_withLoadBalancer (308.69s)
--- PASS: TestAccAutoScalingGroup_WithLoadBalancer_toTargetGroup (439.54s)
--- PASS: TestAccAutoScalingGroup_withMetrics (118.14s)
--- PASS: TestAccAutoScalingGroup_withPlacementGroup (131.92s)
--- PASS: TestAccAutoScalingGroupDataSource_basic (64.23s)
--- PASS: TestAccAutoScalingGroupDataSource_launchTemplate (33.80s)
--- PASS: TestAccAutoScalingGroupsDataSource_basic (164.19s)
--- PASS: TestAccAutoScalingGroupTag_basic (39.89s)
--- PASS: TestAccAutoScalingGroupTag_disappears (75.42s)
--- PASS: TestAccAutoScalingGroupTag_value (73.48s)

Output from acceptance tests (GovCloud):

% make testacc TESTS=TestAccAutoScalingGroup PKG=autoscaling
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./internal/service/autoscaling/... -v -count 1 -parallel 20 -run='TestAccAutoScalingGroup' -timeout 180m
--- PASS: TestAccAutoScalingGroupsDataSource_basic (72.08s)
--- PASS: TestAccAutoScalingGroup_InstanceRefresh_start (229.06s)
--- PASS: TestAccAutoScalingGroup_ALB_targetGroups (207.82s)
--- PASS: TestAccAutoScalingGroup_ALBTargetGroups_elbCapacity (317.34s)
--- PASS: TestAccAutoScalingGroup_basic (243.91s)
--- PASS: TestAccAutoScalingGroup_classicVPCZoneIdentifier (65.52s)
--- PASS: TestAccAutoScalingGroup_enablingMetrics (220.91s)
--- PASS: TestAccAutoScalingGroup_initialLifecycleHook (346.24s)
--- PASS: TestAccAutoScalingGroup_InstanceRefresh_basic (245.19s)
--- PASS: TestAccAutoScalingGroup_InstanceRefresh_start (259.00s)
--- PASS: TestAccAutoScalingGroup_InstanceRefresh_triggers (174.84s)
--- PASS: TestAccAutoScalingGroup_launchTemplate (60.64s)
--- PASS: TestAccAutoScalingGroup_LaunchTemplate_iamInstanceProfile (64.63s)
--- PASS: TestAccAutoScalingGroup_LaunchTemplate_update (151.44s)
--- PASS: TestAccAutoScalingGroup_launchTempPartitionNum (74.81s)
--- PASS: TestAccAutoScalingGroup_loadBalancers (332.22s)
--- PASS: TestAccAutoScalingGroup_maxInstanceLifetime (125.95s)
--- PASS: TestAccAutoScalingGroup_mixedInstancesPolicy (52.69s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicy_capacityRebalance (41.83s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_onDemandAllocationStrategy (56.02s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_onDemandBaseCapacity (103.80s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_onDemandPercentageAboveBaseCapacity (82.06s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_spotAllocationStrategy (51.29s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_spotInstancePools (78.92s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_spotMaxPrice (128.98s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyInstancesDistribution_updateToZeroOnDemandBaseCapacity (90.15s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyLaunchTemplateLaunchTemplateSpecification_launchTemplateName (81.97s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyLaunchTemplateLaunchTemplateSpecification_version (83.57s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyLaunchTemplateOverride_instanceType (78.10s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyLaunchTemplateOverride_instanceTypeWithLaunchTemplateSpecification (63.75s)
--- PASS: TestAccAutoScalingGroup_MixedInstancesPolicyLaunchTemplateOverride_weightedCapacity (193.86s)
--- PASS: TestAccAutoScalingGroup_Name_generated (45.74s)
--- PASS: TestAccAutoScalingGroup_namePrefix (54.93s)
--- PASS: TestAccAutoScalingGroup_serviceLinkedRoleARN (75.70s)
--- PASS: TestAccAutoScalingGroup_suspendingProcesses (210.98s)
--- PASS: TestAccAutoScalingGroup_tags (228.11s)
--- PASS: TestAccAutoScalingGroup_targetGroupARNs (208.14s)
--- PASS: TestAccAutoScalingGroup_terminationPolicies (129.26s)
--- PASS: TestAccAutoScalingGroup_vpcUpdates (93.51s)
--- PASS: TestAccAutoScalingGroup_warmPool (256.96s)
--- PASS: TestAccAutoScalingGroup_withLoadBalancer (274.62s)
--- PASS: TestAccAutoScalingGroup_WithLoadBalancer_toTargetGroup (337.65s)
--- PASS: TestAccAutoScalingGroup_withMetrics (99.69s)
--- PASS: TestAccAutoScalingGroup_withPlacementGroup (166.94s)
--- PASS: TestAccAutoScalingGroupDataSource_basic (69.17s)
--- PASS: TestAccAutoScalingGroupDataSource_launchTemplate (47.52s)
--- PASS: TestAccAutoScalingGroupTag_basic (62.43s)
--- PASS: TestAccAutoScalingGroupTag_disappears (54.75s)
--- PASS: TestAccAutoScalingGroupTag_value (91.60s)
--- PASS: TestCapacitySatisfiedUpdate (0.00s)

@YakDriver merged commit 1e6d287 into hashicorp:main Nov 24, 2021
@github-actions bot added this to the v3.67.0 milestone Nov 24, 2021

github-actions bot commented Dec 1, 2021

This functionality has been released in v3.67.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

github-actions bot commented Jun 7, 2022

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators Jun 7, 2022
Labels
bug Addresses a defect in current functionality. service/autoscaling Issues and PRs that pertain to the autoscaling service. size/S Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure.
Development

Successfully merging this pull request may close these issues.

aws_autoscaling_group fails when increasing number of instances
4 participants