[Bug]: aws_autoscaling_policy is often missed from the plan or state #34408

Open
speller opened this issue Nov 15, 2023 · 4 comments
Labels: bug (Addresses a defect in current functionality), service/autoscaling (Issues and PRs that pertain to the autoscaling service)

Comments

@speller
Contributor

speller commented Nov 15, 2023

Terraform Core Version

1.5.0

AWS Provider Version

4.67.0

Affected Resource(s)

aws_autoscaling_policy

Expected Behavior

aws_autoscaling_policy resources are handled properly.

Actual Behavior

I'm getting the following errors randomly:

│ Error: creating Auto Scaling Policy (CPUUtilizationInt): ValidationError: Only one TargetTrackingScaling policy for a given metric specification is allowed.
│ 	status code: 400, request id: e9a2c4c1-bc7d-4aae-b2f7-f6998ed196ee
│ 
│   with module.service-bff.module.instance.aws_autoscaling_policy.cpu_utilization_int[0],
│   on .terraform/modules/service-bff.instance/main.tf line 77, in resource "aws_autoscaling_policy" "cpu_utilization_int":
│   77: resource "aws_autoscaling_policy" "cpu_utilization_int" {
│ 
╵

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

I have the following configurations for autoscaling policies:

resource "aws_autoscaling_policy" "requests_count" {
  count = var.as_requests_count > 0 ? 1 : 0
  autoscaling_group_name = aws_autoscaling_group.service.name
  name = "AVGRequestsCount"
  policy_type = "TargetTrackingScaling"
  estimated_instance_warmup = var.instance_warmup_time
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label = local.asg_resource_label
    }
    target_value = var.as_requests_count
  }
}

resource "aws_autoscaling_policy" "requests_count_int" {
  count = var.use_internal_lb ? (var.as_requests_count > 0 ? 1 : 0) : 0
  autoscaling_group_name = aws_autoscaling_group.service.name
  name = "AVGRequestsCountInt"
  policy_type = "TargetTrackingScaling"
  estimated_instance_warmup = var.instance_warmup_time
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label = local.asg_resource_label_int
    }
    target_value = var.as_requests_count
  }
}

resource "aws_autoscaling_policy" "cpu_utilization" {
  count = var.as_cpu_utilization > 0 ? 1 : 0
  autoscaling_group_name = aws_autoscaling_group.service.name
  name = "CPUUtilization"
  policy_type = "TargetTrackingScaling"
  estimated_instance_warmup = var.instance_warmup_time
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = var.as_cpu_utilization
  }
}

resource "aws_autoscaling_policy" "cpu_utilization_int" {
  count = var.use_internal_lb ? (var.as_cpu_utilization > 0 ? 1 : 0) : 0
  autoscaling_group_name = aws_autoscaling_group.service.name
  name = "CPUUtilizationInt"
  policy_type = "TargetTrackingScaling"
  estimated_instance_warmup = var.instance_warmup_time
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = var.as_cpu_utilization
  }
}

Steps to Reproduce

Unknown

Debug Output

No response

Panic Output

No response

Important Factoids

The issue happens randomly when I'm updating an existing configuration. During the plan, Terraform appears to drop one of the policies and does not refresh it:

module.service-bff.module.instance.aws_autoscaling_policy.requests_count_int[0]: Refreshing state... [id=AVGRequestsCountInt]
module.service-bff.module.instance.aws_autoscaling_policy.requests_count[0]: Refreshing state... [id=AVGRequestsCount]
module.service-bff.module.instance.aws_autoscaling_policy.cpu_utilization[0]: Refreshing state... [id=CPUUtilization]

Then, it plans to add the "missing" policy:

  # module.service-bff.module.instance.aws_autoscaling_policy.cpu_utilization_int[0] will be created
  + resource "aws_autoscaling_policy" "cpu_utilization_int" {
      + arn                       = (known after apply)
      + autoscaling_group_name    = "rev_f_9335_bf-1-2023111503403646740000001e"
      + enabled                   = true
      + estimated_instance_warmup = 120
      + id                        = (known after apply)
      + metric_aggregation_type   = (known after apply)
      + name                      = "CPUUtilizationInt"
      + policy_type               = "TargetTrackingScaling"
      + target_tracking_configuration {
          + disable_scale_in = false
          + target_value     = 70
          + predefined_metric_specification {
              + predefined_metric_type = "ASGAverageCPUUtilization"
            }
        }
    }

On apply, it then fails with the error shown above because the policy already exists.

I don't make manual changes to the infrastructure, and the issue happens randomly: sometimes everything is fine, sometimes not, with the same configuration.
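
One way to narrow this down (a sketch, assuming the policy still exists in AWS but has dropped out of the state) is to inspect the state and, if needed, re-import the policy before applying. The import ID for aws_autoscaling_policy is the ASG name and the policy name separated by a slash:

# Is the policy tracked in state at all?
terraform state list | grep aws_autoscaling_policy

# Show drift between state and AWS without changing anything
terraform plan -refresh-only

# Adopt the existing policy instead of re-creating it
terraform import \
  'module.service-bff.module.instance.aws_autoscaling_policy.cpu_utilization_int[0]' \
  'rev_f_9335_bf-1-2023111503403646740000001e/CPUUtilizationInt'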

How to fix it?

References

No response

Would you like to implement a fix?

None

speller added the bug label on Nov 15, 2023

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

github-actions bot added the service/autoscaling label on Nov 15, 2023
terraform-aws-provider bot added the needs-triage (Waiting for first response or review from a maintainer) label on Nov 15, 2023
@speller
Contributor Author

speller commented Nov 15, 2023

I checked one of the recent failing deployments and found that, in that case, the issue happened during the initial resource creation. Below are log excerpts from a run against an empty state (a completely new deployment).

Everything looks good in the plan:

# module.service-bff.module.instance.aws_autoscaling_policy.cpu_utilization[0] will be created
  resource "aws_autoscaling_policy" "cpu_utilization" {
       arn                       = (known after apply)
       autoscaling_group_name    = (known after apply)
       enabled                   = true
       estimated_instance_warmup = 120
       id                        = (known after apply)
       metric_aggregation_type   = (known after apply)
       name                      = "CPUUtilization"
       policy_type               = "TargetTrackingScaling"

       target_tracking_configuration {
           disable_scale_in = false
           target_value     = 70

           predefined_metric_specification {
               predefined_metric_type = "ASGAverageCPUUtilization"
            }
        }
    }

# module.service-bff.module.instance.aws_autoscaling_policy.cpu_utilization_int[0] will be created
  resource "aws_autoscaling_policy" "cpu_utilization_int" {
       arn                       = (known after apply)
       autoscaling_group_name    = (known after apply)
       enabled                   = true
       estimated_instance_warmup = 120
       id                        = (known after apply)
       metric_aggregation_type   = (known after apply)
       name                      = "CPUUtilizationInt"
       policy_type               = "TargetTrackingScaling"

       target_tracking_configuration {
           disable_scale_in = false
           target_value     = 70

           predefined_metric_specification {
               predefined_metric_type = "ASGAverageCPUUtilization"
            }
        }
    }

# module.service-bff.module.instance.aws_autoscaling_policy.requests_count[0] will be created
  resource "aws_autoscaling_policy" "requests_count" {
       arn                       = (known after apply)
       autoscaling_group_name    = (known after apply)
       enabled                   = true
       estimated_instance_warmup = 120
       id                        = (known after apply)
       metric_aggregation_type   = (known after apply)
       name                      = "AVGRequestsCount"
       policy_type               = "TargetTrackingScaling"

       target_tracking_configuration {
           disable_scale_in = false
           target_value     = 21

           predefined_metric_specification {
               predefined_metric_type = "ALBRequestCountPerTarget"
               resource_label         = (known after apply)
            }
        }
    }

# module.service-bff.module.instance.aws_autoscaling_policy.requests_count_int[0] will be created
  resource "aws_autoscaling_policy" "requests_count_int" {
       arn                       = (known after apply)
       autoscaling_group_name    = (known after apply)
       enabled                   = true
       estimated_instance_warmup = 120
       id                        = (known after apply)
       metric_aggregation_type   = (known after apply)
       name                      = "AVGRequestsCountInt"
       policy_type               = "TargetTrackingScaling"

       target_tracking_configuration {
           disable_scale_in = false
           target_value     = 21

           predefined_metric_specification {
               predefined_metric_type = "ALBRequestCountPerTarget"
               resource_label         = (known after apply)
            }
        }
    }

During the apply, everything looks OK at the beginning:

module.service-bff.module.instance.aws_autoscaling_policy.requests_count_int[0]: Creating...
module.service-bff.module.instance.aws_autoscaling_policy.cpu_utilization_int[0]: Creating...
module.service-bff.module.instance.aws_autoscaling_policy.cpu_utilization[0]: Creating...
module.service-bff.module.instance.aws_autoscaling_policy.requests_count[0]: Creating...

But then only three report success:

module.service-bff.module.instance.aws_autoscaling_policy.cpu_utilization[0]: Creation complete after 1s [id=CPUUtilization]
module.service-bff.module.instance.aws_autoscaling_policy.requests_count[0]: Creation complete after 1s [id=AVGRequestsCount]
module.service-bff.module.instance.aws_autoscaling_policy.requests_count_int[0]: Creation complete after 1s [id=AVGRequestsCountInt]

And eventually the following error is thrown:

│ Error: creating Auto Scaling Policy (CPUUtilizationInt): ValidationError: Only one TargetTrackingScaling policy for a given metric specification is allowed.
│ 	status code: 400, request id: 26e35a65-648d-4ac8-a45e-d93f85ab2579
│ 
│   with module.service-bff.module.instance.aws_autoscaling_policy.cpu_utilization_int[0],
│   on .terraform/modules/service-bff.instance/main.tf line 77, in resource "aws_autoscaling_policy" "cpu_utilization_int":
│   77: resource "aws_autoscaling_policy" "cpu_utilization_int" {
│ 

Here is the issue: somewhere the target policy is created twice, which makes Terraform fail on an already existing resource. Nothing except Terraform could have created this resource, because the environment was empty.

When I check the problematic autoscaling group, I can see three policies created. If I delete them all, Terraform succeeds on the next run.
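
For reference, a rough AWS CLI equivalent of that inspection and cleanup (ASG name taken from the plan output above) looks like this:

# List the scaling policies currently attached to the group
aws autoscaling describe-policies \
  --auto-scaling-group-name rev_f_9335_bf-1-2023111503403646740000001e \
  --query 'ScalingPolicies[].{Name:PolicyName,Type:PolicyType}'

# Delete a policy so the next terraform apply can recreate it
# (repeat for each policy that needs to be recreated)
aws autoscaling delete-policy \
  --auto-scaling-group-name rev_f_9335_bf-1-2023111503403646740000001e \
  --policy-name CPUUtilizationInt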

justinretzolk removed the needs-triage label on Nov 15, 2023
@speller
Contributor Author

speller commented Nov 17, 2023

This is probably a problem in our configuration, but there also appears to be a bug on the AWS side that allows creating two ASGAverageCPUUtilization policies under some circumstances. Our autoscaling groups are targeted by two load balancers, so for ALBRequestCountPerTarget it is fine to have two policies. ASGAverageCPUUtilization, however, does not depend on load balancers, so there should be only one such policy; I copy-pasted the config and accidentally duplicated it. Somehow it has worked in the majority of cases for a long time already.
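
A minimal sketch of the configuration-side fix is to keep a single CPU-based policy per group and drop the duplicated "Int" variant, while leaving the two ALBRequestCountPerTarget policies (one per load balancer) as they are:

# Single CPU-based target tracking policy per autoscaling group;
# ASGAverageCPUUtilization does not depend on a load balancer,
# so no separate "CPUUtilizationInt" resource is needed.
resource "aws_autoscaling_policy" "cpu_utilization" {
  count                     = var.as_cpu_utilization > 0 ? 1 : 0
  autoscaling_group_name    = aws_autoscaling_group.service.name
  name                      = "CPUUtilization"
  policy_type               = "TargetTrackingScaling"
  estimated_instance_warmup = var.instance_warmup_time

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = var.as_cpu_utilization
  }
}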

@speller
Contributor Author

speller commented Nov 17, 2023

Here is an example of a working case where all four policies are created:
[screenshot: all four policies created]
