-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for issue #13538 #14020
Fix for issue #13538 #14020
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jhujasonw 👋 Thank you for submitting this. Just a drive-by comment about this -- our typical approach for handling resource serialization needs is to utilize the mutexkv
library (awsMutexKV
in this codebase) keyed against something each would have in common, rather than introducing long retry loops which can be error prone and inconsistent in user experience. In this case, my suggestion would be to use a mutex such as the below in the Create
and Delete
functions before the API calls:
mutexKey := fmt.Sprintf("%s-fargate-profiles", d.Get("cluster_name").(string))
awsMutexKV.Lock(mutexKey)
defer awsMutexKV.Unlock(mutexKey)
It would be great to also introduce a covering acceptance test for this situation to prevent any regressions. Copy-paste-modify of an existing test and test configuration should get us close. Please see the Contributing Guide section on Acceptance Testing for more information.
Hi @bflad, |
Looking over the acceptance testing doc, would it be a good idea to just add a second profile into the test that already exists? The destroy checks would pick up on it and if it errors out on creation of the second resource because of serialization issues, the acceptance test configuration apply would fail. Those are really the only two instances where we care to test serialization and the rest of the test would operate only on the first resource. I'm not trying to be lazy, but I am trying to simplify the tests for the eks fargate profiles and keep them readable. If this isn't the right approach, I can come up with a different plan. |
@jhujasonw in general, we prefer new tests and test configurations to ensure existing configurations work as expected, even if it is more verbose. It also helps future maintainers narrow down issues easier, since the different tests can isolate different behaviors. 👍 |
Changed the serialization method to use mutex locking and gave writing an acceptance test a go. Output from the acceptance test posted to the original comment. |
So I have run into another issue with this. If you are removing multiple profiles that are actually running deployments, they take significantly longer (5 to 15 minutes, each) to destroy and you reach a timeout error on the ones waiting in line to be removed. This PR fixes the issue of serialization, but perhaps bumping the timeout may still be something that needs to be considered? Or is there perhaps another way around that that I am not seeing. |
Can we merge this in? Edge cases can be solved with follow up PR in my opinion. What I can see, EKS Fargate management and lifecycle itself is far from ideal, e.g. idea is great, when implementation is horrible, an I hope the AWS team is going to make it more automation-friendly in the future releases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the changes in a later commit.
Please let me know if you would like me to work on this any further before considering for merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, @jhujasonw, this looks good 🚀
Output from acceptance testing:
--- PASS: TestAccAWSEksFargateProfile_basic (1143.84s)
--- PASS: TestAccAWSEksFargateProfile_disappears (1163.66s)
--- PASS: TestAccAWSEksFargateProfile_Multi_Profile (1456.46s)
--- PASS: TestAccAWSEksFargateProfile_Selector_Labels (1446.71s)
--- PASS: TestAccAWSEksFargateProfile_Tags (1185.36s)
defer awsMutexKV.Unlock(mutexKey) | ||
|
||
// Creation of profiles must be serialized, and can take a while, increase the timeout to a long time. | ||
err := resource.Retry(30*time.Minute, func() *resource.RetryError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the mutex handles the timing issue -- going to revert this back to the 2 minute IAM propagation timeout.
This has been released in version 3.16.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks! |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks! |
Community Note
Closes #13538
Release note for CHANGELOG:
Output from acceptance testing: