Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

b-fixed forced destroy of securitylake when meta_store_role_arn changes #36874

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

joelmccoy
Copy link

Description

This change updates the aws_securitylake_data_lake logic so that when the attribute meta_store_manager_role_arn is changed it does not require a delete of the entire data lake. The UpdateDataLake recently included a change where you can update the metaStoreManagerRoleArn with this API call. The stringplanmodifier.RequiresReplace is no longer necessary.

Relations

Closes #36831

References

N/A

Output from Acceptance Testing

$  make testacc TESTS=TestAccSecurityLake_ PKG=securitylake
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go1.21.8 test ./internal/service/securitylake/... -v -count 1 -parallel 20 -run='TestAccSecurityLake_'  -timeout 360m
=== RUN   TestAccSecurityLake_serial
=== PAUSE TestAccSecurityLake_serial
=== CONT  TestAccSecurityLake_serial
=== RUN   TestAccSecurityLake_serial/DataLake
=== RUN   TestAccSecurityLake_serial/DataLake/metaStoreUpdate
=== RUN   TestAccSecurityLake_serial/DataLake/replication
=== RUN   TestAccSecurityLake_serial/DataLake/basic
=== RUN   TestAccSecurityLake_serial/DataLake/disappears
=== RUN   TestAccSecurityLake_serial/DataLake/tags
=== RUN   TestAccSecurityLake_serial/DataLake/lifecycle
=== RUN   TestAccSecurityLake_serial/DataLake/lifecycleUpdate
=== RUN   TestAccSecurityLake_serial/Subscriber
=== RUN   TestAccSecurityLake_serial/Subscriber/basic
=== RUN   TestAccSecurityLake_serial/Subscriber/customLogs
=== RUN   TestAccSecurityLake_serial/Subscriber/disappears
=== RUN   TestAccSecurityLake_serial/Subscriber/tags
=== RUN   TestAccSecurityLake_serial/Subscriber/updated
=== RUN   TestAccSecurityLake_serial/SubscriberNotification
=== RUN   TestAccSecurityLake_serial/SubscriberNotification/https
=== RUN   TestAccSecurityLake_serial/SubscriberNotification/disappears
=== RUN   TestAccSecurityLake_serial/SubscriberNotification/update
=== RUN   TestAccSecurityLake_serial/SubscriberNotification/basic
=== RUN   TestAccSecurityLake_serial/AWSLogSource
=== RUN   TestAccSecurityLake_serial/AWSLogSource/basic
=== RUN   TestAccSecurityLake_serial/AWSLogSource/disappears
=== RUN   TestAccSecurityLake_serial/AWSLogSource/multiRegion
=== RUN   TestAccSecurityLake_serial/CustomLogSource
=== RUN   TestAccSecurityLake_serial/CustomLogSource/basic
=== RUN   TestAccSecurityLake_serial/CustomLogSource/disappears
--- PASS: TestAccSecurityLake_serial (1141.58s)
    --- PASS: TestAccSecurityLake_serial/DataLake (365.31s)
        --- PASS: TestAccSecurityLake_serial/DataLake/metaStoreUpdate (55.85s)
        --- PASS: TestAccSecurityLake_serial/DataLake/replication (80.38s)
        --- PASS: TestAccSecurityLake_serial/DataLake/basic (35.53s)
        --- PASS: TestAccSecurityLake_serial/DataLake/disappears (34.71s)
        --- PASS: TestAccSecurityLake_serial/DataLake/tags (57.93s)
        --- PASS: TestAccSecurityLake_serial/DataLake/lifecycle (40.68s)
        --- PASS: TestAccSecurityLake_serial/DataLake/lifecycleUpdate (60.21s)
    --- PASS: TestAccSecurityLake_serial/Subscriber (265.16s)
        --- PASS: TestAccSecurityLake_serial/Subscriber/basic (43.56s)
        --- PASS: TestAccSecurityLake_serial/Subscriber/customLogs (44.93s)
        --- PASS: TestAccSecurityLake_serial/Subscriber/disappears (40.37s)
        --- PASS: TestAccSecurityLake_serial/Subscriber/tags (76.45s)
        --- PASS: TestAccSecurityLake_serial/Subscriber/updated (59.85s)
    --- PASS: TestAccSecurityLake_serial/SubscriberNotification (275.65s)
        --- PASS: TestAccSecurityLake_serial/SubscriberNotification/https (55.18s)
        --- PASS: TestAccSecurityLake_serial/SubscriberNotification/disappears (56.28s)
        --- PASS: TestAccSecurityLake_serial/SubscriberNotification/update (109.87s)
        --- PASS: TestAccSecurityLake_serial/SubscriberNotification/basic (54.31s)
    --- PASS: TestAccSecurityLake_serial/AWSLogSource (155.59s)
        --- PASS: TestAccSecurityLake_serial/AWSLogSource/basic (41.52s)
        --- PASS: TestAccSecurityLake_serial/AWSLogSource/disappears (39.81s)
        --- PASS: TestAccSecurityLake_serial/AWSLogSource/multiRegion (74.27s)
    --- PASS: TestAccSecurityLake_serial/CustomLogSource (79.87s)
        --- PASS: TestAccSecurityLake_serial/CustomLogSource/basic (39.63s)
        --- PASS: TestAccSecurityLake_serial/CustomLogSource/disappears (40.24s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/securitylake       1146.023s

...

Copy link

Community Note

Voting for Prioritization

  • Please vote on this pull request by adding a 👍 reaction to the original post to help the community and maintainers prioritize this pull request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

For Submitters

  • Review the contribution guide relating to the type of change you are making to ensure all of the necessary steps have been taken.
  • For new resources and data sources, use skaff to generate scaffolding with comments detailing common expectations.
  • Whether or not the branch has been rebased will not impact prioritization, but doing so is always a welcome surprise.

@github-actions github-actions bot added size/M Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. service/securitylake Issues and PRs that pertain to the securitylake service. labels Apr 11, 2024
@terraform-aws-provider terraform-aws-provider bot added the needs-triage Waiting for first response or review from a maintainer. label Apr 11, 2024
@joelmccoy joelmccoy changed the title fixed forced destroy of securitylake when meta_store_role_arn changes b-fixed forced destroy of securitylake when meta_store_role_arn changes Apr 11, 2024
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome @joelmccoy 👋

It looks like this is your first Pull Request submission to the Terraform AWS Provider! If you haven’t already done so please make sure you have checked out our CONTRIBUTOR guide and FAQ to make sure your contribution is adhering to best practice and has all the necessary elements in place for a successful approval.

Also take a look at our FAQ which details how we prioritize Pull Requests for inclusion.

Thanks again, and welcome to the community! 😃

@joelmccoy joelmccoy marked this pull request as ready for review April 11, 2024 21:42
@justinretzolk justinretzolk added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Apr 15, 2024
@gdavison gdavison self-assigned this May 3, 2024
@terraform-aws-provider terraform-aws-provider bot added the prioritized Part of the maintainer teams immediate focus. To be addressed within the current quarter. label May 3, 2024
Copy link
Contributor

@gdavison gdavison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @joelmccoy. It looks good. I have a few suggestions

Comment on lines +393 to +417
resource "aws_iam_role" "meta_store_manager_v1" {
name = "AmazonSecurityLakeMetaStoreManagerV1"
path = "/service-role/"
assume_role_policy = <<POLICY
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "AllowLambda",
"Effect": "Allow",
"Principal": {
"Service": [
"lambda.amazonaws.com"
]
},
"Action": "sts:AssumeRole"
}]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "datalake_v1" {
role = aws_iam_role.meta_store_manager_v1.name
policy_arn = "arn:${data.aws_partition.current.partition}:iam::aws:policy/service-role/AmazonSecurityLakeMetastoreManager"
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this Role is only used in the testAccDataLakeConfig_metaStore and testAccDataLakeConfig_metaStoreUpdate, it would be better to only include it in those configurations, rather than for all tests

Name = %[1]q
}

depends_on = [aws_iam_role.meta_store_manager]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since meta_store_manager_role_arn already refers to aws_iam_role.meta_store_manager, this depends_on is redundant.

Comment on lines +664 to +666
tags = {
Name = %[1]q
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tags can be removed, since it's not relevant to this test

@joelmccoy
Copy link
Author

joelmccoy commented May 3, 2024

@gdavison Good feedback. I implemented the changes... but in the process I discovered an issue with the UpdateDataLake api call. When you specify a new role it just adds this new role to the Data Lake Administrators list (in the LakeFormation Service) but it doesn't remove the previous role.

Essentially the SecurityLake API is doing some LakeFormation calls under the hood and it is leaving dangling resources in LakeFormation that can potentially cause some issues.

Here is an example of the flow:

  1. CreateDataLake with role1 (role1 is admin)
  2. UpdateDataLake with role2 (role1 and role2 are both admin)

This becomes an issue because if you delete the IAM role1 or role2 ALL SecurityLake operations will fail (yikes). See below for error that is returned.

image

I discovered this in my tests when i removed the v1 role out of the basic config. It seems to check this Lakeformation admin list and will fail if any of the admin IAM roles do not exist. I think this is also the case for Deleting Security Lake Resource. It seems to leave this dangling Admin Assignment in Lakeformation.

The only way to change these admins I believe is through this LakeFormation PutDataLakeSettings.

A hacky way around this is would be to make some lakeformation calls in the terraform provider to do the cleanup on a meta_store_role_arn change. But that doesn't feel right and I think this should be fixed upstream in the APIs. Agree/disagree?

@gdavison
Copy link
Contributor

gdavison commented May 3, 2024

Good find, @joelmccoy. We do have a number of cases where we need to do some cleanup to "help" AWS services. For example, deleteLingeringENIs at

func deleteLingeringENIs(ctx context.Context, conn *ec2.Client, filterName, resourceId string, timeout time.Duration) error {
needs to clean up network interfaces that sometimes take a long time to be cleaned up automatically.

Do you know if there's a delayed process on the AWS side that eventually removes the role from Lake Formation? If so, we could wait for that. Otherwise, the provider may have to reach across service boundaries.

@joelmccoy
Copy link
Author

joelmccoy commented May 3, 2024

@gdavison It doesn't seem to ever get deleted. If it is a sane approach to add a helper to clean up lingering
Lakeformation admin role assignments I can definitely take a stab at adding that logic.

edit: Also to note, the helper logic would need to GetDataLakeSettings, remove the meta_store_role from the response, and then PutDataLakeSettings. There is no Update call for this operation, which makes it kind of ugly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Addresses a defect in current functionality. prioritized Part of the maintainer teams immediate focus. To be addressed within the current quarter. service/securitylake Issues and PRs that pertain to the securitylake service. size/M Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: aws_securitylake_data_lake change to meta_store_manager_role_arn triggers recreation
3 participants