Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Cannot configure PUBLIC schema of new database #2826

Closed
1 task
wesleyhillyext opened this issue May 20, 2024 · 14 comments
Closed
1 task

[Bug]: Cannot configure PUBLIC schema of new database #2826

wesleyhillyext opened this issue May 20, 2024 · 14 comments
Labels
feature-request Used to mark issues with provider's missing functionalities resource:database Issue connected to the snowflake_database resource resource:schema Issue connected to the snowflake_schema resource

Comments

@wesleyhillyext
Copy link

Terraform CLI Version

1.7.4

Terraform Provider Version

0.88.0

Terraform Configuration

resource "snowflake_database" "my_database" {
  name = "my_database"
}

resource "snowflake_schema" "my_database_public" {
  database = snowflake_database.my_database.name
  name     = "PUBLIC"

  is_managed          = true
  data_retention_days = 30
}

Category

category:resource

Object type(s)

No response

Expected Behavior

When applying these resources for the first time, the database is created by the snowflake_database resource. Then the PUBLIC schema for that database is created by the snowflake_schema resource.

Actual Behavior

The database is created by the snowflake_database resource. The snowflake_schema resource then fails to apply when it runs CREATE SCHEMA PUBLIC, which fails because the schema already exists, since in Snowflake a CREATE DATABASE automatically creates a PUBLIC schema with default properties.

Steps to Reproduce

  1. terraform apply the example configuration when the resources don't already exist in the TF state or in Snowflake.

How much impact is this issue causing?

Medium

Logs

No response

Additional Information

This reopens #306 with some additional thinking.

This has been a thorn in our side when creating new databases, as we either want to have an explicit snowflake_schema block for the PUBLIC schema, or in some cases don't want a PUBLIC schema for a specific database.

Proposed solution: After considerable thinking on this issue, I've come to believe there's only one workable solution: add a boolean option to the snowflake_database resource which, if enabled, causes it to execute DROP SCHEMA PUBLIC if it executes a CREATE DATABASE.

Alternatives considered and dismissed:

  1. Apply once, manually drop the PUBLIC schema when the snowflake_schema resource fails, then apply again.
  • This is our current workaround, but the manual step prevents deployment of Snowflake terraform through a CI pipeline.
  1. Allow configuration of the PUBLIC schema in the snowflake_database resource.
  • This doesn't address the case when you don't want a PUBLIC schema.
  • Would mean that different schemas would be configured different ways, which is undesireable.
  1. Use import blocks to automatically import the PUBLIC schema.
  • This doesn't address the case when you don't want a PUBLIC schema.
  • The import would have to be executed after the snowflake_database resource is created, and it's not clear that Terraform allows specifying this kind of dependency.
  • The automatically-created schema is never a managed-access schema, but we always use managed-access schemas. So even if imported, the terraform_schema resource will attempt to run ALTER SCHEMA PUBLIC ENABLE MANAGED ACCESS, which fails unless you have the MANAGE GRANTS privilege. A solution should require nothing but the CREATE DATABASE privilege, as MANAGE GRANTS is the most powerful privilege (you can use it to grant yourself ACCOUNTADMIN), so we don't have that privilege when applying most terraform modules.
  1. Improve the CREATE DATABASE statement to allow a WITHOUT PUBLIC SCHEMA option.
  • This would require additions to the Snowflake SQL language itself, so it's not purely achievable by only changing the terraform provider.
  • Mentioning this option since I get the impression Snowflake is looking at more declarative-friendly language features, like CREATE OR ALTER. And a WITHOUT PUBLIC SCHEMA option would be a little cleaner than having to run DROP SCHEMA PUBLIC.

It could be argued that the proposed "drop-public" behavior of the snowflake_database resource is the logical default behavior. Or the only possible behavior – i.e. always require a separate snowflake_schema resource for PUBLIC. But this way it wouldn't be backwards-compatible, which is why I suggested making this behavior opt-in. That said, the provider is currently making many other breaking changes with the goal of cleaning it up before a v1 release, so perhaps you might like to make this behavior the default/only behavior after all.

Would you like to implement a fix?

  • Yeah, I'll take it 😎
@wesleyhillyext wesleyhillyext added the bug Used to mark issues with provider's incorrect behavior label May 20, 2024
@sfc-gh-asawicki sfc-gh-asawicki added feature-request Used to mark issues with provider's missing functionalities and removed bug Used to mark issues with provider's incorrect behavior labels May 20, 2024
@sfc-gh-asawicki
Copy link
Collaborator

Hey @wesleyhillyext. Thanks for reaching out to us.

We have planned to discuss this issue with the schema resource rework as part of https://github.com/Snowflake-Labs/terraform-provider-snowflake/blob/main/ROADMAP.md#preparing-essential-ga-objects-for-the-provider-v1.

A few comments from my side:

  • For the first dismissed alternative you could utilize https://registry.terraform.io/providers/Snowflake-Labs/snowflake/latest/docs/resources/unsafe_execute to avoid manual changes (not ideal but should work).
  • In the third dismissed alternative you say that you always use managed-access schemas. So maybe you should not use the PUBLIC schema at all? The Snowflake creates this schema for a reason, if there is no option (like alternative number 4) on the SQL level, I don't think we should diverge from it in the provider. Afterall, you have options to adjust the schema (e.g. unsafe execute resource running arbitrary SQL statements)
  • The fourth option is a valid solution, please reach out to your account manager to discuss it.
  • Your solution is one of the options we will consider. The other one not mentioned is using CREATE OR REPLACE syntax in the snowflake_schema resource for PUBLIC schema only (we have not yet tested or researched if this is a good one, but it would certainly allow introducing modifications to the automatically created one). Also, we may consider using a separate resource just to handle PUBLIC schema (i.e. snowflake_public_schema).

I just want to point out, that there is also the possibility that we will decide not to handle the PUBLIC schema differently inside the provider at all (leaving workarounds like unsafe execute valid options). With the motivation written above: this behavior is just aligned with Snowflake.

@wesleyhillyext
Copy link
Author

Thanks for your reply @sfc-gh-asawicki, good to know this was on your radar. I've been thinking about some of your comments and just had a question on one:

So maybe you should not use the PUBLIC schema at all? The Snowflake creates this schema for a reason,

From my understanding, using the PUBLIC schema is helpful because it's the default schema when someone runs USE DATABASE, and is automatically on the search path without having to educate our users about search paths.

Whereas managed-access schemas seem more about centralized vs decentralized access management (centralized is the approach we use).

So I didn't understand why our use of managed-access schemas might clash with the point of the PUBLIC schema – is there some other purpose implied for the PUBLIC schema?

Your solution is one of the options we will consider.

I think any of my proposed solution, a CREATE OR REPLACE approach, or a separate snowflake_public_schema could be made to work for us.

The documentation says that snowflake_unsafe_execute is deprecated and will be removed; I think I'd also be pretty reluctant to use it. So I hope you decide to provide some other solution.

@sfc-gh-asawicki
Copy link
Collaborator

So I didn't understand why our use of managed-access schemas might clash with the point of the PUBLIC schema – is there some other purpose implied for the PUBLIC schema?

My only point here was, that currently managing PUBLIC schema through our provider is harder than managing the manually-created schemas (as an alternative solution but I don't know your use case, so this was just a loose idea).

The documentation says that snowflake_unsafe_execute is deprecated and will be removed; I think I'd also be pretty reluctant to use it. So I hope you decide to provide some other solution.

Yeah, even the name scares potential users away, because this was the initial plan. However, we have decided to stick with this resource even after V1, maybe with a bit "safer" implementation. Docs were not adjusted because the resource is still dangerous if used incorrectly but you can see this resource on the list linked here: https://github.com/Snowflake-Labs/terraform-provider-snowflake/blob/main/ROADMAP.md#preparing-essential-ga-objects-for-the-provider-v1.

@sfc-gh-jmichalak
Copy link
Collaborator

Hi @wesleyhillyext 👋 PUBLIC schemas can be managed in v0.94. Now, when the name is PUBLIC, it's created with OR REPLACE and can be managed as a regular schema. Creating databases without PUBLIC schemas is not supported - with the motivation written above: this behavior is just aligned with Snowflake.

@wesleyhillyext
Copy link
Author

Thanks for the update @sfc-gh-jmichalak @sfc-gh-asawicki.

That will resolve the main operational issue with declaring the PUBLIC schema, though I wonder if you considered the non-obvious risk this approach introduces.

Normally, creating resources in terraform is always a non-destructive operation. But CREATE OR REPLACE SCHEMA can be a destructive operation, and it can destroy your entire existing schema! All it would take is for someone to move the snowflake_schema resource for their PUBLIC schema in their terraform code, and forget to do the corresponding terraform state mv. Or accidentally perform the wrong terraform state rm. The terraform plan will then look like it creates, not replaces, the PUBLIC schema, and I think even a lifecycle { prevent_destroy = true } on the schema resource won't save the day. As far as risks go, it would be low-probability, but high-impact.

Safety is one of the reasons I ended up proposing the solution I originally proposed.

Creating databases without PUBLIC schemas is not supported - with the motivation written above: this behavior is just aligned with Snowflake.

I didn't really dive into this philosophy earlier on, but I may as well add my two cents now: SQL is not Terraform. SQL DDL is an imperative language, whereas terraform is a declarative language; it seems entirely natural to me that there would be some "impedance mismatch" between the declarative and imperative paradigms, such that what makes sense as one declarative resource doesn't always align 1-1 with a single imperative SQL statement. Such alignment might be a fine consideration when all else is equal, but I would caution against this becoming an overriding consideration.

@sfc-gh-asawicki
Copy link
Collaborator

Hey @wesleyhillyext. Thanks for your thoughts.

TBH for me it's not a big difference if we add a boolean option to the snowflake_database resource which, if enabled, causes it to execute DROP SCHEMA PUBLIC or if we run OR REPLACE:

  • they both happen during the create
  • they are both a bit hidden
  • they are both actions of the user.

Fortunately:

However, your concerns are valid. Ultimately, create or alter schema (like the one that is already in preview for tables: https://docs.snowflake.com/en/sql-reference/sql/create-table#create-or-alter-table) would be a better solution.

sfc-gh-jmichalak added a commit that referenced this issue Aug 2, 2024
<!-- Feel free to delete comments as you fill this in -->
- during create operation, detect if PUBLIC schema has already been
created, and instead of using OR REPLACE, redirect the flow to
`UpdateContextSchema` (note that `name` change needs to be excluded to
avoid unnecessary rename)
- add a check to acceptance tests to ensure the schema hasn't been
dropped
- describe new behavior in the migration guide for v0.94.1 and add a
note for v0.94.0
<!-- summary of changes -->

## Test Plan
<!-- detail ways in which this PR has been tested or needs to be tested
-->
* [x] acceptance tests

## References
<!-- issues documentation links, etc  -->
References
#2826,
specifically comment
#2826 (comment)
@sfc-gh-jmichalak
Copy link
Collaborator

Hi @wesleyhillyext 👋 This has been fixed in v0.94.1 release. Now, we use ALTER for managing PUBLIC schemas that exist instead of OR REPLACE.

@wesleyhillyext
Copy link
Author

Thanks for the update @sfc-gh-jmichalak. Unfortunately, that approach breaks in our case.

Using ALTER is mostly equivalent to alternative #3 from my original post. And it doesn't work for us for the same reason. You can't ALTER SCHEMA PUBLIC ENABLE MANAGED ACCESS on a schema without using the SECURITYADMIN or ACCOUNTADMIN roles, and we don't use those roles to apply most of our terraform. We use a role hierarchy to enforce environment isolation as a best practice, so we're applying with our own environment-admin roles like DEV_ADMIN, PROD_ADMIN etc.

I originally proposed the "drop public schema immediately after a create" solution after pretty careful consideration of all these kinds of alternatives, and I still think it's the only approach which doesn't introduce any of its own problems.

@sfc-gh-asawicki
Copy link
Collaborator

Hey @wesleyhillyext.

We want to avoid dropping the PUBLIC schema programmatically (mainly because of the reasons discussed above), so we explored the non-dropping solutions. Ultimately, we will replace it with CREATE OR ALTER, so all the limitations will be the same as for CREATE OR ALTER.

The role requirement is a Snowflake limitation, not the provider's. We try to align every resource with the Snowflake role requirements. You can report it to Snowflake support (but I guess you have already done so).

I think it is still possible to handle your case in the provider:

  1. database resource creating db (and public schema)
  2. unsafe execute resource, depending on the above db resource, to drop the PUBLIC schema
  3. schema resource depending on the unsafe execute, creating the public schema again

Will it work for you?

@wesleyhillyext
Copy link
Author

We want to avoid dropping the PUBLIC schema programmatically (mainly because of the reasons discussed above)

If the database resource just successfully executed a CREATE DATABASE, then it knows the PUBLIC schema must be empty and is safe to drop. This is actually the only situation when the provider can know for sure that it is safe to drop/replace the schema, and is why I proposed handling this in the database resource, not the schema resource.

unsafe execute resource, depending on the above db resource, to drop the PUBLIC schema

This is effectively equivalent to the OR REPLACE solution you had in v0.94. This would work, but we'd have to be very careful to ensure the terraform state will always be present to prevent the schema being dropped when we didn't intend it to be!

The role requirement is a Snowflake limitation, not the provider's.

Sorry, I'm not quite sure what you were trying to get at here. It is already possible to achieve the desired resource state in Snowflake without using the ACCOUNTADMIN role:

USE ROLE PROD_ADMIN;
CREATE DATABASE PROD_DB;
DROP SCHEMA PUBLIC;
CREATE SCHEMA PUBLIC WITH MANAGED ACCESS;

Personally, I think the goal of being able to apply terraform with roles other than ACCOUNTADMIN is much more important to keep in sight that the principle of one-to-one correspondence such that one terraform resource == one SQL statement.

@sfc-gh-asawicki
Copy link
Collaborator

TL;DR

We will add drop_public_schema_on_creation flag to the database resource in one of the upcoming versions.

My answers below (not to leave the questions unanswered).

This is effectively equivalent to the OR REPLACE solution you had in v0.94. This would work, but we'd have to be very careful to ensure the terraform state will always be present to prevent the schema being dropped when we didn't intend it to be!

The result is the same, but it differs in an important detail: it's not hidden from the provider's user like the OR REPLACE was.

Sorry, I'm not quite sure what you were trying to get at here.

This:

So even if imported, the terraform_schema resource will attempt to run ALTER SCHEMA PUBLIC ENABLE MANAGED ACCESS, which fails unless you have the MANAGE GRANTS privilege. A solution should require nothing but the CREATE DATABASE privilege, as MANAGE GRANTS is the most powerful privilege (you can use it to grant yourself ACCOUNTADMIN), so we don't have that privilege when applying most terraform modules.

Personally, I think the goal of being able to apply terraform with roles other than ACCOUNTADMIN is much more important to keep in sight that the principle of one-to-one correspondence such that one terraform resource == one SQL statement.

We do not have a principle of one terraform resource == one SQL statement. We do have a principle of following Snowflake requirements for the role used to manage the given resource. Our experience with the provider users shows that it's best to follow strictly how the objects are created on the Snowflake side without too much additional magic. And that's exactly what we are trying to do here: keep the provider simple but leave the way for more complex setups for advanced users.

USE ROLE PROD_ADMIN;
CREATE DATABASE PROD_DB;
DROP SCHEMA PUBLIC;
CREATE SCHEMA PUBLIC WITH MANAGED ACCESS;

And this is exactly what I proposed:

  1. database resource creating db (and public schema)
  2. unsafe execute resource, depending on the above db resource, to drop the PUBLIC schema
  3. schema resource depending on the unsafe execute, creating the public schema again

sfc-gh-jcieslak added a commit that referenced this issue Aug 8, 2024
## Changes
- #2978 (adjust migration guide)
- Fix database tests (remove tests for state upgraders, because they
were incorrect and document step-by-step process for manual testing; I
went through all of it)
- Introduce a new parameter to address #2826 and test it
- added a drop with retries just to be sure, but I don't think it should
ever fail other than some kind of strange networking error between
database create and drop schema
- I opted to drop the schema after setting id, so if we manage to create
a database, but not drop the schema, the provider will print out a
warning message that the public schema couldn't be dropped and it should
be done manually. The other approaches I considered:
- Do it before setting ID, but that would mean the Terraform running
create again and failing on CREATE DATABASE (because it's already
there).
- Do it like right now, but make sure the public schema is gone in the
alter, but I didn't go for that, because it could lead to further
complications with (what is considered "after create"; after successful
create? how would we keep the state of create status? etc.).

---------

Co-authored-by: Jakub Michalak <jakub.michalak@snowflake.com>
@wesleyhillyext
Copy link
Author

Thanks; that drop_public_schema_on_creation flag will do the job nicely. I will certainly be keeping an eye on the release notes to see when we can adopt this.

@sfc-gh-asawicki
Copy link
Collaborator

Hey @wesleyhillyext. We released v0.95 yesterday containing drop_public_schema_on_creation attribute. Please check and let us know if this works for you.

@sfc-gh-asawicki sfc-gh-asawicki added resource:database Issue connected to the snowflake_database resource resource:schema Issue connected to the snowflake_schema resource labels Sep 5, 2024
@wesleyhillyext
Copy link
Author

Thanks very much for implementing that! The docs and commit look good to me.

We're currently on version 0.88, and we're going to have to schedule a project to get through the many changes made to the provider to bring us up to 0.95. In the meantime, I think you can close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request Used to mark issues with provider's missing functionalities resource:database Issue connected to the snowflake_database resource resource:schema Issue connected to the snowflake_schema resource
Projects
None yet
Development

No branches or pull requests

3 participants