Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sk] Improved Configuration Framework #664

Merged
merged 12 commits into from
Jul 15, 2022
Merged

Conversation

skunichetty
Copy link
Contributor

@skunichetty skunichetty commented Jul 15, 2022

Summary

Introduces a more structured configuration framework, enabling users to have a more robust method to handle authentication features for data IO. Two new features were added:

  • A new configuration framework based around BaseConfigLoader, a base class that requires all loader to have the following two functions
    • get - get the configuration setting given the respective key. If setting doesn't exist, returns None.
    • contains - check if the configuration setting corresponding to the given key exists
  • A new unified set of keys that define all the secrets used by data loaders (stored under the ConfigKey enum). The following template shows how a configuration file with these new keys look:
default:
  # Default profile created for data IO access.
  # Add your credentials for the source you use, and delete the rest.
  AWS_ACCESS_KEY_ID = AWS Access Key ID credential
  AWS_SECRET_ACCESS_KEY = AWS Secret Access Key credential
  AWS_SESSION_TOKEN = AWS Session Token (used to generate temp DB credentials)
  AWS_REGION = AWS Region
  GOOGLE_SERVICE_ACC_KEY = Service account key (add nested object here)
  GOOGLE_SERVICE_ACC_KEY_FILEPATH = Path to service account key
  POSTGRES_DBNAME = Database name
  POSTGRES_USER = Database login username
  POSTGRES_PASSWORD = Database login password
  POSTGRES_HOST = Database hostname
  POSTGRES_PORT = PostgreSQL database port
  REDSHIFT_DBNAME = Name of Redshift database to connect to
  REDSHIFT_HOST = Redshift Cluster hostname
  REDSHIFT_PORT = Redshift Cluster port. Optional, defaults to 5439
  REDSHIFT_TEMP_CRED_USER = Redshift temp credentials username
  REDSHIFT_TEMP_CRED_PASSWORD = Redshift temp credentials password
  REDSHIFT_DBUSER = Redshift database user to generate credentials for
  REDSHIFT_CLUSTER_ID = Redshift cluster ID
  REDSHIFT_IAM_PROFILE = Name of the IAM profile to generate temp credentials with
  SNOWFLAKE_USER = Snowflake username
  SNOWFLAKE_PASSWORD = Snowflake password
  SNOWFLAKE_ACCOUNT = Snowflake account ID (including region)
  SNOWFLAKE_DEFAULT_WH = Default warehouse to use. Optional
  SNOWFLAKE_DEFAULT_DB = Default database to use. Optional
  SNOWFLAKE_DEFAULT_SCHEMA = Default schema to use. Optional

Currently, support for loading from configuration file, environment variables, and AWS Secrets Manager is supported.

Notes

  1. I'll push integration with the old configuration system and data IO clients in a separate PR (this one is already long enough)
  2. This PR is mainly unit tests and documentation - the amount of new code added is quite small.

Tests

Unit tests were written to test the behavior of the configuration loaders.

cc:
@wangxiaoyou1993 @dy46

@skunichetty skunichetty marked this pull request as ready for review July 15, 2022 18:54
@skunichetty skunichetty changed the title [sk] [WIP] Improved Configuration Framework [sk] Improved Configuration Framework Jul 15, 2022
@wangxiaoyou1993
Copy link
Member

wangxiaoyou1993 commented Jul 15, 2022

Can user use combination of the environment variables and plain text in the config yaml?

@skunichetty
Copy link
Contributor Author

By plaintext do you mean the old configuration file syntax? I can add back support for it if needed.

@wangxiaoyou1993
Copy link
Member

wangxiaoyou1993 commented Jul 15, 2022

I wonder whether we can support something like this: https://docs.getdbt.com/reference/dbt-jinja-functions/env_var

@skunichetty
Copy link
Contributor Author

Okay, I can add that.

@skunichetty
Copy link
Contributor Author

skunichetty commented Jul 15, 2022

Added support for using DBT-style jinja templating notation with environment variables in configuration file.


class ConfigFileLoader(BaseConfigLoader):
def __init__(
self, filepath: os.PathLike = './default_repo/io_config.yaml', profile='default'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not use default_repo here. default_repo is only for development. use get_repo_path() instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@skunichetty skunichetty force-pushed the sk--config_framework branch from dbf6574 to 7f0965d Compare July 15, 2022 23:29
@skunichetty skunichetty merged commit df2b14c into master Jul 15, 2022
@skunichetty skunichetty deleted the sk--config_framework branch July 15, 2022 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants