Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to dbt-common + dbt-adapters #342

Merged

Conversation

jtcohen6
Copy link
Contributor

resolves #341

Starting a draft PR for discussion!

This mostly just works by updating import statements. Caveats:

  • In dbt.adapters.duckdb.utils: SourceConfig and TargetConfig use type definitions that are not available from the common/adapter interfaces. We could either stub those types within duckdb.utils, or discuss if they need to be pulled into the shared interfaces.
  • Some of the plugin tests are failing for me locally. I'd like to see if these are real issues, or just problems with my local setup.

Copy link
Collaborator

@jwills jwills left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what a week it has been

@@ -75,7 +76,7 @@ def as_dict(self) -> Dict[str, Any]:
class TargetConfig:
relation: BaseRelation
column_list: Sequence[Column]
config: RuntimeConfigObject
config: Any # TODO
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the safest since our usage of this config object is pretty much limited to get methods

@@ -47,7 +48,7 @@ def as_dict(self) -> Dict[str, Any]:
return base

@classmethod
def create_from_source(cls, source: SourceDefinition) -> "SourceConfig":
def create_from_source(cls, source: Any) -> "SourceConfig":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Losing the SourceDefinition typing makes me a bit more nervous since we rely on its fields pretty extensively and the external_location features are a pretty popular part of dbt-duckdb on top of simple data lakes or just CSV/Parquet files on a NAS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heard - let me bug @MichelleArk & @colin-rogers-dbt about this and get back to you

Copy link
Contributor

@MichelleArk MichelleArk Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of the decoupling work, we removed the BaseRelation.create_from_source and BaseRelation.create_from_node methods in favor of a consolidated BaseRelation.create_from method that accepts quoting: HasQuoting and relation_config: RelationConfig (dbt-labs/dbt-core#9210). I've updated this branch to reflect that typing + implemented DuckDBRelation.create_from here: 5712711.

We should also extend the RelationConfig to also provide meta and tags, as well as resource_type so this kind of conditional logic can still be implemented reliably for sources in dbt-adapters: https://github.com/dbt-labs/dbt-adapters/pull/120/files.

Still need to figure out how we want to handle source_meta, since its not a common property across sources + other node types. I see at a few options:

  1. Do nothing in dbt-adapters/core, and update the duckdb create_from_source method to use hasattr or something similarly unsatisfying
  2. Implement a merged_meta attribute on sources + nodes in dbt-core and extend RelationConfig to expect merged_meta, which the duckdb adapter could use instead of accessing source_meta + merging. This doesn't feel like the slickest interface design and would need changes to most of our node classes in core but it'd get the job done.
  3. Potentially -- In dbt-core, update the meta field on SourceDefinition to include fields from source_meta by default, meaning the duckdb adapter wouldn't need to do the update business at all. It might be tricky to do this in a backward-compatible way in dbt-core though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichelleArk thank you for this, I really appreciate it. Will try to noodle on a good strategy here, but using the hasattr stuff as an ugly-but-necessary fallback is okay with me if we don't find something better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead with option 3 above in dbt-core: dbt-labs/dbt-core#9766, since it was actually pretty straightforward to do in a backward-compatible way (meta merges table.meta and source.meta, with table.meta taking precedence), and updated the branch to remove the source_meta reference in favor of just meta which now includes both table + source meta attributes.

I also updated the reference to config._extra to config.extra, as those should be identical and extra is already part of the RelationConfig protocol: https://github.com/dbt-labs/dbt-adapters/blob/35bd3629c390cf87a0e52d999679cc5e33f36c8f/dbt/adapters/contracts/relation.py#L41.

Tests were passing for me locally against dbt-core@main. @jwills -- could we kick off CI again?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @MichelleArk , thank you!

@jwills
Copy link
Collaborator

jwills commented Mar 18, 2024

@MichelleArk we don't need to worry about the MD one (it requires my secret token); the code quality stuff can be fixed by running pre-commit run --all-files, I'll look at the Buena Vista failure

@jtcohen6 jtcohen6 marked this pull request as ready for review April 16, 2024 10:03
Copy link
Contributor Author

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwills I think this is just about ready!

Could you help me look into the Buena Vista failures? I see what appear to be connection errors:

  • no results to fetch
  • server closed the connection unexpectedly

Comment on lines +43 to +44
# add dbt-core to ensure backwards compatibility of installation, this is not a functional dependency
"dbt-core>=1.8.0b1",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though dbt-core is no longer a functional dependency of adapters, we're going to keep including dbt-core in setup.py to prevent surprising/breaking changes to the behavior of pip install dbt-duckdb on the day of the v1.8 final release.

dbt-duck shouldn't import anything from dbt-core directly, only from the interfaces defined in dbt-core + dbt-adapters.

@jwills
Copy link
Collaborator

jwills commented Apr 16, 2024

@jtcohen6 thank you so much! Will take a look at the BV stuff which I think is broken on main as well; when does all of this need to be ready to go out the door?

@jtcohen6
Copy link
Contributor Author

@jwills Anytime in the next few weeks is good! Folks are beta-testing v1.8 now, we're planning to put out the release candidate (dbt-core==1.8.0rc1) on May 2.

@jwills
Copy link
Collaborator

jwills commented Apr 16, 2024

just punting on this for now #379

from dbt.tests.adapter.unit_testing.test_invalid_input import BaseUnitTestInvalidInput


class TestUnitTestingTypesDuckDB(BaseUnitTestingTypes):
Copy link
Contributor Author

@jtcohen6 jtcohen6 Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwills I pulled these in (selfishly), since it seemed like unit testing might "just work" on dbt-duckdb with the v1.8 interface updates.

It's all well & good on the Real Duck, but it looks like the test is failing on BuenaVista for the last data type in the list (ARRAY[1,2,3]):

  File "/home/runner/work/dbt-duckdb/dbt-duckdb/.tox/buenavista/lib/python3.9/site-packages/buenavista/postgres.py", line 104, in <lambda>
    BVType.INTEGERARRAY: (1007, lambda v: "{" + ",".join(v) + "}"),
TypeError: sequence item 0: expected str instance, int found
sequence item 0: expected str instance, int found
---------------------- CSV report: buenavista_results.csv ----------------------
=========================== short test summary info ============================
FAILED tests/functional/adapter/test_unit_testing.py::TestUnitTestingTypesDuckDB::test_unit_test_data_type - AssertionError: unit test failed when testing model with ARRAY[1,2,3]

How would you feel about me adding @pytest.mark.skip_profile("buenavista") to this test? I know that adds a bit of tech debt / mystery for you later on

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that’s just because I don’t properly support array types in the BV Postgres serialization, so it’s perfectly understandable and okay to mark it as skipped

@jwills
Copy link
Collaborator

jwills commented May 6, 2024

@jtcohen6 this is all good, right? We're going to cut one last 1.7.x release and then merge this in /cc @guenp

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented May 6, 2024

@jwills ready to ship when you see fit!

@jwills jwills merged commit a42274a into duckdb:master May 8, 2024
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate to dbt-common + dbt-adapters for v1.8+
3 participants