Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File System: Add optional_ptr<FileOpener> to various calls, and add support for attaching DuckDB files over S3 #11343

Merged
merged 20 commits into from
Mar 26, 2024

Conversation

Mytherin
Copy link
Collaborator

Follow-up from #11259 and #11297

This PR expands several FileSystem API calls to add an optional_ptr<FileOpener> to various methods that were missing them:

  • FileExists
  • MoveFile
  • RemoveFile
  • DirectoryExists
  • CreateDirectory
  • RemoveDirectory
  • IsPipe

The FileOpener is used in order to pass along information from the ClientContext or source Database into the file system. In particular, it is used by the S3 and Azure file systems to pass along authentication information (e.g. secrets). Without a FileOpener, these function calls do not have authentication information, which makes it impossible to support these functions for e.g. private S3 buckets. This PR resolves that issue.

S3 Attach

After the secret manager was added - this actually solves the last roadblock for getting ATTACH to work for DuckDB databases over S3. This PR also addresses a few more minor problems that now allow for reading DuckDB databases over S3.

  • FILE_FLAGS_PARALLEL_ACCESS: This is a new flag added in this PR that signifies we need the file system to support parallel read/write access to the same file handle, as that is what happens when attaching a DuckDB database. For the httpfs extension, this flag disables the cache within the file, as that is not thread-safe, and instead directly submits GET requests to S3.
  • DatabaseFileSystem: we add an OpenerFileSystem based on the DatabaseInstance class, which is returned by FileSystem::GetFileSystem(db). This allows us to access a DatabaseInstance from a FileOpener, even when a ClientContext is not available. This can happen for example when loading the initial database.
  • TryGetDatabase: As a follow-up to this, we also add a new method to the FileOpener - TryGetDatabase. This method can be used to obtain a DatabaseInstance, and works for both the DatabaseFileOpener and ClientContextFileOpener. This method is used to obtain the secret manager in the HTTPFS file system, allowing (persistent) S3 secrets to be accessed when attaching to a DuckDB database.

@Mytherin Mytherin requested a review from samansmink March 25, 2024 16:32
@Mytherin
Copy link
Collaborator Author

CC @quentingodeau

@github-actions github-actions bot marked this pull request as draft March 25, 2024 18:23
@Mytherin Mytherin marked this pull request as ready for review March 25, 2024 18:25
@github-actions github-actions bot marked this pull request as draft March 26, 2024 08:39
@Mytherin Mytherin marked this pull request as ready for review March 26, 2024 08:49
Copy link
Contributor

@samansmink samansmink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Good idea on the FILE_FLAGS_PARALLEL_ACCESS

@Mytherin Mytherin merged commit 0465529 into duckdb:main Mar 26, 2024
47 checks passed
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Mar 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants