Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework FileSystem::OpenFile call, and add FILE_FLAGS_NULL_IF_NOT_EXISTS #11297

Merged
merged 33 commits into from
Mar 22, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
6366b86
WIP TryOpenFile again
Mytherin Mar 20, 2024
3d6d096
Compiling again
Mytherin Mar 21, 2024
c7b935a
Merge branch 'main' into tryopenfile2
Mytherin Mar 21, 2024
0ea9d78
Revert WAL changes, skip TPCDS join order test for now
Mytherin Mar 21, 2024
9a18031
Avoid FileExists(path) { RemoveFile(...) } anti-pattern and replace w…
Mytherin Mar 21, 2024
56162d5
Remove TryOpenFile, instead make this a parameter to OpenFile
Mytherin Mar 21, 2024
eee76b8
Rework OpenFile - merge lock and compression into FileOpenFlags
Mytherin Mar 21, 2024
32997ab
Fix httpfs compilation
Mytherin Mar 21, 2024
be0d0b0
Improve documentation of S3 tests
Mytherin Mar 21, 2024
e92709c
Warn about concurrently running clickhouse, and ignore errors in dock…
Mytherin Mar 21, 2024
0c4c848
DynamicCastCheck
Mytherin Mar 21, 2024
4b4081d
Revert changes to RemoveFile/RemoveDirectory
Mytherin Mar 21, 2024
2aac450
Fix #9342: WAL - instead of using FileExists -> OpenFile, use OpenFil…
Mytherin Mar 21, 2024
e9d8a86
Format fix
Mytherin Mar 21, 2024
0d29350
Fix for pyfilesystem
Mytherin Mar 21, 2024
cd5a647
FILE_FLAGS_NULL_IF_NOT_EXISTS cannot be combined with CREATE
Mytherin Mar 21, 2024
816c7eb
Add azure/spatial patches
Mytherin Mar 21, 2024
1dba9b5
APPLY_PATCHES for mingw
Mytherin Mar 21, 2024
bd6a157
Actually fix Azure and Spatial
Mytherin Mar 21, 2024
2982b5f
Format fix
Mytherin Mar 21, 2024
ba9fd81
Rename to more logical name
Mytherin Mar 21, 2024
7b3917d
Add more documentation here
Mytherin Mar 21, 2024
f9b89b6
Fix for Windows
Mytherin Mar 21, 2024
a55feb3
Run generate_enum_util again
Mytherin Mar 21, 2024
55c35fd
Implement FILE_FLAGS_NULL_IF_NOT_EXISTS in pyfilesystem
Mytherin Mar 21, 2024
2b8e25b
Format
Mytherin Mar 21, 2024
122634b
path_p
Mytherin Mar 21, 2024
f09ad43
Avoid checking for the existence of :memory:, and avoid unnecessarily…
Mytherin Mar 21, 2024
24f343d
Keep using fs.FileExists for extensions
Mytherin Mar 21, 2024
5ae0f61
Fix iceberg
Mytherin Mar 21, 2024
5617ee0
Merge branch 'main' into tryopenfile2
Mytherin Mar 21, 2024
b0f55ee
Re-enable this test
Mytherin Mar 21, 2024
8837144
Windows spatial fix
Mytherin Mar 22, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Add azure/spatial patches
  • Loading branch information
Mytherin committed Mar 21, 2024
commit 816c7eb9be31c8dd1e4606f43f9c508fd640b64d
1 change: 1 addition & 0 deletions .github/config/out_of_tree_extensions.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ if (NOT MINGW)
LOAD_TESTS
GIT_URL https://github.com/duckdb/duckdb_azure
GIT_TAG 86f39d76157de970d16d6d6537bc90c0ee1c7d35
APPLY_PATCHES
)
endif()

Expand Down
210 changes: 210 additions & 0 deletions .github/patches/extensions/azure/open_file.patch
Original file line number Diff line number Diff line change
@@ -0,0 +1,210 @@
diff --git a/src/azure_blob_filesystem.cpp b/src/azure_blob_filesystem.cpp
index 6f4d0dc..935f30a 100644
--- a/src/azure_blob_filesystem.cpp
+++ b/src/azure_blob_filesystem.cpp
@@ -66,21 +66,20 @@ AzureBlobContextState::GetBlobContainerClient(const std::string &blobContainerNa
}

//////// AzureBlobStorageFileHandle ////////
-AzureBlobStorageFileHandle::AzureBlobStorageFileHandle(AzureBlobStorageFileSystem &fs, string path, uint8_t flags,
+AzureBlobStorageFileHandle::AzureBlobStorageFileHandle(AzureBlobStorageFileSystem &fs, string path, FileOpenFlags flags,
const AzureReadOptions &read_options,
Azure::Storage::Blobs::BlobClient blob_client)
: AzureFileHandle(fs, std::move(path), flags, read_options), blob_client(std::move(blob_client)) {
}

//////// AzureBlobStorageFileSystem ////////
-unique_ptr<AzureFileHandle> AzureBlobStorageFileSystem::CreateHandle(const string &path, uint8_t flags,
- FileLockType lock, FileCompressionType compression,
- FileOpener *opener) {
- if (opener == nullptr) {
+unique_ptr<AzureFileHandle> AzureBlobStorageFileSystem::CreateHandle(const string &path, FileOpenFlags flags,
+ optional_ptr<FileOpener> opener) {
+ if (!opener) {
throw InternalException("Cannot do Azure storage CreateHandle without FileOpener");
}

- D_ASSERT(compression == FileCompressionType::UNCOMPRESSED);
+ D_ASSERT(flags.Compression() == FileCompressionType::UNCOMPRESSED);

auto parsed_url = ParseUrl(path);
auto storage_context = GetOrCreateStorageContext(opener, path, parsed_url);
diff --git a/src/azure_dfs_filesystem.cpp b/src/azure_dfs_filesystem.cpp
index 34435ae..8a756d6 100644
--- a/src/azure_dfs_filesystem.cpp
+++ b/src/azure_dfs_filesystem.cpp
@@ -83,21 +83,20 @@ AzureDfsContextState::GetDfsFileSystemClient(const std::string &file_system_name
}

//////// AzureDfsContextState ////////
-AzureDfsStorageFileHandle::AzureDfsStorageFileHandle(AzureDfsStorageFileSystem &fs, string path, uint8_t flags,
+AzureDfsStorageFileHandle::AzureDfsStorageFileHandle(AzureDfsStorageFileSystem &fs, string path, FileOpenFlags flags,
const AzureReadOptions &read_options,
Azure::Storage::Files::DataLake::DataLakeFileClient client)
: AzureFileHandle(fs, std::move(path), flags, read_options), file_client(std::move(client)) {
}

//////// AzureDfsStorageFileSystem ////////
-unique_ptr<AzureFileHandle> AzureDfsStorageFileSystem::CreateHandle(const string &path, uint8_t flags,
- FileLockType lock, FileCompressionType compression,
- FileOpener *opener) {
+unique_ptr<AzureFileHandle> AzureDfsStorageFileSystem::CreateHandle(const string &path, FileOpenFlags flags,
+ optional_ptr<FileOpener> opener) {
if (opener == nullptr) {
throw InternalException("Cannot do Azure storage CreateHandle without FileOpener");
}

- D_ASSERT(compression == FileCompressionType::UNCOMPRESSED);
+ D_ASSERT(flags.Compression() == FileCompressionType::UNCOMPRESSED);

auto parsed_url = ParseUrl(path);
auto storage_context = GetOrCreateStorageContext(opener, path, parsed_url);
diff --git a/src/azure_filesystem.cpp b/src/azure_filesystem.cpp
index 4c2caed..1c4c434 100644
--- a/src/azure_filesystem.cpp
+++ b/src/azure_filesystem.cpp
@@ -18,7 +18,7 @@ void AzureContextState::QueryEnd() {
is_valid = false;
}

-AzureFileHandle::AzureFileHandle(AzureStorageFileSystem &fs, string path, uint8_t flags,
+AzureFileHandle::AzureFileHandle(AzureStorageFileSystem &fs, string path, FileOpenFlags flags,
const AzureReadOptions &read_options)
: FileHandle(fs, std::move(path)), flags(flags),
// File info
@@ -27,7 +27,7 @@ AzureFileHandle::AzureFileHandle(AzureStorageFileSystem &fs, string path, uint8_
buffer_available(0), buffer_idx(0), file_offset(0), buffer_start(0), buffer_end(0),
// Options
read_options(read_options) {
- if (flags & FileFlags::FILE_FLAGS_READ) {
+ if (flags.OpenForReading()) {
read_buffer = duckdb::unique_ptr<data_t[]>(new data_t[read_options.buffer_size]);
}
}
@@ -37,7 +37,7 @@ void AzureFileHandle::PostConstruct() {
}

void AzureStorageFileSystem::LoadFileInfo(AzureFileHandle &handle) {
- if (handle.flags & FileFlags::FILE_FLAGS_READ) {
+ if (handle.flags.OpenForReading()) {
try {
LoadRemoteFileInfo(handle);
} catch (const Azure::Storage::StorageException &e) {
@@ -53,15 +53,15 @@ void AzureStorageFileSystem::LoadFileInfo(AzureFileHandle &handle) {
}
}

-unique_ptr<FileHandle> AzureStorageFileSystem::OpenFile(const string &path, uint8_t flags, FileLockType lock,
- FileCompressionType compression, FileOpener *opener) {
- D_ASSERT(compression == FileCompressionType::UNCOMPRESSED);
+unique_ptr<FileHandle> AzureStorageFileSystem::OpenFile(const string &path,FileOpenFlags flags,
+ optional_ptr<FileOpener> opener) {
+ D_ASSERT(flags.Compression() == FileCompressionType::UNCOMPRESSED);

- if (flags & FileFlags::FILE_FLAGS_WRITE) {
+ if (flags.OpenForWriting()) {
throw NotImplementedException("Writing to Azure containers is currently not supported");
}

- auto handle = CreateHandle(path, flags, lock, compression, opener);
+ auto handle = CreateHandle(path, flags, opener);
return std::move(handle);
}

@@ -92,7 +92,7 @@ void AzureStorageFileSystem::Read(FileHandle &handle, void *buffer, int64_t nr_b
idx_t buffer_offset = 0;

// Don't buffer when DirectIO is set.
- if (hfh.flags & FileFlags::FILE_FLAGS_DIRECT_IO && to_read > 0) {
+ if (hfh.flags.DirectIO() && to_read > 0) {
ReadRange(hfh, location, (char *)buffer, to_read);
hfh.buffer_available = 0;
hfh.buffer_idx = 0;
diff --git a/src/include/azure_blob_filesystem.hpp b/src/include/azure_blob_filesystem.hpp
index cb76d68..0f3d6d0 100644
--- a/src/include/azure_blob_filesystem.hpp
+++ b/src/include/azure_blob_filesystem.hpp
@@ -23,7 +23,7 @@ class AzureBlobStorageFileSystem;

class AzureBlobStorageFileHandle : public AzureFileHandle {
public:
- AzureBlobStorageFileHandle(AzureBlobStorageFileSystem &fs, string path, uint8_t flags,
+ AzureBlobStorageFileHandle(AzureBlobStorageFileSystem &fs, string path, FileOpenFlags flags,
const AzureReadOptions &read_options, Azure::Storage::Blobs::BlobClient blob_client);
~AzureBlobStorageFileHandle() override = default;

@@ -59,8 +59,8 @@ protected:
}
std::shared_ptr<AzureContextState> CreateStorageContext(FileOpener *opener, const string &path,
const AzureParsedUrl &parsed_url) override;
- duckdb::unique_ptr<AzureFileHandle> CreateHandle(const string &path, uint8_t flags, FileLockType lock,
- FileCompressionType compression, FileOpener *opener) override;
+ duckdb::unique_ptr<AzureFileHandle> CreateHandle(const string &path, FileOpenFlags flags,
+ optional_ptr<FileOpener> opener) override;

void ReadRange(AzureFileHandle &handle, idx_t file_offset, char *buffer_out, idx_t buffer_out_len) override;
};
diff --git a/src/include/azure_dfs_filesystem.hpp b/src/include/azure_dfs_filesystem.hpp
index c4bf8fe..f9794b8 100644
--- a/src/include/azure_dfs_filesystem.hpp
+++ b/src/include/azure_dfs_filesystem.hpp
@@ -25,7 +25,7 @@ class AzureDfsStorageFileSystem;

class AzureDfsStorageFileHandle : public AzureFileHandle {
public:
- AzureDfsStorageFileHandle(AzureDfsStorageFileSystem &fs, string path, uint8_t flags,
+ AzureDfsStorageFileHandle(AzureDfsStorageFileSystem &fs, string path, FileOpenFlags flags,
const AzureReadOptions &read_options,
Azure::Storage::Files::DataLake::DataLakeFileClient client);
~AzureDfsStorageFileHandle() override = default;
@@ -57,8 +57,8 @@ protected:
}
std::shared_ptr<AzureContextState> CreateStorageContext(FileOpener *opener, const string &path,
const AzureParsedUrl &parsed_url) override;
- duckdb::unique_ptr<AzureFileHandle> CreateHandle(const string &path, uint8_t flags, FileLockType lock,
- FileCompressionType compression, FileOpener *opener) override;
+ duckdb::unique_ptr<AzureFileHandle> CreateHandle(const string &path, FileOpenFlags flags,
+ optional_ptr<FileOpener> opener) override;

void ReadRange(AzureFileHandle &handle, idx_t file_offset, char *buffer_out, idx_t buffer_out_len) override;
};
diff --git a/src/include/azure_filesystem.hpp b/src/include/azure_filesystem.hpp
index a41d7a2..b68aa03 100644
--- a/src/include/azure_filesystem.hpp
+++ b/src/include/azure_filesystem.hpp
@@ -52,10 +52,10 @@ public:
}

protected:
- AzureFileHandle(AzureStorageFileSystem &fs, string path, uint8_t flags, const AzureReadOptions &read_options);
+ AzureFileHandle(AzureStorageFileSystem &fs, string path, FileOpenFlags flags, const AzureReadOptions &read_options);

public:
- const uint8_t flags;
+ const FileOpenFlags flags;

// File info
idx_t length;
@@ -76,9 +76,8 @@ public:
class AzureStorageFileSystem : public FileSystem {
public:
// FS methods
- duckdb::unique_ptr<FileHandle> OpenFile(const string &path, uint8_t flags, FileLockType lock = DEFAULT_LOCK,
- FileCompressionType compression = DEFAULT_COMPRESSION,
- FileOpener *opener = nullptr) override;
+ duckdb::unique_ptr<FileHandle> OpenFile(const string &path, FileOpenFlags flags,
+ optional_ptr<FileOpener> opener = nullptr) override;

void Read(FileHandle &handle, void *buffer, int64_t nr_bytes, idx_t location) override;
int64_t Read(FileHandle &handle, void *buffer, int64_t nr_bytes) override;
@@ -99,8 +98,8 @@ public:
void LoadFileInfo(AzureFileHandle &handle);

protected:
- virtual duckdb::unique_ptr<AzureFileHandle> CreateHandle(const string &path, uint8_t flags, FileLockType lock,
- FileCompressionType compression, FileOpener *opener) = 0;
+ virtual duckdb::unique_ptr<AzureFileHandle> CreateHandle(const string &path, FileOpenFlags flags,
+ optional_ptr<FileOpener> opener) = 0;
virtual void ReadRange(AzureFileHandle &handle, idx_t file_offset, char *buffer_out, idx_t buffer_out_len) = 0;

virtual const string &GetContextPrefix() const = 0;
32 changes: 32 additions & 0 deletions .github/patches/extensions/spatial/open_file.patch
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
diff --git a/spatial/src/spatial/gdal/file_handler.cpp b/spatial/src/spatial/gdal/file_handler.cpp
index db449df..4985c9f 100644
--- a/spatial/src/spatial/gdal/file_handler.cpp
+++ b/spatial/src/spatial/gdal/file_handler.cpp
@@ -122,7 +122,7 @@ public:
auto &fs = FileSystem::GetFileSystem(context);

// TODO: Double check that this is correct
- uint8_t flags;
+ FileOpenFlags flags;
auto len = strlen(access);
if (access[0] == 'r') {
flags = FileFlags::FILE_FLAGS_READ;
@@ -167,7 +167,7 @@ public:
return new DuckDBFileHandle(std::move(file));
}
#endif
- auto file = fs.OpenFile(file_name, flags, FileSystem::DEFAULT_LOCK, FileCompressionType::AUTO_DETECT);
+ auto file = fs.OpenFile(file_name, flags | FileCompressionType::AUTO_DETECT);
return new DuckDBFileHandle(std::move(file));
} catch (std::exception &ex) {
// Failed to open file via DuckDB File System. If this doesnt have a VSI prefix we can return an error here.
@@ -209,8 +209,7 @@ public:

unique_ptr<FileHandle> file;
try {
- file = fs.OpenFile(file_name, FileFlags::FILE_FLAGS_READ, FileSystem::DEFAULT_LOCK,
- FileCompressionType::AUTO_DETECT);
+ file = fs.OpenFile(file_name, FileFlags::FILE_FLAGS_READ | FileCompressionType::AUTO_DETECT);
} catch (std::exception &ex) {
return -1;
}