Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add setting to control the maximum swap space #10978

Merged
merged 40 commits into from
Apr 17, 2024

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented Mar 4, 2024

This PR fixes https://github.com/duckdblabs/duckdb-internal/issues/1371

As the title says, this PR adds a setting to control maximum swap space.

Using the temp_directory setting, a directory can be assigned to use by DuckDB to offload temporary intermediate data, to evict buffers that can not be processed yet but are taking up room required by other buffers, we store them on disk.

"Disk is full" issue

The problem this solves is that this can grow unbounded, in the worst case eventually flooding the entire disk, making it unusable.
To prevent this issue, we will instead abort the query if the max_temp_directory_size is reached

max_temp_directory_size

The setting defaults to 0 when temp_directory is not set.

If temp_directory is set to an existing directory, it will default to the available disk space on that drive.
If temp_directory is set to a non-existent directory, it will default to 0. When the directory gets created by us, we'll update it to the available disk space on that drive.

When the setting is set explicitly by the user, it will not be influenced by the temp_directory setting at all.
(see test/sql/storage/max_swap_space.test for examples of this)
When the value is "none", "null" or the value starts with - then the swap limit is set to unlimited.

Future work

Maybe in the future, instead of aborting the query, we can let the task block instead, to be rescheduled later when enough disk space is available
(realizing this might be a more general use for blocking tasks, rather than assigning a special event that is responsible for unblocking events, maybe we want to have a generic system that periodically checks a callback to detect if a task should be unblocked)

Other changes

FileSystem now has a static method GetAvailableDiskSpace which takes a path and retrieves the disk space info from it.

DatabaseInstance::Initialize no longer modifies the DBConfig passed in by the user, these modifications are now done directly to the internal config inside DatabaseInstance::Configure

@Tishj Tishj requested a review from hannes March 8, 2024 09:12
@hannes
Copy link
Member

hannes commented Mar 8, 2024

maybe the default should be 90% of available capacity on the drive to allow for some other stuff to happen as well

src/main/settings/settings.cpp Outdated Show resolved Hide resolved
src/main/config.cpp Outdated Show resolved Hide resolved
@Tishj Tishj marked this pull request as ready for review March 11, 2024 10:26
@github-actions github-actions bot marked this pull request as draft March 12, 2024 11:53
@Tishj Tishj marked this pull request as ready for review March 13, 2024 10:23
@github-actions github-actions bot marked this pull request as draft April 2, 2024 09:48
@Tishj Tishj marked this pull request as ready for review April 2, 2024 14:15
@github-actions github-actions bot marked this pull request as draft April 3, 2024 15:01
@Tishj Tishj marked this pull request as ready for review April 10, 2024 13:27
@duckdb-draftbot duckdb-draftbot marked this pull request as draft April 11, 2024 08:50
@Tishj Tishj marked this pull request as ready for review April 11, 2024 08:54
@Tishj Tishj requested a review from Mytherin April 11, 2024 13:42
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good - one more minor comment

src/storage/temporary_file_manager.cpp Outdated Show resolved Hide resolved
@duckdb-draftbot duckdb-draftbot marked this pull request as draft April 12, 2024 11:19
@Tishj Tishj marked this pull request as ready for review April 12, 2024 14:42
@Tishj Tishj requested a review from Mytherin April 16, 2024 16:25
@Mytherin Mytherin merged commit c07b645 into duckdb:main Apr 17, 2024
49 checks passed
@Mytherin
Copy link
Collaborator

Thanks! LGTM

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request May 1, 2024
Merge pull request duckdb/duckdb#10978 from Tishj/maximum_swap_space
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants