Add export local volume data route #48839

Open
wants to merge 4 commits into
base: master

@jarqvi jarqvi commented Nov 8, 2024

Implement a new route in the Docker API that enables users to export the contents of local volumes as a .tar archive. This feature offers a convenient way to generate compressed backups of local volume data, making it easier to store, transfer, and restore volume contents efficiently.

Signed-off-by: MohammadHasan Akbari <jarqvi.jarqvi@gmail.com>
@thaJeztah thaJeztah added area/api kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny impact/api impact/changelog area/volumes labels Nov 8, 2024
@thaJeztah
Member

Thanks! I think this would address;

I recall there were still some discussions to be had around those; probably also in relation to #48798

I'll make sure this gets discussed in the next maintainers call

Contributor

@vvoland vvoland left a comment

While it works in the optimistic case, it's potentially very unsafe, especially for backup purposes.

The issue with this approach is that the archive creation is not atomic: the filesystem content can change during the operation.

Consider the following filesystem at the time archive.Tar is called:

/data
/data/a/...
/data/b/...
...
/data/z/...

filepath.WalkDir walks the directory tree in lexical order, and each directory takes some time to process.

This is fine, as long as we're sure that the filesystem doesn't change, but what if there's a container running that moves the /data/z/important-file into /data/a/important-file?

If the walk has already finished processing /data/a when important-file is moved, then by the time it reaches /data/z the file is already in /data/a, so it is missing from the final archive.
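The scenario above can be reproduced deterministically with a small sketch. This is not the PR's code: `walkWithMove` is a hypothetical helper, and the rename is triggered from inside the walk callback to stand in for a container that modifies the volume concurrently. It relies only on the documented behavior that `filepath.WalkDir` visits entries in lexical order and calls the callback for a directory before reading its contents.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// walkWithMove (hypothetical helper) builds <tmp>/a and <tmp>/z/important-file,
// then walks the tree. When the walk reaches z, after a has been fully
// processed, the callback moves the file into a. It returns the names of the
// regular files the walk actually visited.
func walkWithMove() ([]string, error) {
	root, err := os.MkdirTemp("", "volume")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	for _, dir := range []string{"a", "z"} {
		if err := os.Mkdir(filepath.Join(root, dir), 0o755); err != nil {
			return nil, err
		}
	}
	src := filepath.Join(root, "z", "important-file")
	if err := os.WriteFile(src, []byte("data"), 0o644); err != nil {
		return nil, err
	}

	var files []string
	err = filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if d.Name() == "z" {
				// Simulated concurrent writer: the file leaves z before the
				// walk lists z, and lands in a, which was already walked.
				return os.Rename(src, filepath.Join(root, "a", "important-file"))
			}
			return nil
		}
		files = append(files, d.Name())
		return nil
	})
	return files, err
}

func main() {
	files, err := walkWithMove()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("regular files seen by the walk: %d\n", len(files))
}
```

In this setup the walk should report zero regular files, even though important-file still exists on disk under a/ when the walk finishes.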

This is unacceptable for users who would like to use it to back up a volume.

Unfortunately with the local volume driver, I don't think we can provide a solution for this as we can't effectively snapshot the content of the volume, unless we want to docker pause all containers using this volume during the export.

Also, if such an "unsafe" solution is acceptable for the user, it's already possible with something like: docker run -it -v <volume>:/v alpine tar -c /v | ..., so I don't see a need to implement it on the engine side.

@jarqvi
Author

jarqvi commented Nov 8, 2024

This is unacceptable for users who would like to use it to back up a volume.

Well, can't we use rsync on the engine side?

Edit:

unless we want to docker pause all containers using this volume during the export.

Is pausing containers problematic?
I think we also pause the container in docker commit.

Signed-off-by: MohammadHasan Akbari <jarqvi.jarqvi@gmail.com>
@thaJeztah
Member

Is pausing containers problematic?

For volumes, this would likely mean "all containers that use the volume", in addition to preventing the volume from being used by new containers while the export is in progress. We may also need to look at how the paths are traversed / walked. If this does not happen within the container's mount namespace, we must prevent any path outside of the volume from being accessible (we've had some fun situations with that on docker copy). Otherwise, if the code follows symlinks, a (dangling) symlink in a volume may get resolved to paths outside of its scope.
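To make the symlink concern concrete, here is a minimal sketch of a purely lexical containment check. `linkEscapesRoot` and `demo` are hypothetical names, not engine functions, and this is not how the engine scopes paths: the check ignores symlinked parent directories and races with concurrent writers, so it only illustrates what "resolving outside the volume" means for a link target as seen from the host.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// linkEscapesRoot (hypothetical helper) reports whether the target of the
// symlink at linkPath points outside root when interpreted on the host.
// Both paths are assumed to be absolute. The check is lexical only.
func linkEscapesRoot(root, linkPath string) (bool, error) {
	target, err := os.Readlink(linkPath) // works for dangling links too
	if err != nil {
		return false, err
	}
	if !filepath.IsAbs(target) {
		// Relative targets are resolved against the link's own directory.
		target = filepath.Join(filepath.Dir(linkPath), target)
	}
	rel, err := filepath.Rel(root, filepath.Clean(target))
	if err != nil {
		return false, err
	}
	escapes := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
	return escapes, nil
}

// demo creates one link that stays inside a temporary "volume" and one that
// climbs out of it, and returns the verdict for each.
func demo() (inside, outside bool, err error) {
	root, err := os.MkdirTemp("", "volume")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	in := filepath.Join(root, "inside")
	out := filepath.Join(root, "outside")
	if err = os.Symlink("data.txt", in); err != nil { // dangling, but in scope
		return false, false, err
	}
	if err = os.Symlink("../../etc/passwd", out); err != nil {
		return false, false, err
	}
	if inside, err = linkEscapesRoot(root, in); err != nil {
		return false, false, err
	}
	outside, err = linkEscapesRoot(root, out)
	return inside, outside, err
}

func main() {
	inside, outside, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("inside link escapes:", inside)
	fmt.Println("outside link escapes:", outside)
}
```

An absolute target such as /etc/passwd is also flagged by this check, because on the host it names a path outside the volume directory.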

Signed-off-by: MohammadHasan Akbari <jarqvi.jarqvi@gmail.com>
Signed-off-by: MohammadHasan Akbari <jarqvi.jarqvi@gmail.com>
@jarqvi
Author

jarqvi commented Nov 15, 2024

For volumes, this would likely mean "all containers that use the volume", in addition to preventing the volume from being used by new containers while the export is in progress. We may also need to look at how the paths are traversed / walked; if this is not happening within the container's mount namespace

I made some changes regarding this note.

we must prevent any path outside of the volume to be accessible (we've had some fun situations with that on docker copy), otherwise if code is following symlinks, a (dangling) symlink in a volume may get resolved to paths outside of its scope.

I think that tar, when creating an archive, does not follow symlinks. It stores the link itself rather than the file or directory it points to.

3 participants