pipe storage cp shall start data upload before traversing full source hierarchy #2574

Open
sidoruka opened this issue Mar 29, 2022 · 1 comment
Labels
goal/dorado ✨ kind/enhancement New feature or request sys/cli Issues related to the pipe cli

Comments

@sidoruka
Contributor

sidoruka commented Mar 29, 2022

Background
At the moment, the pipe storage cp/mv CLI commands run --recursive operations in (at least) two phases:

  • Scan the source directory recursively and collect all the files/folders
  • Transfer the data collected
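The following simplified Python model shows why this ordering is costly. It is not the pipe CLI's actual code. two_phase_copy and the upload_batch callback are hypothetical names, and a real transfer would call the storage provider instead of a callback. The model shows that the full path list sits in memory and no upload begins until os.walk has visited every directory.

```python
import os
from typing import Callable, List


def two_phase_copy(root: str,
                   upload_batch: Callable[[List[str]], None],
                   batch_size: int = 1000) -> int:
    """Simplified model of a scan-then-transfer recursive copy.

    upload_batch is a hypothetical stand-in for the real transfer call.
    """
    # Phase 1: walk the whole hierarchy. Every path is kept in memory
    # and nothing is transferred until the walk has finished.
    all_paths: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            all_paths.append(os.path.join(dirpath, name))

    # Phase 2: the transfer starts only now.
    for start in range(0, len(all_paths), batch_size):
        upload_batch(all_paths[start:start + batch_size])
    return len(all_paths)
```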

For huge filesystem hierarchies, the scan may take a lot of time (hours) and consume a lot of memory, and no data is transferred until it completes.

Approach
It would be great to make the scanning procedure more asynchronous, e.g.:

  • Change the scanning process to collect files/folders in batches (e.g. 1000 paths)
  • Once a batch is collected, start uploading it to the destination
  • Meanwhile (while the upload is in progress), collect the next batch of files
  • And so on...
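The batched approach above is a producer/consumer pipeline. Below is a minimal Python sketch, assuming a local source directory. A background thread walks the source with os.walk and emits batches of up to batch_size paths (1000 by default, matching the example above). The calling thread uploads each batch as soon as it is ready. pipelined_copy, upload_batch and max_pending are hypothetical names used for illustration. They are not part of the pipe CLI.

```python
import itertools
import os
import queue
import threading
from typing import Callable, Iterator, List

_DONE = object()  # sentinel: the scanner has finished (or failed)


def iter_files(root: str) -> Iterator[str]:
    """Lazily yield file paths; os.walk lists one directory at a time."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def iter_batches(paths: Iterator[str], batch_size: int) -> Iterator[List[str]]:
    """Group a path stream into lists of at most batch_size items."""
    while True:
        batch = list(itertools.islice(paths, batch_size))
        if not batch:
            return
        yield batch


def pipelined_copy(root: str,
                   upload_batch: Callable[[List[str]], None],
                   batch_size: int = 1000,
                   max_pending: int = 2) -> int:
    """Upload batches while a background thread keeps scanning (hypothetical API)."""
    batches: queue.Queue = queue.Queue(maxsize=max_pending)
    stop = threading.Event()
    errors: List[Exception] = []

    def scan() -> None:
        try:
            for batch in iter_batches(iter_files(root), batch_size):
                if stop.is_set():
                    break
                batches.put(batch)  # blocks while the uploader is behind
        except Exception as exc:  # surface scan errors to the caller
            errors.append(exc)
        finally:
            batches.put(_DONE)

    scanner = threading.Thread(target=scan, daemon=True)
    scanner.start()

    uploaded = 0
    finished = False
    try:
        while True:
            item = batches.get()
            if item is _DONE:
                finished = True
                break
            upload_batch(item)  # the scanner keeps walking meanwhile
            uploaded += len(item)
    finally:
        if not finished:
            # Upload failed: stop the scanner and drain the queue so its
            # blocked put() returns and the thread can exit.
            stop.set()
            while batches.get() is not _DONE:
                pass
        scanner.join()
    if errors:
        raise errors[0]
    return uploaded
```

The bounded queue drives the design. The scanner can run at most max_pending batches ahead of the uploader. Memory is therefore limited to a few batches rather than the full path list, and the first upload can start as soon as the first batch is collected.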
@sidoruka sidoruka added kind/enhancement New feature or request sys/cli Issues related to the pipe cli state/draft Draft issues, that are lacking description or not ready for the implementation goal/dorado ✨ labels Mar 29, 2022
@sidoruka sidoruka added this to the v0.17 milestone Mar 29, 2022
@sidoruka sidoruka removed the state/draft Draft issues, that are lacking description or not ready for the implementation label Mar 30, 2022
@sidoruka
Contributor Author

Backport to release/0.16

ekazachkova added a commit that referenced this issue Apr 5, 2022
ekazachkova added a commit that referenced this issue Apr 5, 2022
ekazachkova added a commit that referenced this issue Apr 5, 2022
ekazachkova added a commit that referenced this issue Apr 6, 2022
ekazachkova added a commit that referenced this issue Apr 11, 2022
ekazachkova added a commit that referenced this issue Apr 11, 2022
…g full source hierarchy - async batch collection for cp/mv operation
ekazachkova added a commit that referenced this issue Apr 11, 2022
ekazachkova added a commit that referenced this issue Apr 12, 2022
…g full source hierarchy - refactor cp/mv paging
ekazachkova added a commit that referenced this issue Apr 12, 2022
…g full source hierarchy - refactor cp/mv paging
ekazachkova added a commit that referenced this issue Jul 4, 2024
…g full source hierarchy (S3 provider) - cleanups
ekazachkova added a commit that referenced this issue Jul 4, 2024
…g full source hierarchy (S3 provider) - async batch
ekazachkova added a commit that referenced this issue Jul 4, 2024
…g full source hierarchy (S3 provider) - disable async batch by default
ekazachkova added a commit that referenced this issue Jul 4, 2024
…g full source hierarchy (S3 provider) - cleanup
ekazachkova added a commit that referenced this issue Jul 5, 2024
…g full source hierarchy (S3 provider) - cleanup
ekazachkova added a commit that referenced this issue Jul 5, 2024
…g full source hierarchy (S3 provider) - cleanup
ekazachkova added a commit that referenced this issue Jul 5, 2024
…g full source hierarchy (S3 provider) - cleanup
ekazachkova added a commit that referenced this issue Jul 8, 2024
…g full source hierarchy (S3 provider) - support local paths
ekazachkova added a commit that referenced this issue Jul 9, 2024
…g full source hierarchy (S3 provider) - cleanup
ekazachkova added a commit that referenced this issue Jul 9, 2024
…g full source hierarchy (S3 provider) - cleanup
SilinPavel pushed a commit that referenced this issue Jul 9, 2024
…g full source hierarchy (S3 and Local provider) (#2597)
ekazachkova added a commit that referenced this issue Jul 15, 2024
…g full source hierarchy (S3 and Local provider) (#2597)

(cherry picked from commit e73c60c)
SilinPavel pushed a commit that referenced this issue Jul 31, 2024
SilinPavel pushed a commit that referenced this issue Jul 31, 2024