readme: update Faster Beast Mode example #394

Merged
merged 3 commits into from
Dec 27, 2021

Conversation

anatolijd
Contributor

Since s5cmd is all about speed, I found it appropriate to propose a faster example for find /path -type f | s5cmd run :)
While | xargs -I{} echo is correct and works fine in most cases, it can be slow for very large lists.

For example, I need to transfer a directory tree containing 226K very small files:

anatolij@~$ time find /var/vault/data -type f | xargs -I{} echo "mv {} s3://bucket/vault/data/{}" > vault-data-file.cmd

real	4m34.096s
user	0m3.884s
sys	0m20.396s

More than 4 minutes (test executed on an EC2 m5.large), just because xargs -I{} has to exec a separate echo process for every input line.
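To make the per-line cost visible, here is a minimal sketch (not part of the PR). The helper name count_spawns is hypothetical. It relies on the fact that -I makes xargs run its command once per input line, so the number of child processes grows with the size of the file list.

```shell
# Hypothetical helper: count how many times xargs starts its command.
# With -I{}, xargs runs the command once per input line, so every line
# of the file list costs one extra process.
count_spawns() {
  # Each child prints one marker line; the input line is passed as $1
  # and deliberately ignored.
  xargs -I{} sh -c 'echo spawned' _ {} | wc -l | tr -d ' '
}

# Three input lines, so three separate child processes are started.
printf '%s\n' a b c | count_spawns
```

An awk program, by contrast, handles every line inside one long-running process, which is consistent with the difference in the timings reported here.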

Replacing it with awk gives us:

anatolij@~$ time find /var/vault/data -type f | awk '{print "mv "$1" s3://bucket/vault-awk/data/"$1}' > vault-data-file.awk.cmd

real	0m1.324s
user	0m0.720s
sys	0m0.816s
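One caveat with the awk one-liner: $1 is only the first whitespace-separated field, so a path containing a space would be cut short. The sketch below is a variant that uses $0 (the whole input line) instead. The function name gen_mv_cmds and the bucket prefix are illustrative assumptions, not taken from the s5cmd README. Whether such paths also need quoting inside the command file is not covered here.

```shell
# Hypothetical helper: read one path per line on stdin and print one
# "mv <path> <dest-prefix><path>" line per input line.
# $0 is the entire input line, so paths with spaces are kept whole,
# whereas $1 would stop at the first space.
gen_mv_cmds() {
  awk -v dest="$1" '{ print "mv " $0 " " dest $0 }'
}

# Demo with literal sample paths instead of a real `find` run.
printf '%s\n' '/var/vault/data/a' '/var/vault/data/b c' |
  gen_mv_cmds 's3://bucket/prefix'
```

In a real run the printf would be replaced by find /var/vault/data -type f, with the output redirected to a command file as in the transcript.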

Both transferred filesets were identical:

anatolij@~$ wc -l vault-data-file.cmd vault-data-file.awk.cmd
   226519 vault-data-file.cmd
   226519 vault-data-file.awk.cmd

anatolij@~$ sha256sum vault-data-file.cmd vault-data-file.awk.cmd
  86c980f89664cf1cef98c36d2f171054d4e9e923414c6376e31067b6b638ed1d  vault-data-file.cmd
  86c980f89664cf1cef98c36d2f171054d4e9e923414c6376e31067b6b638ed1d  vault-data-file.awk.cmd

In my case of 200K tiny files, the | awk | s5cmd run approach was 3-5 times faster than | xargs echo | s5cmd run (900-1500fps vs 300-400fps).

BTW, thank you for the great tool!

@anatolijd anatolijd requested a review from a team as a code owner December 21, 2021 21:50
@anatolijd anatolijd requested review from aykutfarsak and sonmezonur and removed request for a team December 21, 2021 21:50
@igungor
Member

igungor commented Dec 22, 2021

Hi,

Thank you for the improvement. It looks good. I think we can simply replace the current example with your proposal.

@anatolijd
Contributor Author

I committed another change to simply replace the example.

@igungor igungor changed the title from "Faster Beast Mode example" to "readme: update Faster Beast Mode example" Dec 27, 2021
@igungor igungor merged commit 7f694fd into peak:master Dec 27, 2021