FIX: Only traverse necessary directories #81

Merged: 3 commits, Oct 18, 2022

larsoner
Collaborator

I have been trying to download https://openneuro.org/datasets/ds004024/, and traversing the directory structure takes over a minute locally (!), for example with:

openneuro-py download --dataset=ds004024 --target_dir=$PWD --include="sub-CON001/*.eeg" --include="sub-CON001/*.tsv"

or

openneuro-py download --dataset=ds004024 --target_dir=$PWD --include="sub-CON001"

With these changes, either command needs less than 10 seconds to traverse, because the traversal is now smart enough to ignore all other sub-* directories.

I'm not 100% sure the logic here is correct, but I do think we want something like this eventually to make openneuro-py more usable.
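
To illustrate the idea, here is a minimal sketch of how include patterns could be used to prune directories before descending into them. This is not the code in this PR: `should_traverse` is a hypothetical helper name, and the rule of treating any wildcard component as "might match anything below" is an assumption chosen so that the sketch never prunes a directory that could still contain a match.

```python
from typing import Sequence

# Characters that make a path component a glob rather than a literal name.
GLOB_CHARS = set("*?[")


def should_traverse(directory: str, include: Sequence[str]) -> bool:
    """Return True if `directory` might contain a path matching `include`.

    Hypothetical helper for illustration only. It compares the directory
    against the leading literal components of each include pattern and
    gives up (returns True) as soon as a wildcard is reached.
    """
    if not include:
        return True  # no filters given: every directory is wanted

    # Split into components; the dataset root ("") yields an empty list.
    dir_parts = [part for part in directory.split("/") if part]

    for pattern in include:
        pat_parts = [part for part in pattern.split("/") if part]
        for d, p in zip(dir_parts, pat_parts):
            if GLOB_CHARS & set(p):
                return True  # wildcard could match anything below this level
            if d != p:
                break  # literal mismatch: this pattern cannot match here
        else:
            # Every compared component agreed, so the directory is either
            # a parent of the pattern or lies beneath it.
            return True
    return False


include = ["sub-CON001/*.eeg", "sub-CON001/*.tsv"]
for directory in ["", "sub-CON001", "sub-CON001/eeg", "sub-CON002"]:
    print(repr(directory), should_traverse(directory, include))
```

With the include patterns from the first command above, only the root and `sub-CON001` (plus anything under it) are kept, so every other `sub-*` directory can be skipped without listing its contents.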

@larsoner
Collaborator Author

This one is ready to go from my end, @hoechenberger. I think the logic works, and we can improve it later if needed. For now, I think it at least helps reduce the number of dirs we traverse.

@larsoner
Collaborator Author

larsoner commented Oct 18, 2022

In #80 the test timings are:

23.10s call     openneuro/tests/test_download.py::test_download[ds000117-None-sub-16/ses-meg-*.fif]
7.45s call     openneuro/tests/test_download.py::test_doi_handling
7.08s call     openneuro/tests/test_download.py::test_download[ds000246-1.0.0-sub-0001/anat-exclude0]
5.13s call     openneuro/tests/test_download.py::test_resume_download
3.67s call     openneuro/tests/test_download.py::test_ds000248

And in this PR you can see they've all sped up:

15.48s call     openneuro/tests/test_download.py::test_download[ds000117-None-sub-16/ses-meg-*.fif]
4.91s call     openneuro/tests/test_download.py::test_resume_download
2.69s call     openneuro/tests/test_download.py::test_download[ds000246-1.0.0-sub-0001/anat-exclude0]
2.24s call     openneuro/tests/test_download.py::test_doi_handling

Owner

@hoechenberger hoechenberger left a comment

Thanks @larsoner!

@hoechenberger hoechenberger merged commit 04364cd into hoechenberger:main Oct 18, 2022
@larsoner larsoner deleted the traverse branch October 18, 2022 18:44