Efficient bootstrapping of data arrays? #9299
joshdorrington asked this question in Q&A (unanswered)
Replies: 1 comment, 1 reply
-
It's usually better (as in: easier to respond to) to paste the code in markdown instead of attaching a file:

````
```python
# code
```
````

I've taken the liberty of editing your post. I can see two things that might be slow:

```python
ix = np.concatenate([np.zeros(950, dtype=bool), np.ones(50, dtype=bool)], axis=0)
```

I'd expect (2) to have the biggest impact.
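For context, here is a sketch of a cheaper way to build that boolean mask, assuming (this is not shown in the thread) that the `np.concatenate` line runs once per bootstrap iteration: allocate the mask once and reshuffle it in place, so no new arrays are created inside the loop. The loop body and variable names are illustrative, not from the original attachment.

```python
import numpy as np

n, k = 1000, 50

# Allocate the mask once instead of concatenating two arrays per iteration.
ix = np.zeros(n, dtype=bool)
ix[-k:] = True  # last k entries True, same mask as the concatenate version

rng = np.random.default_rng(0)
for _ in range(3):  # stand-in for the per-bootstrap loop
    rng.shuffle(ix)                  # reshuffle in place, no new allocations
    sample_idx = np.flatnonzero(ix)  # k positions, already sorted ascending
```

A side benefit of `np.flatnonzero` is that the integer indices come out sorted, so the subsequent fancy-indexing gather walks memory mostly forward rather than jumping around.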
-
Hi, I am working to make my big-data analysis more efficient, and I am facing a frequent bottleneck with bootstrap resampling which I'd like to improve. I think it is so slow because of all the out-of-order memory accesses. I attach a simple test case showing what I mean. For a realistic dataset it takes about 3 seconds per bootstrap, which is about 20 minutes for 400 resamples. The rest of my pipeline runs in 10 minutes, so this is a major delay.
If I try using dask for this, I end up with massive graphs with thousands of layers, which is very inefficient. Any advice or clever tricks are much appreciated!
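The attached test case is not included in the thread, so here is a minimal sketch of the pattern being described, with one trick aimed at the out-of-order access problem: draw all bootstrap indices up front with a NumPy `Generator`, then sort the indices within each resample. Sorting does not change the bootstrap distribution (sampling with replacement is order-free), but it turns each gather into a mostly forward pass over memory. Array shapes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.standard_normal((365, 100, 100))  # toy (time, y, x) stand-in

n_boot = 10          # 400 in the real pipeline
n_time = data.shape[0]

# Draw every resample's indices in one call, then sort along each row so
# the fancy-indexing gather below reads the time axis in ascending order.
idx = rng.integers(0, n_time, size=(n_boot, n_time))
idx.sort(axis=1)

boot_means = np.empty((n_boot,) + data.shape[1:])
for b in range(n_boot):
    # One ordered gather + reduction per resample.
    boot_means[b] = data[idx[b]].mean(axis=0)
```

This also sidesteps the dask graph-size issue, since each resample is a single indexing operation rather than a chain of per-sample tasks.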