-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PCA Dictionary Indexing #638
base: develop
Are you sure you want to change the base?
Conversation
HNSW not viable
10x improvement over naive pyopencl
Hi @ZacharyVarley, thank you for opening this PR! I modified your added tutorial to compare DI with PCA DI on my 6 yr old laptop. PCA DI is about 2x faster than standard DI, although it uses much more memory, which I hope we can look at. Please let me know when you'd like feedback on the PR! |
Please let me also know whether you have any questions regarding tests, docs, dependency control etc. You might already be aware of our contributing guide. |
The memory footprint is expected to be 4x larger than holding the uint8 dictionary patterns, as the dictionary patterns must be cast to 32 bit floats in order to calculate the covariance matrix using LAPACK for subsequent eigenvector decomposition. I realize now that this should be avoided by chunking the dictionary covariance matrix calculation. Further, this approach is most beneficial when the dictionary and/or pattern dimensions are large (roughly > 100,000 and > 60x60). This might be why only a 2x speedup was observed. I am hoping to have time to patch up everything in the PR this week. |
I haven't looked in detail at the implementation, but just assumed that the dictionary had to be in memory for PCA DI. If not, that's great! You should be able to map the calculation across chunks of the dictionary using From my experience, it is impossible to find the optimum between speed and memory use of DI across different platforms and sizes of dataset and dictionary. We therefore give the user the option to control how many experimental ( |
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Description of the change
Add fast PCA dictionary indexing
Progress of the PR
For reviewers
__init__.py
.section in
CHANGELOG.rst
.release.py
,.zenodo.json
and.all-contributorsrc
with the table regenerated.