Hi,
My friend and I have been reading the code for a while and are looking for ways to contribute.
@ankane, you mentioned product quantization in #27. Is this still an issue? We would like to work on it if it looks like a useful feature.
I have some questions about it and would like to discuss design choices in this issue as we start implementing this feature.
Current questions I have:
- Would it make more sense for pgvector to support product quantization through a new index type (e.g. IVFPQ, as in Faiss) or through a new vector type? Doing it without a new index looks much harder, since we would have to store the centroids for each subvector somewhere. A dedicated index type for IVF + product quantization might make more sense.
- Do you think the subvector type could help with the internal implementation? I think it could be used to extract part of a vector during product quantization.
- Please let me know of any specific IVFPQ implementations you find more performant. I have listed the resources I'm reading below to understand how people have implemented it in the past.
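
To make the first question concrete, here is a rough Python sketch of plain PQ as I understand it from the paper. It is not pgvector or Faiss code, and every function name in it is made up for illustration. It assumes the vector dimension is divisible by the number of subvectors `m`, and it uses a very basic k-means. The point is that the per-subspace codebooks returned by `train_pq` have to live somewhere, which is why a new index type seems like the natural home for them.

```python
import random

def split(vec, m):
    """Split vec into m equal-length subvectors (assumes len(vec) % m == 0)."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if its cluster is empty
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train_pq(vectors, m, k):
    """Train one codebook of k centroids per subspace (m codebooks total)."""
    subs = [split(v, m) for v in vectors]
    return [kmeans([s[j] for s in subs], k, seed=j) for j in range(m)]

def encode(vec, codebooks):
    """Replace each subvector with the id of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]

def adc_distance(query, code, codebooks):
    """Asymmetric distance: exact query subvectors vs. quantized centroids."""
    q_subs = split(query, len(codebooks))
    return sum(sq_dist(q, cb[c]) for q, c, cb in zip(q_subs, code, codebooks))

# Toy usage: 200 random 8-dimensional vectors, 4 subvectors, 16 centroids each.
rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
codebooks = train_pq(data, m=4, k=16)
codes = [encode(v, codebooks) for v in data]
```

With `m=4` and `k=16`, each stored vector shrinks to four small integers, while the codebooks (4 x 16 centroids of dimension 2) are shared by the whole dataset. An IVFPQ index would additionally assign each vector to a coarse IVF list and, as I understand it, typically quantize the residual relative to that list's centroid.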
Some Resources
- Faiss IVFPQ implementation
- The PQ paper
- A useful blog post and some IVFPQ optimizations
- Billion-scale Approximate Nearest Neighbor Search