Add vector specific HNSW configuration #1675

Merged
merged 16 commits into from
Apr 10, 2023

timvisee (Member) commented Apr 6, 2023

This adds support for vector-specific HNSW configurations. It makes it possible to set different HNSW parameters for different vectors in a multi-vector collection, to better fine-tune performance.

In this example create-collection request, modified_vector uses its own HNSW parameters, while simple_vector falls back to the collection-level hnsw_config:

{
    "vectors": {
        "simple_vector": {
            "distance": "Dot",
            "size": 10
        },
        "modified_vector": {
            "distance": "Dot",
            "size": 300,
            "hnsw_config": {
                "m": 123,
                "ef_construct": 456
            }
        }
    },
    "hnsw_config": {
        "m": 1000,
        "ef_construct": 1000
    }
}

Tasks

  • Accept vector specific HNSW params via REST/gRPC, and validate these
  • Persist vector specific HNSW configs
  • Add vector specific HNSW config to segment config
  • Base vector specific config (diff) upon collection config (full)
  • Use vector specific HNSW config when building graphs
  • Update OpenAPI/gRPC files
  • Add basic integration test
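
The "diff upon full" idea from the tasks above can be sketched roughly as follows. This is a minimal illustration only, not the code from this PR: the type names HnswConfig and HnswConfigDiff, the with_diff method and the resolve_per_vector helper are assumptions made for the example, and only the two parameters from the request above (m and ef_construct) are modelled.

```rust
use std::collections::BTreeMap;

/// Full HNSW configuration, as set at collection level.
/// Hypothetical type: only the two parameters from the example are modelled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct HnswConfig {
    pub m: usize,
    pub ef_construct: usize,
}

/// Per-vector override (hypothetical type). Every field is optional,
/// so an unset field falls back to the collection-level value.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct HnswConfigDiff {
    pub m: Option<usize>,
    pub ef_construct: Option<usize>,
}

impl HnswConfig {
    /// Apply an optional per-vector diff on top of this full config.
    pub fn with_diff(self, diff: Option<HnswConfigDiff>) -> HnswConfig {
        let diff = diff.unwrap_or_default();
        HnswConfig {
            m: diff.m.unwrap_or(self.m),
            ef_construct: diff.ef_construct.unwrap_or(self.ef_construct),
        }
    }
}

/// Resolve the config that would be used to build the HNSW graph
/// of each named vector: vector-specific values win over collection values.
pub fn resolve_per_vector(
    collection: HnswConfig,
    vectors: &BTreeMap<String, Option<HnswConfigDiff>>,
) -> BTreeMap<String, HnswConfig> {
    vectors
        .iter()
        .map(|(name, diff)| (name.clone(), collection.with_diff(*diff)))
        .collect()
}
```

Under these assumptions, the example request would resolve simple_vector to m = 1000, ef_construct = 1000 (the collection values) and modified_vector to m = 123, ef_construct = 456. A diff that sets only m would keep the collection's ef_construct.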

After merge

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you linted your code locally using the cargo fmt command prior to submission?
  3. Have you checked your code using the cargo clippy command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

timvisee force-pushed the multi-vector-hnsw-config2 branch from ba97602 to bbfc4ee, April 6, 2023 16:35
timvisee self-assigned this Apr 6, 2023
timvisee force-pushed the multi-vector-hnsw-config2 branch from c9f95a0 to 1c06bb0, April 7, 2023 08:31
timvisee changed the title from "Draft: add vector specific HNSW configuration" to "Add vector specific HNSW configuration" Apr 7, 2023
timvisee marked this pull request as ready for review April 7, 2023 09:12
timvisee requested review from generall and agourlay April 7, 2023 09:12
timvisee force-pushed the multi-vector-hnsw-config2 branch from beb775e to a36b29c, April 10, 2023 14:08
generall added a commit that referenced this pull request Apr 11, 2023
* Validate VectorConfig/VectorParams, remove obsolete validation

* Add HNSW config diff to vector parameters

* Validate params in collection config

* Add HNSW config to segment vector data config

* Add VectorsConfig params iterator for more elegant conversions

* Prefer vector HNSW config over collection config for building HNSW index

* Base segment vector param HNSW config on collection config

* General improvements

* Rewrite HNSW ef_construct extract function to also consider vector configs

* Update OpenAPI specification

* Add test to check if vector specific HNSW config is persisted

* review changes

* review changes

* Regenerate gRPC docs

* Fix test on Windows

* Regenerate OpenAPI specification

---------

Co-authored-by: Andrey Vasnetsov <andrey@vasnetsov.com>
generall mentioned this pull request Apr 19, 2023
8 tasks
agourlay deleted the multi-vector-hnsw-config2 branch July 12, 2023 15:47