USearch HNSW index for ANN search #53447

Merged
merged 22 commits into from
Aug 21, 2023
Add: USearch
davvard committed Aug 15, 2023
commit 48c62fd75e2977ee9b23ceea070782188536c2ba
3 changes: 3 additions & 0 deletions .gitmodules
@@ -347,3 +347,6 @@
[submodule "contrib/incbin"]
path = contrib/incbin
url = https://github.com/graphitemaster/incbin.git
[submodule "contrib/usearch"]
path = contrib/usearch
url = https://github.com/unum-cloud/usearch.git
1 change: 1 addition & 0 deletions contrib/CMakeLists.txt
@@ -196,6 +196,7 @@ if (ARCH_S390X)
add_contrib(crc32-s390x-cmake crc32-s390x)
endif()
add_contrib (annoy-cmake annoy)
add_contrib (usearch-cmake usearch)
add_contrib (xxHash-cmake xxHash)

add_contrib (libbcrypt-cmake libbcrypt)
1 change: 1 addition & 0 deletions contrib/usearch
Submodule usearch added at 387b78
15 changes: 15 additions & 0 deletions contrib/usearch-cmake/CMakeLists.txt
@@ -0,0 +1,15 @@
option(ENABLE_USEARCH "Enable USearch (Approximate Nearest Neighbor Search, HNSW) support" ${ENABLE_LIBRARIES})

if (NOT ENABLE_USEARCH)
message (STATUS "Not using usearch")
return()
endif()

set(USEARCH_PROJECT_DIR "${ClickHouse_SOURCE_DIR}/contrib/usearch")
set(USEARCH_SOURCE_DIR "${USEARCH_PROJECT_DIR}/include")

add_library(_usearch INTERFACE)
target_include_directories(_usearch SYSTEM INTERFACE ${USEARCH_PROJECT_DIR}/fp16/include ${USEARCH_PROJECT_DIR}/robin-map/include ${USEARCH_PROJECT_DIR}/simsimd/include ${USEARCH_SOURCE_DIR})

add_library(ch_contrib::usearch ALIAS _usearch)
target_compile_definitions(_usearch INTERFACE ENABLE_USEARCH)
41 changes: 41 additions & 0 deletions docs/en/engines/table-engines/mergetree-family/annindexes.md
@@ -142,6 +142,8 @@ was specified for ANN indexes, the default value is 100 million.

- [Annoy](/docs/en/engines/table-engines/mergetree-family/annindexes.md#annoy-annoy)

- [USearch](/docs/en/engines/table-engines/mergetree-family/annindexes.md#usearch-usearch)

## Annoy {#annoy}

Annoy indexes are currently experimental, to use them you first need to `SET allow_experimental_annoy_index = 1`. They are also currently
@@ -216,3 +218,42 @@ ORDER BY L2Distance(vectors, Point)
LIMIT N
SETTINGS annoy_index_search_k_nodes=100;
```


## USearch {#usearch}

USearch indexes are currently experimental. To use them, first run `SET allow_experimental_usearch_index = 1`.

This type of ANN index is based on the [USearch library](https://github.com/unum-cloud/usearch), which implements the HNSW algorithm.

Syntax to create a USearch index over an [Array](../../../sql-reference/data-types/array.md) column:

```sql
CREATE TABLE table_with_usearch_index
(
id Int64,
vectors Array(Float32),
INDEX [ann_index_name] vectors TYPE usearch([Distance]) [GRANULARITY N]
)
ENGINE = MergeTree
ORDER BY id;
```

Syntax to create a USearch index over a [Tuple](../../../sql-reference/data-types/tuple.md) column:

```sql
CREATE TABLE table_with_usearch_index
(
id Int64,
vectors Tuple(Float32[, Float32[, ...]]),
INDEX [ann_index_name] vectors TYPE usearch([Distance]) [GRANULARITY N]
)
ENGINE = MergeTree
ORDER BY id;
```

USearch currently supports two distance functions:
- `L2Distance`, also called Euclidean distance, is the length of a line segment between two points in Euclidean space
([Wikipedia](https://en.wikipedia.org/wiki/Euclidean_distance)).
- `cosineDistance` is one minus the cosine similarity, where cosine similarity is the cosine of the angle between two (non-zero) vectors
([Wikipedia](https://en.wikipedia.org/wiki/Cosine_similarity)).
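
The two distance functions listed above can be sketched in C++ as follows. This is only an illustration of the textbook formulas, assuming cosine distance is defined as one minus the cosine similarity. The helper names `l2_distance` and `cosine_distance` are hypothetical and are not part of ClickHouse or USearch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; not ClickHouse or USearch code.

// Euclidean (L2) distance: sqrt(sum_i (a_i - b_i)^2).
double l2_distance(const std::vector<float> & a, const std::vector<float> & b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double diff = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Cosine distance: 1 - (a . b) / (|a| * |b|).
// The result is undefined if either vector has zero length.
double cosine_distance(const std::vector<float> & a, const std::vector<float> & b)
{
    assert(a.size() == b.size());
    double dot = 0.0;
    double norm_a = 0.0;
    double norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double x = a[i];
        const double y = b[i];
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    return 1.0 - dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

For example, `l2_distance({3, 4}, {0, 0})` evaluates to 5, and `cosine_distance` of two orthogonal vectors such as `{1, 0}` and `{0, 1}` evaluates to 1. With either function, a smaller value means the vectors are closer.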
4 changes: 4 additions & 0 deletions src/CMakeLists.txt
@@ -599,6 +599,10 @@ if (TARGET ch_contrib::annoy)
dbms_target_link_libraries(PUBLIC ch_contrib::annoy)
endif()

if (TARGET ch_contrib::usearch)
dbms_target_link_libraries(PUBLIC ch_contrib::usearch)
endif()

if (TARGET ch_rust::skim)
dbms_target_include_directories(PRIVATE $<TARGET_PROPERTY:ch_rust::skim,INTERFACE_INCLUDE_DIRECTORIES>)
dbms_target_link_libraries(PUBLIC ch_rust::skim)
1 change: 1 addition & 0 deletions src/Core/Settings.h
@@ -772,6 +772,7 @@ class IColumn;
M(Bool, allow_experimental_hash_functions, false, "Enable experimental hash functions", 0) \
M(Bool, allow_experimental_object_type, false, "Allow Object and JSON data types", 0) \
M(Bool, allow_experimental_annoy_index, false, "Allows to use Annoy index. Disabled by default because this feature is experimental", 0) \
M(Bool, allow_experimental_usearch_index, false, "Allows to use USearch index. Disabled by default because this feature is experimental", 0) \
M(UInt64, max_limit_for_ann_queries, 1'000'000, "SELECT queries with LIMIT bigger than this setting cannot use ANN indexes. Helps to prevent memory overflows in ANN search indexes.", 0) \
M(Int64, annoy_index_search_k_nodes, -1, "SELECT queries search up to this many nodes in Annoy indexes.", 0) \
M(Bool, throw_on_unsupported_query_inside_transaction, true, "Throw exception if unsupported query is used inside transaction", 0) \