enhance: speed up search iterator stage 1 #37947

Merged
merged 1 commit into from
Dec 26, 2024

PwzXxm
Contributor

@PwzXxm PwzXxm commented Nov 22, 2024

issue: #37548

@sre-ci-robot sre-ci-robot added area/dependency Pull requests that update a dependency file size/XXL Denotes a PR that changes 1000+ lines. labels Nov 22, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels Nov 22, 2024
Contributor

mergify bot commented Nov 22, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 88.62434% with 43 lines in your changes missing coverage. Please review.

Project coverage is 81.11%. Comparing base (9c3f59d) to head (3044496).
Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/core/src/query/SearchOnSealed.cpp | 36.84% | 12 Missing ⚠️ |
| internal/core/src/query/CachedSearchIterator.cpp | 95.90% | 7 Missing ⚠️ |
| internal/proxy/search_util.go | 90.14% | 5 Missing and 2 partials ⚠️ |
| internal/core/src/query/SearchBruteForce.cpp | 80.95% | 4 Missing ⚠️ |
| internal/core/src/query/SearchOnGrowing.cpp | 42.85% | 4 Missing ⚠️ |
| internal/core/src/query/SearchOnIndex.cpp | 42.85% | 4 Missing ⚠️ |
| internal/core/src/index/Utils.cpp | 88.23% | 2 Missing ⚠️ |
| internal/core/src/query/PlanProto.cpp | 75.00% | 2 Missing ⚠️ |
| internal/core/src/query/CachedSearchIterator.h | 95.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #37947      +/-   ##
==========================================
- Coverage   82.88%   81.11%   -1.78%     
==========================================
  Files        1089     1383     +294     
  Lines      169079   195578   +26499     
==========================================
+ Hits       140147   158638   +18491     
- Misses      23336    31367    +8031     
+ Partials     5596     5573      -23     
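The percentages in the diff above follow from the line counters, assuming Codecov counts partial lines as uncovered (coverage = hits / (hits + misses + partials)) and applies its default two-decimal round-down. For example, 140147 / 169079 ≈ 82.888% is shown as 82.88%, and 158638 / 195578 ≈ 81.112% as 81.11%. `CoverageCounts`, `CoveragePercent` and `RoundDown2` are illustrative names only, not part of any Codecov API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Line counters as they appear in a Codecov coverage diff.
struct CoverageCounts {
    int64_t hits;
    int64_t misses;
    int64_t partials;
};

// Coverage percentage, counting partial lines as not covered (assumption).
double CoveragePercent(const CoverageCounts& c) {
    const int64_t lines = c.hits + c.misses + c.partials;
    assert(lines > 0);
    return 100.0 * static_cast<double>(c.hits) / static_cast<double>(lines);
}

// Truncates to two decimals, assuming Codecov's default precision 2, round down.
double RoundDown2(double pct) {
    return std::floor(pct * 100.0) / 100.0;
}
```

The same formula accounts for the patch figure: the file table lists 41 missing lines plus 2 partials (the 43 lines in the report), and 88.62434% patch coverage implies a patch of about 378 lines, since 335 / 378 ≈ 88.6243%.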

| Components | Coverage Δ |
|---|---|
| Client | 78.26% <ø> (ø) |
| Core | 69.43% <86.81%> (∅) |
| Go | 83.10% <93.33%> (+0.04%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| internal/core/src/common/QueryInfo.h | 100.00% <100.00%> (ø) |
| internal/core/src/index/Utils.h | 88.88% <ø> (ø) |
| internal/core/src/index/VectorDiskIndex.cpp | 76.96% <100.00%> (ø) |
| internal/core/src/index/VectorMemIndex.cpp | 64.80% <100.00%> (ø) |
| internal/proxy/proxy.go | 72.18% <100.00%> (+1.49%) ⬆️ |
| internal/proxy/task.go | 80.69% <ø> (ø) |
| internal/proxy/task_search.go | 76.34% <100.00%> (+0.54%) ⬆️ |
| internal/core/src/query/CachedSearchIterator.h | 95.00% <95.00%> (ø) |
| internal/core/src/index/Utils.cpp | 40.90% <88.23%> (ø) |
| internal/core/src/query/PlanProto.cpp | 88.44% <75.00%> (ø) |

... and 6 more

... and 322 files with indirect coverage changes

@PwzXxm
Contributor Author

PwzXxm commented Nov 25, 2024

rerun cpp-unit-test

Contributor

mergify bot commented Nov 25, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@PwzXxm PwzXxm force-pushed the search_iter_v2_s1 branch 2 times, most recently from 00a762f to 97467bf on November 25, 2024 11:30
Contributor

mergify bot commented Nov 25, 2024

@PwzXxm E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 25, 2024

@PwzXxm go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 25, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 27, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 27, 2024

@PwzXxm E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 29, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@PwzXxm
Contributor Author

PwzXxm commented Nov 29, 2024

/hold

@PwzXxm
Contributor Author

PwzXxm commented Nov 29, 2024

/unhold
Rename iterator token to iterator id

Contributor

mergify bot commented Nov 29, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

@MrPresent-Han MrPresent-Han left a comment

NIT: just a few questions.

heap.pop();

// last_bound may change between NextBatch calls, discard any invalid results
if (!IsValid(cur_rst, last_bound, radius, range_filter)) {
Contributor

So the v2 iterator will never return results that are better than those already returned on earlier iteration pages?

Contributor Author

Not in this stage; the next stage will try to take care of this.
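A minimal sketch of the discard-on-pop pattern in the snippet above, assuming distances have already been converted so that a smaller value is always better, and that `last_bound` holds the worst distance handed out on the previous page. A candidate is kept only if no bound exists yet or its distance is strictly greater than `last_bound`. `Candidate`, `MinHeap`, `IsValidAfterBound` and `NextBatchFromHeap` are hypothetical names rather than Milvus APIs, and the `radius`/`range_filter` checks from `IsValid` are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

// (distance, segment offset). Distances are assumed to be converted already,
// so a smaller value always means a closer match.
using Candidate = std::pair<float, int64_t>;

// Min-heap: top() is the closest remaining candidate.
using MinHeap = std::priority_queue<Candidate, std::vector<Candidate>,
                                    std::greater<Candidate>>;

// Valid only if no page has been returned yet, or the candidate is strictly
// worse than the last distance already handed out.
bool IsValidAfterBound(float dist, const std::optional<float>& last_bound) {
    return !last_bound.has_value() || dist > last_bound.value();
}

// Pops up to batch_size valid candidates in ascending distance order.
// Stale candidates (dist <= last_bound) are dropped and never returned later.
std::vector<Candidate> NextBatchFromHeap(MinHeap& heap, std::size_t batch_size,
                                         const std::optional<float>& last_bound) {
    assert(batch_size > 0);
    std::vector<Candidate> batch;
    batch.reserve(batch_size);
    while (!heap.empty() && batch.size() < batch_size) {
        const Candidate cur = heap.top();
        heap.pop();
        if (!IsValidAfterBound(cur.first, last_bound)) {
            continue;
        }
        batch.push_back(cur);
    }
    return batch;
}
```

With a heap holding distances 0.1, 0.3, 0.5 and 0.9 and `last_bound = 0.2`, a batch of two silently drops 0.1 and returns 0.3 and 0.5, which is the behavior the reviewer is asking about.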

const float dist = result.first;
const bool is_valid =
!last_bound.has_value() || dist > last_bound.value();

Contributor

Question: is there no need to consider whether the metric is positive or negative (larger-is-better vs. smaller-is-better) when comparing dist and last_bound?

Contributor Author

The distances are converted when they enter this class, so there is no need to worry about it here.
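One way to read this answer, sketched under the assumption that similarity metrics such as IP and COSINE score better when larger while L2 scores better when smaller: negate similarity scores on the way in so that a single `dist > last_bound` comparison works for every metric, then negate again on the way out. `MetricDirection`, `DirectionFor`, `ToInternal`, `ToExternal` and `IsBeyondBound` are hypothetical; the actual conversion inside the class may differ.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical direction flag for a metric.
enum class MetricDirection { kLargerIsBetter, kSmallerIsBetter };

// Sketch only: covers three common metric names and nothing else.
inline MetricDirection DirectionFor(const std::string& metric) {
    if (metric == "IP" || metric == "COSINE") {
        return MetricDirection::kLargerIsBetter;
    }
    assert(metric == "L2" && "metric not covered by this sketch");
    return MetricDirection::kSmallerIsBetter;
}

// Converts a raw score into an internal key where smaller is always better.
inline float ToInternal(float raw, MetricDirection dir) {
    return dir == MetricDirection::kLargerIsBetter ? -raw : raw;
}

// Undoes ToInternal before results are returned to the caller.
inline float ToExternal(float internal, MetricDirection dir) {
    return dir == MetricDirection::kLargerIsBetter ? -internal : internal;
}

// On internal keys one comparison serves every metric: the next page
// must be strictly worse than the last key already returned.
inline bool IsBeyondBound(float internal,
                          const std::optional<float>& last_bound) {
    return !last_bound.has_value() || internal > *last_bound;
}
```

For IP, if the last page ended at a score of 0.8, a score of 0.6 maps to -0.6 > -0.8 and is accepted, while 0.9 maps to -0.9 and is rejected, with no metric-specific branch at the comparison site.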

@@ -124,6 +125,19 @@ SearchOnGrowing(const segcore::SegmentGrowingImpl& segment,

// step 3: brute force search where small indexing is unavailable
auto vec_ptr = record.get_data_base(vecfield_id);

Contributor

The cached iterator is created every time, so what is 'cached' about it?

Contributor Author

@PwzXxm PwzXxm Nov 29, 2024

It will introduce a pool of results in the next stage, as commented in https://github.com/milvus-io/milvus/pull/37947/files/9f6b88743198a575eb84cb427bcd41a7631676b7#diff-7344957165f4632a9363de767323618b7db0bd2d0f7cf7165965d3fb2612f18b. This class is meant to provide a framework for that later implementation. If you find the name confusing, I can change it.

Contributor

No need to bother; just follow your scheme.

search_result.distances_.resize(nq_ * batch_size_);

for (size_t query_idx = 0; query_idx < nq_; ++query_idx) {
auto rst = GetBatchedNextResults(query_idx, search_info);
Contributor

NIT: it seems that the retrieved offsets and distances are copied twice.

Contributor Author

The distance-id pairs need to be sorted before being copied to search_result. Knowhere would need to provide batched results through its iterator to eliminate this copy.
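The double copy discussed here can be sketched as follows, assuming each query's candidates are first gathered as unsorted (distance, offset) pairs, then sorted and split into flat `nq * batch_size` buffers like `search_result.distances_`. `DistOffset`, `FlatResult` and `SortAndFlatten` are hypothetical names, and using `-1` to mark an unfilled slot is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// (internal distance, segment offset) gathered for one query.
using DistOffset = std::pair<float, int64_t>;

// Hypothetical flat layout: query q owns slots [q * batch_size, (q + 1) * batch_size).
struct FlatResult {
    std::vector<int64_t> offsets;
    std::vector<float> distances;
};

// per_query[q] holds query q's candidates; each inner vector is sorted in place.
FlatResult SortAndFlatten(std::vector<std::vector<DistOffset>>& per_query,
                          std::size_t batch_size) {
    const std::size_t nq = per_query.size();
    FlatResult out;
    out.offsets.assign(nq * batch_size, -1);  // -1 marks an unfilled slot (assumption)
    out.distances.assign(nq * batch_size, 0.0f);
    for (std::size_t q = 0; q < nq; ++q) {
        auto& rows = per_query[q];
        assert(rows.size() <= batch_size);
        // First copy already happened when the pairs were gathered into rows;
        // they stay paired so sorting by distance keeps offsets aligned.
        std::sort(rows.begin(), rows.end());
        // Second copy: split each pair into the two output arrays.
        for (std::size_t i = 0; i < rows.size(); ++i) {
            out.distances[q * batch_size + i] = rows[i].first;
            out.offsets[q * batch_size + i] = rows[i].second;
        }
    }
    return out;
}
```

If the underlying iterator could hand out batches that are already ordered, as suggested for Knowhere, the intermediate pair buffer and its sort would no longer be needed.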

@PwzXxm
Contributor Author

PwzXxm commented Dec 13, 2024

rerun go-sdk

@mergify mergify bot added the ci-passed label Dec 13, 2024
@mergify mergify bot removed the ci-passed label Dec 18, 2024
Contributor

mergify bot commented Dec 18, 2024

@PwzXxm go-sdk check failed, comment rerun go-sdk can trigger the job again.

@PwzXxm
Contributor Author

PwzXxm commented Dec 18, 2024

rerun go-sdk

Contributor

mergify bot commented Dec 25, 2024

@PwzXxm go-sdk check failed, comment rerun go-sdk can trigger the job again.

@PwzXxm
Contributor Author

PwzXxm commented Dec 25, 2024

rerun go-sdk

@PwzXxm PwzXxm force-pushed the search_iter_v2_s1 branch 2 times, most recently from 4167f19 to 00d8642 on December 25, 2024 07:35
Contributor

mergify bot commented Dec 25, 2024

@PwzXxm go-sdk check failed, comment rerun go-sdk can trigger the job again.

@PwzXxm PwzXxm force-pushed the search_iter_v2_s1 branch 2 times, most recently from 91556bb to bf22e21 on December 25, 2024 10:13
Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
Contributor

mergify bot commented Dec 25, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@PwzXxm
Contributor Author

PwzXxm commented Dec 25, 2024

rerun cpp-unit-test

@mergify mergify bot added the ci-passed label Dec 25, 2024
@MrPresent-Han
Contributor

/lgtm

@congqixia
Contributor

/approve

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: congqixia, PwzXxm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@PwzXxm
Contributor Author

PwzXxm commented Dec 26, 2024

/unhold

@sre-ci-robot sre-ci-robot merged commit 85f462b into milvus-io:master Dec 26, 2024
20 checks passed
Labels
approved area/compilation area/dependency Pull requests that update a dependency file ci-passed dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement lgtm size/XXL Denotes a PR that changes 1000+ lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants