-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance: speed up search iterator stage 1 #37947
Conversation
17da513
to
a6bd30c
Compare
@PwzXxm cpp-unit-test check failed, comment |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #37947 +/- ##
==========================================
- Coverage 82.88% 81.11% -1.78%
==========================================
Files 1089 1383 +294
Lines 169079 195578 +26499
==========================================
+ Hits 140147 158638 +18491
- Misses 23336 31367 +8031
+ Partials 5596 5573 -23
|
rerun cpp-unit-test |
@PwzXxm cpp-unit-test check failed, comment |
00a762f
to
97467bf
Compare
@PwzXxm E2e jenkins job failed, comment |
@PwzXxm go-sdk check failed, comment |
@PwzXxm cpp-unit-test check failed, comment |
97467bf
to
e74fbfd
Compare
@PwzXxm cpp-unit-test check failed, comment |
@PwzXxm E2e jenkins job failed, comment |
e74fbfd
to
de74b0c
Compare
@PwzXxm cpp-unit-test check failed, comment |
/hold |
/unhold |
@PwzXxm cpp-unit-test check failed, comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: just some questions
heap.pop(); | ||
|
||
// last_bound may change between NextBatch calls, discard any invalid results | ||
if (!IsValid(cur_rst, last_bound, radius, range_filter)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so v2 iterator will not return better results compared to former iterations page?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this stage, the next one will try to take care of this.
const float dist = result.first; | ||
const bool is_valid = | ||
!last_bound.has_value() || dist > last_bound.value(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: no need to consider the positive or negative metrics for dist and last_bound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The distances are converted when entering this class, no need to worry about it here
@@ -124,6 +125,19 @@ SearchOnGrowing(const segcore::SegmentGrowingImpl& segment, | |||
|
|||
// step 3: brute force search where small indexing is unavailable | |||
auto vec_ptr = record.get_data_base(vecfield_id); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cached itrator will be created every time, so what is 'cached'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will introduce a pool of results in the next stage, as commented in https://github.com/milvus-io/milvus/pull/37947/files/9f6b88743198a575eb84cb427bcd41a7631676b7#diff-7344957165f4632a9363de767323618b7db0bd2d0f7cf7165965d3fb2612f18b. This class tries to provide a framework for the further implementation. If you think this name is confusing, I will change the naming if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to bother, just follow your scheme
search_result.distances_.resize(nq_ * batch_size_); | ||
|
||
for (size_t query_idx = 0; query_idx < nq_; ++query_idx) { | ||
auto rst = GetBatchedNextResults(query_idx, search_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: seems that offsets and distances data retrieved are copied twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The distance-id pairs need to be sorted before copy to the search_result
. Knowhere needs to provide the ability to give batched results via iterator to eliminate this copy.
rerun go-sdk |
0552b7d
to
9016c4a
Compare
@PwzXxm go-sdk check failed, comment |
rerun go-sdk |
9016c4a
to
9981e47
Compare
@PwzXxm go-sdk check failed, comment |
rerun go-sdk |
4167f19
to
00d8642
Compare
@PwzXxm go-sdk check failed, comment |
91556bb
to
bf22e21
Compare
Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
bf22e21
to
3044496
Compare
@PwzXxm cpp-unit-test check failed, comment |
rerun cpp-unit-test |
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: congqixia, PwzXxm The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
issue: #37548