Always use constant_score query for match_only_text #16964

msfroh (Collaborator) commented Jan 6, 2025

Description

In some cases, when we create a term query over a `match_only_text` field, it may still try to compute scores, which prevents early termination. We should *always* use a constant score query when querying `match_only_text`, since we don't have the statistics required to compute scores.
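The reasoning above can be illustrated with a small, stdlib-only Java model. This is a sketch under stated assumptions, not OpenSearch's implementation: `Query`, `TermQuery`, `ConstantScoreQuery`, `matchOnlyTextTermQuery`, `TopK`, and `searchTopK` are hypothetical stand-ins that only borrow names from Lucene. It assumes documents are visited in increasing doc-id order and that score ties are broken by the lower doc id. Under those assumptions, once a constant-score query has produced k hits, no later document can be more competitive, so collection can stop. A scoring query has to visit and score every match. The model deliberately ignores optimizations such as impact-based skipping that real scoring queries may use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, stdlib-only model: none of these types are the real Lucene or OpenSearch classes.
final class MatchOnlyTextSketch {

    /** Minimal query abstraction: matching doc ids (ascending) plus a per-doc score. */
    interface Query {
        int[] matchingDocs();
        double score(int doc);
        boolean isConstantScore();
    }

    /** A scoring term query; the scores stand in for whatever a similarity would compute. */
    record TermQuery(int[] docs, double[] scores) implements Query {
        public int[] matchingDocs() { return docs; }
        public double score(int doc) { return scores[Arrays.binarySearch(docs, doc)]; }
        public boolean isConstantScore() { return false; }
    }

    /** Wraps any query so that every match gets the same score. */
    record ConstantScoreQuery(Query inner) implements Query {
        public int[] matchingDocs() { return inner.matchingDocs(); }
        public double score(int doc) { return 1.0; }
        public boolean isConstantScore() { return true; }
    }

    /** Hypothetical helper: a term query on a match_only_text field is always constant-scored. */
    static Query matchOnlyTextTermQuery(int[] docs, double[] scores) {
        return new ConstantScoreQuery(new TermQuery(docs, scores));
    }

    /** Top-k doc ids plus the number of matching docs the collector had to visit. */
    record TopK(List<Integer> docs, int visited) {}

    static TopK searchTopK(Query query, int k) {
        if (k <= 0) {
            return new TopK(List.of(), 0);
        }
        int[] docs = query.matchingDocs();
        List<Integer> top = new ArrayList<>();
        if (query.isConstantScore()) {
            // Equal scores with ties broken by doc id: the first k matches are final.
            int visited = 0;
            for (int doc : docs) {
                visited++;
                top.add(doc);
                if (top.size() == k) {
                    break; // early termination
                }
            }
            return new TopK(top, visited);
        }
        // With real scores any later doc could outscore the current top k,
        // so every match must be visited and scored.
        Integer[] order = Arrays.stream(docs).boxed().toArray(Integer[]::new);
        Arrays.sort(order, (a, b) -> {
            int byScore = Double.compare(query.score(b), query.score(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        for (int i = 0; i < Math.min(k, order.length); i++) {
            top.add(order[i]);
        }
        return new TopK(top, docs.length);
    }

    public static void main(String[] args) {
        int[] docs = {2, 5, 7, 9};
        double[] scores = {0.1, 0.9, 0.5, 0.3};
        System.out.println(searchTopK(new TermQuery(docs, scores), 2));          // TopK[docs=[5, 7], visited=4]
        System.out.println(searchTopK(matchOnlyTextTermQuery(docs, scores), 2)); // TopK[docs=[2, 5], visited=2]
    }
}
```

In this toy run, the scoring query visits all four matches, while the constant-score wrapper stops after the first two.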

Related Issues

N/A

We've seen high latency in the Big5 benchmark's query-string-on-message operation, which we can attribute to this.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Michael Froh <froh@amazon.com>
msfroh (Collaborator, Author) commented Jan 6, 2025

@rishabhmaurya -- you may be interested in this

msfroh added the v2.19.0 and backport 2.x labels Jan 6, 2025
Signed-off-by: Michael Froh <froh@amazon.com>
rishabhmaurya (Contributor) left a comment

good catch. LGTM!

github-actions bot (Contributor) commented Jan 6, 2025

❌ Gradle check result for 399ddaf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Michael Froh <froh@amazon.com>
github-actions bot (Contributor) commented Jan 7, 2025

✅ Gradle check result for 81a665c: SUCCESS

codecov bot commented Jan 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.21%. Comparing base (4a53ff2) to head (81a665c).
Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16964      +/-   ##
============================================
- Coverage     72.32%   72.21%   -0.12%     
+ Complexity    65310    65246      -64     
============================================
  Files          5299     5299              
  Lines        303534   303536       +2     
  Branches      43941    43941              
============================================
- Hits         219527   219187     -340     
- Misses        66021    66391     +370     
+ Partials      17986    17958      -28     

msfroh merged commit 0b36599 into opensearch-project:main Jan 7, 2025
36 checks passed
msfroh deleted the constant_score_for_match_only_term_query branch January 7, 2025 00:24
opensearch-trigger-bot (Contributor)

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-16964-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0b365998ed6e4f537dbdf7983a077bc53e785bb9
# Push it to GitHub
git push --set-upstream origin backport/backport-16964-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-16964-to-2.x.

msfroh added a commit to msfroh/OpenSearch that referenced this pull request Jan 7, 2025
…ct#16964)

---------

Signed-off-by: Michael Froh <froh@amazon.com>
(cherry picked from commit 0b36599)
msfroh (Collaborator, Author) commented Jan 7, 2025

Manual backport PR: #16969

msfroh added a commit that referenced this pull request Jan 8, 2025

---------

Signed-off-by: Michael Froh <froh@amazon.com>
(cherry picked from commit 0b36599)
Labels
backport 2.x (Backport to 2.x branch), backport-failed, v2.19.0 (Issues and PRs related to version 2.19.0)
3 participants