-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Latency improvements to Multi Term Aggregations #14993
Conversation
❌ Gradle check result for fb84412: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Like how you document the profiling results to explain this!
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/MultiTermsAggregator.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/MultiTermsAggregator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/MultiTermsAggregator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/MultiTermsAggregator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/MultiTermsAggregator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/MultiTermsAggregator.java
Outdated
Show resolved
Hide resolved
Thanks @expani for the code changes and detailed explanation on PR - I do have some minor comments on refactoring mainly. Please add relevant comments/javadocs to help future developers understand minor optimization and utilities for various low level operations. Did we check performance on any data which has multi-valued fields as well? If not, let us propose a change in OSB for multi-valued fields in some workloads, in case we don't have any such available workloads. Also, let us iterate CI to green and check if we have a solid code coverage as well. |
Thanks for taking the time to review @bowenlan-amzn and @sandeshkr419 I will add required java docs and comments as suggested.
Checked with few multi-values fields only for testing and not from a performance perspective. Current CI seems to be failing due to 2 tests that have been reported to be flaky by multiple other folks.
Will check on the coverage of the existing tests and add any if required. |
@sandeshkr419 @bowenlan-amzn Made changes based on the previous comments. Verified the existing UTs are covering all the branches and code paths added with this PR. Please have a look, thanks. |
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
❌ Gradle check result for 734e927: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❕ Gradle check result for 734e927: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
@msfroh I had added some tests and replied to your comments. Build is finally passing after multiple flaky tests. Please have a look. |
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-14993-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e885aa93279342c9ab219ecf612d887ff8de8af6
# Push it to GitHub
git push --set-upstream origin backport/backport-14993-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x Then, create a pull request where the |
…4993) * Avoid deep copy and other allocation improvements * Refactoring based on PR Comments and added JavaDocs * Added more comments * Added character for Triggering Jenkins build * Changes to cover collectZeroDocEntries method * Updated comment based on change in method's functionality * Added test to cover branches in collectZeroDocEntriesIfRequired * Rebased and resolved changelog conflict --------- Signed-off-by: expani <anijainc@amazon.com>
…4993) * Avoid deep copy and other allocation improvements * Refactoring based on PR Comments and added JavaDocs * Added more comments * Added character for Triggering Jenkins build * Changes to cover collectZeroDocEntries method * Updated comment based on change in method's functionality * Added test to cover branches in collectZeroDocEntriesIfRequired * Rebased and resolved changelog conflict --------- Signed-off-by: expani <anijainc@amazon.com>
…4993) * Avoid deep copy and other allocation improvements * Refactoring based on PR Comments and added JavaDocs * Added more comments * Added character for Triggering Jenkins build * Changes to cover collectZeroDocEntries method * Updated comment based on change in method's functionality * Added test to cover branches in collectZeroDocEntriesIfRequired * Rebased and resolved changelog conflict --------- Signed-off-by: expani <anijainc@amazon.com>
Description
This PR aims to introduce the following improvements :
Testing was done on a
c5.9xlarge
with 20GB of JVM heap and store type asmmapfs
to ensure any affect of EBS Latencies doesn't affect the result much. The same was verified usinglsof
on the OS pid to ensure all index files are m-mapped (mem
is the type in such cases )The numbers below are averaged after 20 iterations of each type of query with and w/o changes.
Sample Aggregation Query
Multi term aggregation goes through all the docs given by the collector of the filter query ( MatchAllDocs Query if no filter is present )
For every document given by the collector, it generates cartesian product of all the values for all the fields present in the aggregation here
A deep copy is generated for every composite key here which is eventually copied again ( only for the first time ) while adding to the bucket. This PR refactors the code to remove the need for a deep copy of every composite key.
We also perform a deep copy of the field values retrieved by Lucene here This is only essential for fields with multiple values in a document and can be avoided for fields with single value in a document.
Allocation Profiling
For Big5 Benchmark Process Name and Agent Id ( LOW CARDINALITY )
Deep Copy of composite key and for single valued fields takes around 25% of the overall allocations for a multi term aggregation query.
Collecting all the composite keys for every document in a list here also takes around 9% of the overall allocations.
For Big5 Benchmark Agent Name and Host Name ( HIGH CARDINALITY )
19% of overall allocations spent in deep copy of composite key
Collecting all composite keys taking around 9% same as before.
Also, for each loop to go over the field values for a document here contributes to 17% of overall allocations because of creating a new Iterator every time. Changed the same to use a regular for loop.
Testing
Will see the existing integs and UTs for Multi term aggregations to ensure if any corner cases are not covered.