[Bug]: After transferring replicas to new query nodes, the segments still remain on the old query nodes. #32862

Open
1 task done
zhuwenxing opened this issue May 8, 2024 · 10 comments
Assignees
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone

Comments

@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master-20240507-53874ce2-amd64
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

                                                                                   Resource Group Info                                                                                    
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name                     ┃ Capacity ┃ Available Node ┃ Loaded Replica           ┃ Outgoing Node ┃ Incoming Node ┃ Request ┃ Limit ┃ Nodes                                              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ __default_resource_group │ 1000000  │ 3              │ {'Checker__CP8UzgWo': 1} │ {}            │ {}            │ 0       │ 10    │ rg-test-234937-milvus-querynode-0-855647b8bd-lc2mn │
│                          │          │                │                          │               │               │         │       │ rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2 │
│                          │          │                │                          │               │               │         │       │ rg-test-234937-milvus-querynode-0-855647b8bd-ljcg9 │
│ rg_85hmeHMP              │ 2        │ 2              │ {}                       │ {}            │ {}            │ 2       │ 2     │ rg-test-234937-milvus-querynode-0-855647b8bd-98w54 │
│                          │          │                │                          │               │               │         │       │ rg-test-234937-milvus-querynode-0-855647b8bd-972bw │
└──────────────────────────┴──────────┴────────────────┴──────────────────────────┴───────────────┴───────────────┴─────────┴───────┴────────────────────────────────────────────────────┘
                                                     Checker__CP8UzgWo Segment Distribution Info                                                      
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Segment ID         ┃ Collection ID      ┃ Partition ID       ┃ Num Rows ┃ State ┃ Node ID ┃ Node Name                                              ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 449618054847598736 │ 449618054847398552 │ 449618054847398553 │ 627      │ 3     │ [11]    │ ['rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
│ 449618054847598735 │ 449618054847398552 │ 449618054847398553 │ 608      │ 3     │ [11]    │ ['rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
│ 449618054847598732 │ 449618054847398552 │ 449618054847398553 │ 606      │ 3     │ [11]    │ ['rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
│ 449618054847598733 │ 449618054847398552 │ 449618054847398553 │ 562      │ 3     │ [9]     │ ['rg-test-234937-milvus-querynode-0-855647b8bd-ljcg9'] │
│ 449618054847598734 │ 449618054847398552 │ 449618054847398553 │ 597      │ 3     │ [11]    │ ['rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
└────────────────────┴────────────────────┴────────────────────┴──────────┴───────┴─────────┴────────────────────────────────────────────────────────┘
                                                                                              Resource Group Info                                                                                               
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name                     ┃ Capacity ┃ Available Node ┃ Loaded Replica           ┃ Outgoing Node            ┃ Incoming Node            ┃ Request ┃ Limit ┃ Nodes                                              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ __default_resource_group │ 1000000  │ 3              │ {}                       │ {}                       │ {'Checker__CP8UzgWo': 3} │ 0       │ 10    │ rg-test-234937-milvus-querynode-0-855647b8bd-lc2mn │
│                          │          │                │                          │                          │                          │         │       │ rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2 │
│                          │          │                │                          │                          │                          │         │       │ rg-test-234937-milvus-querynode-0-855647b8bd-ljcg9 │
│ rg_85hmeHMP              │ 2        │ 2              │ {'Checker__CP8UzgWo': 1} │ {'Checker__CP8UzgWo': 3} │ {}                       │ 2       │ 2     │ rg-test-234937-milvus-querynode-0-855647b8bd-98w54 │
│                          │          │                │                          │                          │                          │         │       │ rg-test-234937-milvus-querynode-0-855647b8bd-972bw │
└──────────────────────────┴──────────┴────────────────┴──────────────────────────┴──────────────────────────┴──────────────────────────┴─────────┴───────┴────────────────────────────────────────────────────┘
                                                     Checker__CP8UzgWo Segment Distribution Info                                                      
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Segment ID         ┃ Collection ID      ┃ Partition ID       ┃ Num Rows ┃ State ┃ Node ID ┃ Node Name                                              ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 449618054847598736 │ 449618054847398552 │ 449618054847398553 │ 627      │ 3     │ [11]    │ ['rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
│ 449618054847598733 │ 449618054847398552 │ 449618054847398553 │ 562      │ 3     │ [9]     │ ['rg-test-234937-milvus-querynode-0-855647b8bd-ljcg9'] │
│ 449618054847598735 │ 449618054847398552 │ 449618054847398553 │ 608      │ 3     │ [11]    │ ['rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
│ 449618054847598732 │ 449618054847398552 │ 449618054847398553 │ 606      │ 3     │ [11]    │ ['rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
│ 449618054847598734 │ 449618054847398552 │ 449618054847398553 │ 597      │ 3     │ [11]    │ ['rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
└────────────────────┴────────────────────┴────────────────────┴──────────┴───────┴─────────┴────────────────────────────────────────────────────────┘
[2024-05-08 16:31:58 - INFO - ci_test]: SearchChecker, succ_rate: 1.00, total: 120, average_time: 0.5254, max_time: 0.8168, min_time: 0.2775 (checker.py:439)
                                                                                   Resource Group Info                                                                                    
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name                     ┃ Capacity ┃ Available Node ┃ Loaded Replica           ┃ Outgoing Node ┃ Incoming Node ┃ Request ┃ Limit ┃ Nodes                                              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ __default_resource_group │ 1000000  │ 3              │ {}                       │ {}            │ {}            │ 0       │ 10    │ rg-test-234937-milvus-querynode-0-855647b8bd-lc2mn │
│                          │          │                │                          │               │               │         │       │ rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2 │
│                          │          │                │                          │               │               │         │       │ rg-test-234937-milvus-querynode-0-855647b8bd-ljcg9 │
│ rg_85hmeHMP              │ 2        │ 2              │ {'Checker__CP8UzgWo': 1} │ {}            │ {}            │ 2       │ 2     │ rg-test-234937-milvus-querynode-0-855647b8bd-972bw │
│                          │          │                │                          │               │               │         │       │ rg-test-234937-milvus-querynode-0-855647b8bd-98w54 │
└──────────────────────────┴──────────┴────────────────┴──────────────────────────┴───────────────┴───────────────┴─────────┴───────┴────────────────────────────────────────────────────┘
                                                                                Checker__CP8UzgWo Segment Distribution Info                                                                                 
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Segment ID         ┃ Collection ID      ┃ Partition ID       ┃ Num Rows ┃ State ┃ Node ID ┃ Node Name                                                                                                    ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 449618054847598733 │ 449618054847398552 │ 449618054847398553 │ 562      │ 3     │ [2, 9]  │ ['rg-test-234937-milvus-querynode-0-855647b8bd-98w54', 'rg-test-234937-milvus-querynode-0-855647b8bd-ljcg9'] │
│ 449618054847598735 │ 449618054847398552 │ 449618054847398553 │ 608      │ 3     │ [2, 11] │ ['rg-test-234937-milvus-querynode-0-855647b8bd-98w54', 'rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
│ 449618054847598736 │ 449618054847398552 │ 449618054847398553 │ 627      │ 3     │ [2, 11] │ ['rg-test-234937-milvus-querynode-0-855647b8bd-98w54', 'rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
│ 449618054847598734 │ 449618054847398552 │ 449618054847398553 │ 597      │ 3     │ [2, 11] │ ['rg-test-234937-milvus-querynode-0-855647b8bd-98w54', 'rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
│ 449618054847598732 │ 449618054847398552 │ 449618054847398553 │ 606      │ 3     │ [2, 11] │ ['rg-test-234937-milvus-querynode-0-855647b8bd-98w54', 'rg-test-234937-milvus-querynode-0-855647b8bd-hb9v2'] │
└────────────────────┴────────────────────┴────────────────────┴──────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Expected Behavior

Segments on the old resource group should be released once the replicas have been transferred.
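
This expectation can be written as a check against the last segment distribution table above. The following is only a minimal sketch: it assumes the node IDs map as shown in the tables (9 and 11 in `__default_resource_group`, 2 in `rg_85hmeHMP`), and the helper name `segments_left_on_nodes` is hypothetical, not part of Milvus or pymilvus.

```python
def segments_left_on_nodes(distribution, old_nodes):
    """Return {segment_id: [old node IDs still holding it]} for unreleased segments."""
    old = set(old_nodes)
    leftovers = {}
    for segment_id, node_ids in distribution.items():
        still_there = sorted(old.intersection(node_ids))
        if still_there:
            leftovers[segment_id] = still_there
    return leftovers


# Node IDs per segment, copied from the last "Segment Distribution Info" table.
after_transfer = {
    449618054847598733: [2, 9],
    449618054847598735: [2, 11],
    449618054847598736: [2, 11],
    449618054847598734: [2, 11],
    449618054847598732: [2, 11],
}

# Nodes 9 and 11 belong to the source resource group.
leftovers = segments_left_on_nodes(after_transfer, old_nodes=[9, 11])
print(len(leftovers))  # 5: every segment is still reported on a source node
```

An empty result would mean the source resource group has been fully drained, which is the behavior expected here.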

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

The replica transfer happened at 08:31.
Query node memory usage in the source RG:
image

Query node memory usage in the target RG:
image

After the replica transfer, the memory usage of the target RG's query nodes increased, but that of the source RG's query nodes did not decrease.

@zhuwenxing zhuwenxing added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 8, 2024
@zhuwenxing
Contributor Author

/assign @chyezh

@xiaofan-luan
Collaborator

I think the transfer is asynchronous.
You may need to wait a while for the balance to finish.
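
If the transfer is indeed asynchronous, a test can poll the segment distribution with a timeout instead of checking it once. This is a rough sketch under that assumption: `wait_until_released` is a hypothetical helper, and `fetch_distribution` is a caller-supplied callable returning `{segment_id: [node_id, ...]}`, so nothing here depends on a specific Milvus API.

```python
import time


def wait_until_released(fetch_distribution, old_nodes, timeout=60.0, interval=2.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until no segment is reported on any of old_nodes, or until timeout.

    Returns True if the old nodes were drained, False if the timeout expired.
    """
    old = set(old_nodes)
    deadline = clock() + timeout
    while True:
        distribution = fetch_distribution()
        if not any(old.intersection(nodes) for nodes in distribution.values()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Stand-in for a real query: the second poll reports the segment only on node 2.
snapshots = iter([{1: [2, 11]}, {1: [2]}])
print(wait_until_released(lambda: next(snapshots), old_nodes=[9, 11], interval=0))  # True
```

A `False` result after a generous timeout would suggest the segments are stuck rather than merely still being balanced.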

@zhuwenxing
Contributor Author

image

In another test, memory usage in the source RG did decline somewhat, but the segment distribution returned by utility.get_query_segment_info(collection_name) showed that the segments still existed on the source RG's query nodes.

image
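
A check of this kind could look roughly like the sketch below. It assumes each item returned by `utility.get_query_segment_info` exposes `segmentID` and `nodeIds` fields; those field names are an assumption to verify against the installed pymilvus version. The helper `to_distribution`, the connection address, and the stand-in values (borrowed from the tables in the original report) are illustrative only.

```python
from types import SimpleNamespace


def to_distribution(segment_infos):
    """Map each segment ID to the sorted list of query node IDs reporting it."""
    return {info.segmentID: sorted(info.nodeIds) for info in segment_infos}


# Against a live cluster the input would come from (not executed here):
#   from pymilvus import connections, utility
#   connections.connect(host="localhost", port="19530")  # placeholder address
#   infos = utility.get_query_segment_info(collection_name)
# Stand-in objects with the assumed field names keep this sketch runnable offline.
infos = [
    SimpleNamespace(segmentID=449618054847598733, nodeIds=[9, 2]),
    SimpleNamespace(segmentID=449618054847598735, nodeIds=[11, 2]),
]

distribution = to_distribution(infos)
source_nodes = {9, 11}  # node IDs of the source RG's query nodes
still_on_source = [seg for seg, nodes in distribution.items() if source_nodes & set(nodes)]
print(still_on_source)
```

A non-empty `still_on_source` list corresponds to the situation described in this comment.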

@yanliang567
Contributor

/assign @weiliu1031

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 8, 2024
@yanliang567 yanliang567 added this to the 2.4.2 milestone May 8, 2024
@yanliang567 yanliang567 removed their assignment May 8, 2024
@weiliu1031
Contributor

Same problem as #32901; it should be fixed by #32929.

@weiliu1031
Contributor

/assign @zhuwenxing

@weiliu1031
Contributor

Please verify this with the latest image.

@zhuwenxing
Contributor Author

test job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/resource_group_test/detail/resource_group_test/8/pipeline

In this test, the get query segment info interface reports that after the transfer the segments are present only in the target RG. However, Grafana monitoring shows that segments still exist in the source RG.
image

After the transfer, the number of loaded segments decreased but did not drop to zero; some segments remained, and the count did not decrease any further.
image

@zhuwenxing
Contributor Author

test job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/resource_group_test/detail/resource_group_test/8/pipeline

In this test, the get query segment info interface reports that after the transfer the segments are present only in the target RG. However, Grafana monitoring shows that segments still exist in the source RG.
image

After the transfer, the number of loaded segments decreased but did not drop to zero; some segments remained, and the count did not decrease any further.
image

@weiliu1031
Please take a look.

@zhuwenxing
Contributor Author

/unassign

@yanliang567 yanliang567 modified the milestones: 2.4.2, 2.4.3, 2.4.4 May 24, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.4, 2.4.5 Jun 5, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.5, 2.4.6 Jun 26, 2024
@yanliang567 yanliang567 removed this from the 2.4.6 milestone Jul 19, 2024
@yanliang567 yanliang567 added this to the 2.4.7 milestone Jul 19, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.7, 2.4.8 Aug 12, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.8, 2.4.10 Aug 19, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.10, 2.4.11 Sep 5, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.11, 2.4.12 Sep 18, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.12, 2.4.13 Sep 27, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.13, 2.4.14 Oct 15, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.14, 2.4.16 Nov 14, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.16, 2.4.17, 2.4.18 Nov 21, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.18, 2.4.19 Dec 24, 2024