[Bug]: Querynode experienced multiple restarts during testing, pod restart reason was Error #38546

zhuwenxing · 2024-12-18T04:03:28Z

Is there an existing issue for this?

I have searched the existing issues

Environment

- Milvus version:master-20241217-e19a4f76-amd64
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2024-12-17T10:30:15.064Z] + kubectl get pods -o wide

[2024-12-17T10:30:15.067Z] + grep kafka-cluster-reinstall-2977

[2024-12-17T10:30:15.629Z] kafka-cluster-reinstall-2977-0                                    2/2     Running                  0                  21m     10.104.25.7     4am-node30   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-1                                    2/2     Running                  0                  21m     10.104.16.121   4am-node21   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-2                                    2/2     Running                  0                  21m     10.104.23.77    4am-node27   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-etcd-0                               1/1     Running                  0                  21m     10.104.16.120   4am-node21   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-etcd-1                               1/1     Running                  0                  21m     10.104.24.192   4am-node29   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-etcd-2                               1/1     Running                  0                  21m     10.104.20.187   4am-node22   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-exporter-6cdffc5f44-xz5hs            1/1     Running                  3 (20m ago)        21m     10.104.17.171   4am-node23   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-milvus-datanode-6b7b778c46-b9qrn     1/1     Running                  0                  21m     10.104.25.5     4am-node30   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-milvus-datanode-6b7b778c46-tpd22     1/1     Running                  1 (20m ago)        21m     10.104.13.111   4am-node16   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-milvus-indexnode-7f94bd74c9-8nlfm    1/1     Running                  1 (20m ago)        21m     10.104.23.75    4am-node27   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-milvus-indexnode-7f94bd74c9-b47wx    1/1     Running                  0                  21m     10.104.26.77    4am-node32   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-milvus-indexnode-7f94bd74c9-jbgsb    1/1     Running                  0                  21m     10.104.33.104   4am-node36   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-milvus-mixcoord-6d8f9887cf-gw59k     1/1     Running                  1 (20m ago)        21m     10.104.15.50    4am-node20   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-milvus-proxy-59998f974f-4jm62        1/1     Running                  1 (20m ago)        21m     10.104.16.115   4am-node21   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-milvus-querynode-ff54cd4c-jfm8c      1/1     Running                  4 (7m49s ago)      21m     10.104.15.51    4am-node20   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-milvus-querynode-ff54cd4c-kd75h      1/1     Running                  3 (6m58s ago)      21m     10.104.6.123    4am-node13   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-milvus-querynode-ff54cd4c-n2bgf      1/1     Running                  4 (6m39s ago)      21m     10.104.16.114   4am-node21   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-minio-0                              1/1     Running                  0                  21m     10.104.16.119   4am-node21   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-minio-1                              1/1     Running                  0                  21m     10.104.20.186   4am-node22   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-minio-2                              1/1     Running                  0                  21m     10.104.27.127   4am-node31   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-minio-3                              1/1     Running                  0                  21m     10.104.17.176   4am-node23   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-zookeeper-0                          1/1     Running                  0                  21m     10.104.20.185   4am-node22   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-zookeeper-1                          1/1     Running                  0                  21m     10.104.17.174   4am-node23   <none>           <none>

[2024-12-17T10:30:15.630Z] kafka-cluster-reinstall-2977-zookeeper-2                          1/1     Running                  0                  21m     10.104.33.108   4am-node36   <none>           <none>

[2024-12-17T10:28:24.790Z] FAILED testcases/test_action_first_deployment.py::TestActionFirstDeployment::test_task_all[HNSW-all-is_string_indexed-is_deleted-is_compacted-2] - AssertionError: Response of API load expect True, but got False

[2024-12-17T10:28:24.790Z] FAILED testcases/test_action_first_deployment.py::TestActionFirstDeployment::test_task_all[IVF_FLAT-all-not_string_indexed-is_deleted-not_compacted-2] - AssertionError: Response of API load expect True, but got False

[2024-12-17T10:28:24.790Z] FAILED testcases/test_action_first_deployment.py::TestActionFirstDeployment::test_task_all[IVF_SQ8-all-not_string_indexed-is_deleted-is_compacted-2] - AssertionError: Response of API load expect True, but got False

All cases requiring multiple replicas to be loaded failed with the error: resource group node not enough[rg=__default_resource_group][currentNodeNum=1][expectedNodeNum=2]
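
The pod listing above shows each querynode pod restarting 3 to 4 times within 21 minutes. That may explain why the resource group saw only one node when two were expected. The following is a minimal sketch, assuming the `kubectl get pods -o wide` output (with or without the Jenkins timestamp prefix) has been captured as text; `restarted_pods` is a hypothetical helper, not part of the test suite.

```python
import re

# Matches one data row of `kubectl get pods -o wide`, optionally prefixed by a
# Jenkins timestamp such as "[2024-12-17T10:30:15.630Z] ".
ROW = re.compile(
    r"^(?:\[[^\]]+\]\s+)?"      # optional Jenkins timestamp
    r"(?P<name>\S+)\s+"         # pod name
    r"(?P<ready>\d+/\d+)\s+"    # READY column, e.g. 1/1
    r"(?P<status>\S+)\s+"       # STATUS column, e.g. Running
    r"(?P<restarts>\d+)"        # RESTARTS count; the "(20m ago)" suffix is ignored
)


def restarted_pods(listing: str, component: str = "querynode") -> dict:
    """Return {pod name: restart count} for pods of `component` that restarted."""
    result = {}
    for line in listing.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # command echo lines and blank lines do not match
        restarts = int(match["restarts"])
        if component in match["name"] and restarts > 0:
            result[match["name"]] = restarts
    return result
```

Calling `restarted_pods(listing)` on the listing above should report the three querynode pods; passing another component name (for example `"datanode"`) checks the other services the same way.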

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_cron/detail/deploy_test_kafka_cron/2977/pipeline

log:
artifacts-kafka-cluster-reinstall-2977-server-second-deployment-logs.tar.gz

cluster: 4am
ns: chaos-testing



yanliang567 · 2024-12-18T06:05:36Z

/assign @weiliu1031
/unassign

zhuwenxing · 2024-12-18T06:05:56Z

This seems to be a consistently reproducible issue.
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_cron/detail/deploy_test_cron/3057/pipeline
log:
artifacts-pulsar-cluster-reinstall-3057-server-logs.tar.gz

cluster: 4am
ns: chaos-testing
pod info

[2024-12-18T05:55:55.009Z] + grep pulsar-cluster-reinstall-3057

[2024-12-18T05:55:55.015Z] + kubectl get pods -o wide

[2024-12-18T05:55:55.015Z] pulsar-cluster-reinstall-3057-bookie-0                            1/1     Running            0                31m     10.104.19.56    4am-node28   <none>           <none>

[2024-12-18T05:55:55.015Z] pulsar-cluster-reinstall-3057-bookie-1                            1/1     Running            0                31m     10.104.27.162   4am-node31   <none>           <none>

[2024-12-18T05:55:55.015Z] pulsar-cluster-reinstall-3057-bookie-init-fjmmt                   0/1     Completed          0                31m     10.104.19.51    4am-node28   <none>           <none>

[2024-12-18T05:55:55.015Z] pulsar-cluster-reinstall-3057-broker-0                            1/1     Running            0                31m     10.104.6.241    4am-node13   <none>           <none>

[2024-12-18T05:55:55.015Z] pulsar-cluster-reinstall-3057-etcd-0                              1/1     Running            0                31m     10.104.19.58    4am-node28   <none>           <none>

[2024-12-18T05:55:55.015Z] pulsar-cluster-reinstall-3057-etcd-1                              1/1     Running            0                31m     10.104.27.159   4am-node31   <none>           <none>

[2024-12-18T05:55:55.015Z] pulsar-cluster-reinstall-3057-etcd-2                              1/1     Running            0                31m     10.104.18.98    4am-node25   <none>           <none>

[2024-12-18T05:55:55.015Z] pulsar-cluster-reinstall-3057-milvus-datanode-57fdcd598c-rkdbn    1/1     Running            2 (31m ago)      31m     10.104.25.230   4am-node30   <none>           <none>

[2024-12-18T05:55:55.015Z] pulsar-cluster-reinstall-3057-milvus-datanode-57fdcd598c-vs5xv    1/1     Running            2 (31m ago)      31m     10.104.34.207   4am-node37   <none>           <none>

[2024-12-18T05:55:55.015Z] pulsar-cluster-reinstall-3057-milvus-indexnode-bccb7555d-dzswl    1/1     Running            2 (31m ago)      31m     10.104.21.193   4am-node24   <none>           <none>

[2024-12-18T05:55:55.015Z] pulsar-cluster-reinstall-3057-milvus-indexnode-bccb7555d-p4lhx    1/1     Running            2 (31m ago)      31m     10.104.34.209   4am-node37   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-milvus-indexnode-bccb7555d-r8xrw    1/1     Running            2 (31m ago)      31m     10.104.25.232   4am-node30   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-milvus-mixcoord-bff6db574-49466     1/1     Running            2 (31m ago)      31m     10.104.25.233   4am-node30   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-milvus-proxy-5db467768b-p82hm       1/1     Running            2 (31m ago)      31m     10.104.25.231   4am-node30   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-milvus-querynode-6c6cd5f86b-psg6r   1/1     Running            4 (21m ago)      31m     10.104.34.211   4am-node37   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-milvus-querynode-6c6cd5f86b-s4vcw   1/1     Running            3 (21m ago)      31m     10.104.19.49    4am-node28   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-milvus-querynode-6c6cd5f86b-znjgk   1/1     Running            3 (19m ago)      31m     10.104.21.192   4am-node24   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-minio-0                             1/1     Running            0                31m     10.104.19.57    4am-node28   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-minio-1                             1/1     Running            0                31m     10.104.27.158   4am-node31   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-minio-2                             1/1     Running            0                31m     10.104.34.215   4am-node37   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-minio-3                             1/1     Running            0                31m     10.104.18.95    4am-node25   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-proxy-0                             1/1     Running            0                31m     10.104.6.240    4am-node13   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-pulsar-init-6bpfl                   0/1     Completed          0                31m     10.104.19.50    4am-node28   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-recovery-0                          1/1     Running            0                31m     10.104.6.239    4am-node13   <none>           <none>

[2024-12-18T05:55:55.016Z] pulsar-cluster-reinstall-3057-zookeeper-0                         1/1     Running            0                31m     10.104.27.157   4am-node31   <none>           <none>

[2024/12/18 05:36:01.951 +00:00] [INFO] [segments/segment_loader.go:482] ["request resource for loading segments (unit in MiB)"] [traceID=35dcda9a5dce51b1f4657e18cc659852] [segmentIDs="[454688636664268384]"] [memory=39.41035556793213] [committedMemory=334.4112501144409] [disk=0] [committedDisk=0]
I20241218 05:36:01.951419  3463 index_factory.cc:55] [KNOWHERE][Create][CGO_LOAD] use key IVF_FLAT_CC_fp32 to create knowhere index IVF_FLAT_CC with version 6
[2024/12/18 05:36:01.951 +00:00] [INFO] [segments/segment.go:337] ["create segment done"] [traceID=35dcda9a5dce51b1f4657e18cc659852]
[2024/12/18 05:36:01.951 +00:00] [INFO] [segments/segment_loader.go:355] ["start to load segments in parallel"] [traceID=35dcda9a5dce51b1f4657e18cc659852] [collectionID=454688636660717577] [segmentType=Sealed] [requestSegments="[454688636664268384]"] [preparedSegments="[454688636664268384]"] [segmentNum=1] [concurrencyLevel=1]
[2024/12/18 05:36:01.951 +00:00] [INFO] [segments/segment_loader.go:328] ["load segment..."] [traceID=35dcda9a5dce51b1f4657e18cc659852] [collectionID=454688636660717577] [segmentType=Sealed] [requestSegments="[454688636664268384]"] [preparedSegments="[454688636664268384]"] [partitionID=454688636660717578] [segmentID=454688636664268384] [segmentType=L1]
[2024/12/18 05:36:01.951 +00:00] [INFO] [segments/segment_loader.go:842] ["start loading segment files"] [traceID=35dcda9a5dce51b1f4657e18cc659852] [collectionID=454688636660717577] [partitionID=454688636660717578] [shard=by-dev-rootcoord-dml_2_454688636660717577v1] [segmentID=454688636664268384] [rowNum=6018] [segmentType=Sealed]
I20241218 05:36:01.957944  3602 ChunkedSegmentSealedImpl.cpp:286] [SERVER][LoadFieldData][CGO_LOAD][]segment 454688636663544527 loads field 102 mmap false done
[2024/12/18 05:36:01.958 +00:00] [INFO] [segments/segment.go:801] ["submitted loadFieldData task to load pool"] [traceID=ff310f8671db269c260bb9bae39613a7] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663544527] [fieldID=102] [rowCount=19480]
[2024/12/18 05:36:01.958 +00:00] [INFO] [segments/segment.go:809] ["load field done"] [traceID=ff310f8671db269c260bb9bae39613a7] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663544527] [fieldID=102] [rowCount=19480]
I20241218 05:36:01.958204  3602 ChunkedSegmentSealedImpl.cpp:261] [SERVER][LoadFieldData][CGO_LOAD][]segment 454688636663544527 loads field 113 with num_rows 19480
I20241218 05:36:01.958240  3602 ChunkedSegmentSealedImpl.cpp:275] [SERVER][LoadFieldData][CGO_LOAD][]segment 454688636663544527 submits load field 113 task to thread pool
I20241218 05:36:01.959270  3607 ChunkedSegmentSealedImpl.cpp:286] [SERVER][LoadFieldData][CGO_LOAD][]segment 454688636663544527 loads field 104 mmap false done
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:801] ["submitted loadFieldData task to load pool"] [traceID=ff310f8671db269c260bb9bae39613a7] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663544527] [fieldID=104] [rowCount=19480]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:809] ["load field done"] [traceID=ff310f8671db269c260bb9bae39613a7] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663544527] [fieldID=104] [rowCount=19480]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:846] ["add field data info done"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] ["row count"=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment_loader.go:761] ["Start loading fields..."] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [segmentID=454688636663520511] [indexedFields="[]"] ["indexed text fields"="[103]"] ["unindexed text fields"="[]"]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=110] [rowCount=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=106] [rowCount=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=111] [rowCount=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=102] [rowCount=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=107] [rowCount=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=100] [rowCount=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=103] [rowCount=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=104] [rowCount=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=108] [rowCount=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=105] [rowCount=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=113] [rowCount=19520]
[2024/12/18 05:36:01.959 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=112] [rowCount=19520]
[2024/12/18 05:36:01.960 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=109] [rowCount=19520]
[2024/12/18 05:36:01.960 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=1] [rowCount=19520]
[2024/12/18 05:36:01.960 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=0] [rowCount=19520]
[2024/12/18 05:36:01.960 +00:00] [INFO] [segments/segment.go:773] ["start loading field data for field"] [traceID=8d5dfb2872f7c7270f7a81f91bce1d77] [collectionID=454688636659781468] [partitionID=454688636659781469] [segmentID=454688636663520511] [fieldID=101] [rowCount=19520]
_ZN5folly21CPUThreadPoolExecutor9threadRunESt10shared_ptrINS_18ThreadPoolExecutor6ThreadEE
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/executors/CPUThreadPoolExecutor.cpp:333 pc=0x7f3273016e74
_ZSt13__invoke_implIvRMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEERPS1_JRS4_EET_St21__invoke_memfun_derefOT0_OT1_DpOT2_
	/usr/include/c++/12/bits/invoke.h:74 pc=0x7f327305644b
_ZSt8__invokeIRMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEJRPS1_RS4_EENSt15__invoke_resultIT_JDpT0_EE4typeEOSC_DpOSD_
	/usr/include/c++/12/bits/invoke.h:96 pc=0x7f327305644b
_ZNSt5_BindIFMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEPS1_S4_EE6__callIvJEJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
	/usr/include/c++/12/functional:495 pc=0x7f327305644b
_ZNSt5_BindIFMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEPS1_S4_EEclIJEvEET0_DpOT_
	/usr/include/c++/12/functional:580 pc=0x7f327305644b
_ZN5folly6detail8function14FunctionTraitsIFvvEE9callSmallISt5_BindIFMNS_18ThreadPoolExecutorEFvSt10shared_ptrINS7_6ThreadEEEPS7_SA_EEEEvRNS1_4DataE
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/Function.h:345 pc=0x7f327305644b
(null)
	(null):0 pc=0x7f3274dfd252
(null)
	(null):0 pc=0x7f3274fe1ac2
(null)
	(null):0 pc=0x7f3275072a03
(null)
	(null):0 pc=0xffffffffffffffff

congqixia · 2024-12-18T06:17:04Z

SIGSEGV when doing retrieve

chyezh · 2024-12-18T06:24:48Z

The backtrace at crash time is not stable; it differs between crashes.

no symbols:

2024-12-18 13:33:21.887	SIGNAL CATCH BY NON-GO SIGNAL HANDLER
2024-12-18 13:33:21.887	SIGNO: 11; SIGNAME: Segmentation fault; SI_CODE: 1; SI_ADDR: 0x7f2fb48601e0
2024-12-18 13:33:21.887	BACKTRACE:
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0x7f325a66084c
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0x7f3255616b94
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0x7f3255616e10
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0x7f3255617460
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0x7f32555f7d66
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0x7f32555f573f
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0x7f325563ac3e
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0x7f325563af06
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0x7f3255e8a252
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0x7f325606eac2
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0x7f325610084f
2024-12-18 13:33:25.952	(null)
2024-12-18 13:33:25.952		(null):0 pc=0xffffffffffffffff
2024-12-18 13:33:25.952

2024-12-18 13:34:21.509	BACKTRACE:
2024-12-18 13:34:27.359	(null)
2024-12-18 13:34:27.359		(null):0 pc=0x7f5df23503e8
2024-12-18 13:34:27.379	_ZN4core10intrinsics19copy_nonoverlapping17hc128077a5de2f201E
2024-12-18 13:34:27.379		library/core/src/intrinsics.rs:2685 pc=0x7f5df54f0cea
2024-12-18 13:34:27.379	_ZN4core3ptr9const_ptr33_$LT$impl$u20$$BP$const$u20$T$GT$22copy_to_nonoverlapping17h6b553735fb8d9e78E
2024-12-18 13:34:27.379		library/core/src/ptr/const_ptr.rs:1291 pc=0x7f5df54f0cea
2024-12-18 13:34:27.379	_ZN52_$LT$T$u20$as$u20$alloc..slice..hack..ConvertVec$GT$6to_vec17h2432126ffe61ef0fE
2024-12-18 13:34:27.379		library/alloc/src/slice.rs:167 pc=0x7f5df54f0cea
2024-12-18 13:34:27.379	_ZN5alloc5slice4hack6to_vec17hd9d04bf9154f9616E
2024-12-18 13:34:27.379		library/alloc/src/slice.rs:111 pc=0x7f5df54f0cea
2024-12-18 13:34:27.379	_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$9to_vec_in17h940bcad93ff3ed33E
2024-12-18 13:34:27.379		library/alloc/src/slice.rs:441 pc=0x7f5df54f0cea
2024-12-18 13:34:27.379	_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6to_vec17h45cd31c368b8963dE
2024-12-18 13:34:27.379		library/alloc/src/slice.rs:416 pc=0x7f5df54f0cea
2024-12-18 13:34:27.379	_ZN3std3sys4unix6os_str5Slice8to_owned17h6415b0e03348ba51E
2024-12-18 13:34:27.379		library/std/src/sys/unix/os_str.rs:229 pc=0x7f5df54f0cea
2024-12-18 13:34:27.379	_ZN3std3ffi6os_str5OsStr12to_os_string17hb2557d74928bb6a7E
2024-12-18 13:34:27.379		library/std/src/ffi/os_str.rs:885 pc=0x7f5df54f0cea
2024-12-18 13:34:27.379	_ZN3std4path4Path11to_path_buf17h9d418bf96d08b47cE
2024-12-18 13:34:27.379		library/std/src/path.rs:2155 pc=0x7f5df54f0cea
2024-12-18 13:34:27.379	_ZN3std4path4Path5_join17hb1db9fde869b0c4aE
2024-12-18 13:34:27.379		library/std/src/path.rs:2555 pc=0x7f5df54f0cea
2024-12-18 13:34:27.379	(null)
2024-12-18 13:34:27.379		(null):0 pc=0x7f5df5091d47
2024-12-18 13:34:27.379	(null)
2024-12-18 13:34:27.379		(null):0 pc=0x7f5df50da2d1
2024-12-18 13:34:27.379	(null)
2024-12-18 13:34:27.379		(null):0 pc=0x7f5df51c5419
2024-12-18 13:34:27.379	(null)
2024-12-18 13:34:27.379		(null):0 pc=0x7f5df51c80f3
2024-12-18 13:34:27.379	(null)
2024-12-18 13:34:27.379		(null):0 pc=0x7f5df55842bf
2024-12-18 13:34:27.428	_ZN6milvus5index20InvertedIndexTantivyIlE5CountEv
2024-12-18 13:34:27.428		/workspace/source/internal/core/src/index/InvertedIndexTantivy.h:78 pc=0x7f5df48e9942
2024-12-18 13:34:27.428	_ZN6milvus5index20InvertedIndexTantivyIlE5RangeElNS_5proto4plan6OpTypeE
2024-12-18 13:34:27.428		/workspace/source/internal/core/src/index/InvertedIndexTantivy.cpp:291 pc=0x7f5df48e9942
2024-12-18 13:34:27.549	_ZN6milvus4exec14UnaryIndexFuncIlLNS_5proto4plan6OpTypeE1EEclEPNS_5index11ScalarIndexIlEEl
2024-12-18 13:34:27.549		/workspace/source/internal/core/src/exec/expression/UnaryExpr.h:286 pc=0x7f5df4eb8576
2024-12-18 13:34:27.549	_ZZN6milvus4exec23PhyUnaryRangeFilterExpr28ExecRangeVisitorImplForIndexIlEESt10shared_ptrINS_10BaseVectorEEvENKUlPNS_5index11ScalarIndexIlEElE_clES9_l
2024-12-18 13:34:27.549		/workspace/source/internal/core/src/exec/expression/UnaryExpr.cpp:836 pc=0x7f5df4eb8576
2024-12-18 13:34:27.549	_ZN6milvus4exec11SegmentExpr18ProcessIndexChunksIlZNS0_23PhyUnaryRangeFilterExpr28ExecRangeVisitorImplForIndexIlEESt10shared_ptrINS_10BaseVectorEEvEUlPNS_5index11ScalarIndexIlEElE_JlEEES7_T0_DpT1_
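
Most frames in these backtraces are unsymbolized (`(null)`), so it can help to pull out only the frames that carry a symbol and a source location before comparing crashes. The following is a rough sketch, assuming the format shown above, where a symbol line is followed by a `file:line pc=0x...` line and each line may carry a log-viewer timestamp. `symbolized_frames` is a hypothetical helper, not a Milvus tool.

```python
import re

# A location line looks like "<path>:<line> pc=0x<hex>"; "(null):0" marks a
# frame for which no symbol information was available.
LOCATION = re.compile(r"^(?P<file>\S+):(?P<line>\d+) pc=(?P<pc>0x[0-9a-f]+)$")
# Optional log-viewer timestamp prefix such as "2024-12-18 13:34:27.379".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s*")


def symbolized_frames(backtrace: str) -> list:
    """Return (symbol, file, line) tuples for frames that have symbols."""
    frames = []
    symbol = None
    for raw in backtrace.splitlines():
        text = TIMESTAMP.sub("", raw).strip()
        if not text:
            continue
        match = LOCATION.match(text)
        if match is None:
            symbol = text  # a symbol line precedes its location line
            continue
        if symbol and symbol != "(null)" and match["file"] != "(null)":
            frames.append((symbol, match["file"], int(match["line"])))
        symbol = None
    return frames
```

The mangled names returned here can then be passed to a demangler such as `c++filt` for the C++ frames.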

wangting0128 · 2024-12-18T06:28:01Z

chyezh · 2024-12-18T08:15:23Z

@zhuwenxing please rerun it with the ASAN image harbor.milvus.io/milvus/milvus:chyezh-temp_for_asan_image-20241218-9c8c1b3b-amd64.

chyezh · 2024-12-19T04:06:37Z

Reproduced with ASAN.

2024-12-18 21:52:25.688	==7==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x610000930238 at pc 0x7f8c667f67c1 bp 0x7f89db9aeff0 sp 0x7f89db9aefe0
2024-12-18 21:52:25.688	READ of size 8 at 0x610000930238 thread T1240
2024-12-18 21:52:26.999	    #0 0x7f8c667f67c0 in milvus::bitset::detail::Proxy<milvus::bitset::detail::VectorizedElementWiseBitsetPolicy<unsigned long, milvus::bitset::detail::VectorizedDynamic> >::set() /workspace/source/internal/core/src/bitset/detail/proxy.h:114
2024-12-18 21:52:26.999	    #1 0x7f8c667f67c0 in milvus::bitset::detail::Proxy<milvus::bitset::detail::VectorizedElementWiseBitsetPolicy<unsigned long, milvus::bitset::detail::VectorizedDynamic> >::operator=(bool) /workspace/source/internal/core/src/bitset/detail/proxy.h:70
2024-12-18 21:52:26.999	    #3 0x7f8c667f67c0 in milvus::index::TextMatchIndex::MatchQuery(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /workspace/source/internal/core/src/index/TextMatchIndex.cpp:255
2024-12-18 21:52:26.999	    #4 0x7f8c680c662b in operator() /workspace/source/internal/core/src/exec/expression/UnaryExpr.cpp:1093
2024-12-18 21:52:26.999	    #5 0x7f8c680c662b in ProcessTextMatchIndex<milvus::exec::PhyUnaryRangeFilterExpr::ExecTextMatch()::<lambda(Index*, const std::string&)>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > /workspace/source/internal/core/src/exec/expression/Expr.h:966
2024-12-18 21:52:26.999	    #6 0x7f8c680c662b in milvus::exec::PhyUnaryRangeFilterExpr::ExecTextMatch() /workspace/source/internal/core/src/exec/expression/UnaryExpr.cpp:1095
2024-12-18 21:52:26.999	    #7 0x7f8c682b6bce in std::shared_ptr<milvus::BaseVector> milvus::exec::PhyUnaryRangeFilterExpr::ExecRangeVisitorImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(folly::fbvector<int, std::allocator<int> >*) /workspace/source/internal/core/src/exec/expression/UnaryExpr.cpp:805
2024-12-18 21:52:26.999	    #8 0x7f8c680f657c in milvus::exec::PhyUnaryRangeFilterExpr::Eval(milvus::exec::EvalCtx&, std::shared_ptr<milvus::BaseVector>&) /workspace/source/internal/core/src/exec/expression/UnaryExpr.cpp:186
2024-12-18 21:52:26.999	    #9 0x7f8c67bd3807 in milvus::exec::ExprSet::Eval(int, int, bool, milvus::exec::EvalCtx&, std::vector<std::shared_ptr<milvus::BaseVector>, std::allocator<std::shared_ptr<milvus::BaseVector> > >&) /workspace/source/internal/core/src/exec/expression/Expr.cpp:49
2024-12-18 21:52:26.999	    #10 0x7f8c6831f6de in milvus::exec::PhyFilterBitsNode::GetOutput() /workspace/source/internal/core/src/exec/operator/FilterBitsNode.cpp:72
2024-12-18 21:52:26.999	    #11 0x7f8c67247dc2 in milvus::exec::Driver::RunInternal(std::shared_ptr<milvus::exec::Driver>&, std::shared_ptr<milvus::exec::BlockingState>&, std::shared_ptr<milvus::RowVector>&) /workspace/source/internal/core/src/exec/Driver.cpp:239
2024-12-18 21:52:26.999	    #12 0x7f8c6724b9a2 in milvus::exec::Driver::Next(std::shared_ptr<milvus::exec::BlockingState>&)

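The tantivy reader's count can lag behind match_query: count() may return an older (smaller) value than the offsets in the hits, so apply_hits can write past the end of the bitset that was sized from that count.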

    auto cnt = wrapper_->count();
    TargetBitmap bitset(cnt);
    if (bitset.empty()) {
        return bitset;
    }
    auto hits = wrapper_->match_query(query);
    apply_hits(bitset, hits, true);
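
One way to keep a stale count from causing an out-of-range write is to size the bitmap from the hits themselves and to bounds-check each offset before setting it. The following is only a minimal sketch under stated assumptions, not the code from the merged fix: `Bitmap` is a `std::vector<bool>` standing in for `TargetBitmap`, and `RequiredBitmapSize`, `ApplyHitsChecked`, and `MatchQuerySketch` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for TargetBitmap (assumption: not the real milvus type).
using Bitmap = std::vector<bool>;

// Return a size that covers both the reader's count and every hit offset,
// so a stale (smaller) count cannot leave a hit outside the bitmap.
inline std::size_t
RequiredBitmapSize(std::size_t reader_count,
                   const std::vector<std::uint32_t>& hits) {
    std::size_t needed = reader_count;
    for (auto offset : hits) {
        needed = std::max(needed, static_cast<std::size_t>(offset) + 1);
    }
    return needed;
}

// Set one bit per hit; offsets beyond the bitmap are skipped, never written.
inline void
ApplyHitsChecked(Bitmap& bitset, const std::vector<std::uint32_t>& hits) {
    for (auto offset : hits) {
        if (offset < bitset.size()) {
            bitset[offset] = true;
        }
    }
}

// Hypothetical replacement flow: fetch the hits first, then allocate.
inline Bitmap
MatchQuerySketch(std::size_t reader_count,
                 const std::vector<std::uint32_t>& hits) {
    Bitmap bitset(RequiredBitmapSize(reader_count, hits));
    ApplyHitsChecked(bitset, hits);
    return bitset;
}
```

Note that growing the bitmap beyond the reader's count changes its length, so this only fits if callers tolerate a bitmap sized by the largest hit; whether that holds in Milvus is an assumption here.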

issue: #38546, #38486 Signed-off-by: chyezh <chyezh@outlook.com>

zhuwenxing · 2024-12-19T12:38:15Z

Verified as fixed in master-8fcb33c-20241219.

Removing the critical label.

zhuwenxing added the kind/bug and needs-triage labels Dec 18, 2024

zhuwenxing assigned yanliang567 Dec 18, 2024

zhuwenxing added the severity/critical and incompatible labels Dec 18, 2024

zhuwenxing added this to the 2.5.0 milestone Dec 18, 2024

sre-ci-robot assigned weiliu1031 and unassigned yanliang567 Dec 18, 2024

yanliang567 added the triage/accepted label and removed the needs-triage label Dec 18, 2024

zhuwenxing removed the incompatible label Dec 18, 2024

chyezh self-assigned this Dec 18, 2024

chyezh mentioned this issue Dec 19, 2024

fix: interted index out of range #38577

Merged

chyezh mentioned this issue Dec 19, 2024

[Bug]: After milvus recovers from mixcoord pod kill chaos, querynode is in a unhealthy status and collection load times out（180s） #38486

Open

1 task

sre-ci-robot pushed a commit that referenced this issue Dec 19, 2024

fix: interted index out of range (#38577)

b537a72

issue: #38546, #38486 Signed-off-by: chyezh <chyezh@outlook.com>

zhuwenxing removed the severity/critical label Dec 19, 2024

yanliang567 modified the milestones: 2.5.0, 2.5.1 Dec 24, 2024
