[Bug]: Very high p99 with low requests and sufficient resources #34476
Comments
quick questions:
/assign @congqixia @syang1997
Partial node monitoring metrics: @yanliang567
/assign @congqixia
@yanliang567 Who can help with the investigation? The issue appeared again recently.
No clues yet. We need more information from the logs to know what was happening at that moment. Please provide the full Milvus pod logs for that period.
milvus-log (3).tar.gz
However, I did not find a similarly slow query in the QueryNode log. We initially suspected an MQ problem, but Pulsar's resource usage in the monitoring was very low, and its nodes showed nothing abnormal at the time.
@yanliang567 Can you help me analyze the cause of the timeout?
Okay, let me check the logs.
Hello @syang1997, could you please provide us the monitoring for wait tsafe latency?
Additionally, please attach metric screenshots from around 2024/07/23 15:05 (±1h); that would help us address the issue. @syang1997
@bigsheeper The wait-for-search latency is long, but the QueryNode latency is not.
Is there a way to export logs for a specified time period? The incident was a while ago, and running the log-collection script against the full 24 hours produces a file that is too large.
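Absent a time-range option in the collection script, an already-exported dump can be sliced down to the 2024/07/23 15:05 ±1h window mentioned above. A minimal sketch with awk; the file name is hypothetical and the timestamp layout (line starting with `YYYY/MM/DD HH:MM:SS`) is an assumption:

```shell
# Sketch: cut an exported Milvus log down to a fixed time window with awk.
# "milvus-dump.log" is a hypothetical file name; the sample lines are
# fabricated only so the snippet runs standalone.
printf '2024/07/23 14:00:01 early line\n2024/07/23 15:10:02 inside window\n2024/07/23 16:30:03 late line\n' > milvus-dump.log

# Keep only lines whose leading "date time" fields fall inside the window;
# lexicographic comparison works because the timestamp is zero-padded.
awk '$1" "$2 >= "2024/07/23 14:05:00" && $1" "$2 <= "2024/07/23 16:05:00"' milvus-dump.log
```

For logs still in the cluster, `kubectl logs --since-time='2024-07-23T14:05:00Z' <pod>` limits a pod's output to everything after the given RFC3339 timestamp, which also avoids the full 24-hour dump.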
@syang1997 The wait tsafe latency monitoring looks like this:
I can't find this panel; which version of the Grafana dashboard are you using?
That's OK; if the QueryNode search request latency is low, then the wait tsafe latency is likely low as well.
@bigsheeper @yanliang567 What more information do you need from me?
proxy-57.log traceID=e8d19598e7a4b42dfaddd2ea28565acd |
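To pull out every log line belonging to that single request, the proxy log can be filtered on the trace ID. A minimal sketch; the sample log line is fabricated only so the snippet is self-contained (against a real `proxy-57.log` only the grep is needed):

```shell
# Sketch: isolate one request's log lines by its trace ID.
# The traceID value comes from the comment above; the sample line below is
# fabricated so the snippet runs standalone.
printf '[2024/07/23 15:05:12] search start traceID=e8d19598e7a4b42dfaddd2ea28565acd\n' > proxy-57.log

# -F treats the pattern as a fixed string, which is safe for hex IDs.
grep -F 'traceID=e8d19598e7a4b42dfaddd2ea28565acd' proxy-57.log
```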
do we have host machine monitoring metrics? |
@xiaofan-luan proxy node monitoring |
querynode-71
proxy-66
It seems that by the time the QueryNode received this request, 2 seconds had already elapsed.
Is there an existing issue for this?
Environment
Current Behavior
During steady request traffic, p99 latency suddenly spiked to about 15k ms,
yet resources were sufficient and CPU and memory usage were low. What could be the reason?
The following is the monitoring:
The following is the querynode log:
Expected Behavior
No response
Steps To Reproduce
Milvus Log
No response
Anything else?
No response