
vmstorage pod restart causing CPU spike on the next-higher vmstorage pod #8103

Open
@grdq

Description

Is your question related to a specific component?

vmstorage, vminsert

Describe the question in detail

Hi,

We are running into an interesting issue in our VM cluster and would like to know whether it has been seen before. The issue is as follows:

We have ~30 vminsert pods and 54 vmstorage pods in a Kubernetes cluster. Since we enabled a replicationFactor of 2 to make reads more resilient to vmstorage pod restarts, we have noticed a very odd behavior. When a vmstorage pod restarts (say vmstorage-10), the vmstorage pod one number higher (vmstorage-11) always sees a massive CPU spike while the restart is in progress. We don't set a CPU limit on the pods, so usage can jump from 3 cores to 30 cores.

The spike pushes the storage connection saturation for vmstorage-11 above 1s, which in turn makes every vminsert pod report that it has maxed out its maxConcurrentInserts capacity, and that can cause a drop in the ingestion rate. The spike can last for ~5 minutes.
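The step from one slow vmstorage connection to every vminsert running out of concurrent insert slots follows from Little's law: the average number of requests in flight equals the arrival rate times the time each request spends in the system. The Go sketch below uses hypothetical numbers (500 requests/s per vminsert, a limit of 200) and assumes that each insert request holds a slot until its rows are handed off to vmstorage; neither figure comes from this cluster.

```go
package main

import "fmt"

// inFlight applies Little's law: average concurrency L = λ (arrival rate) × W (time in system).
func inFlight(requestsPerSecond, secondsPerRequest float64) float64 {
	return requestsPerSecond * secondsPerRequest
}

func main() {
	const rps = 500.0   // hypothetical insert requests per second hitting one vminsert
	const limit = 200.0 // hypothetical concurrency limit (stand-in for maxConcurrentInserts)
	for _, w := range []float64{0.05, 0.1, 1.0} {
		l := inFlight(rps, w)
		fmt.Printf("time per request %.2fs -> ~%.0f requests in flight (limit exceeded: %v)\n", w, l, l > limit)
	}
}
```

At 1s per request the example holds ~500 requests in flight, well over the hypothetical limit, so further requests have to wait even though the arrival rate has not changed.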

We have reproduced this many times, and whenever a vmstorage pod restarts it is always the vmstorage pod one number higher that experiences the CPU and connection saturation spike. It even wraps around: when we restarted vmstorage-54, vmstorage-0 ran into the behavior.

Has this ever been reported, or does this behavior sound familiar? The rerouting metrics show rows from the restarting pod being rerouted evenly to every other vmstorage pod, so it doesn't make sense to us that only the pod one number higher has to do so much CPU processing.
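Even rerouting and a spike on a single pod are not necessarily in conflict. In the toy model below, everything is hypothetical: seriesID % n stands in for vminsert's hashing, healthy[seriesID % len(healthy)] stands in for rerouting, and second copies are assumed to go to the next healthy node. Rows whose primary is down get spread evenly over the healthy nodes, but the second copies of the node just below the restarting one all land on the next node up. The sketch counts, per node, the series it holds after the restart that it did not hold before.

```go
package main

import "fmt"

// placement returns the set of nodes holding one series in a toy model:
//   - the primary node is seriesID % n (stand-in for hashing);
//   - if the primary is down, the series is re-hashed onto a healthy node
//     (stand-in for rerouting, which spreads rows evenly);
//   - the second copy goes to the next healthy node after the primary.
func placement(seriesID int, up []bool) map[int]bool {
	n := len(up)
	var healthy []int
	for i, ok := range up {
		if ok {
			healthy = append(healthy, i)
		}
	}
	primary := seriesID % n
	if !up[primary] {
		primary = healthy[seriesID%len(healthy)]
	}
	nodes := map[int]bool{primary: true}
	for step := 1; step < n; step++ {
		if idx := (primary + step) % n; up[idx] {
			nodes[idx] = true
			break
		}
	}
	return nodes
}

// newSeriesPerNode counts, for each node, how many series it holds after
// node down goes offline that it did not hold while all nodes were up.
func newSeriesPerNode(numSeries, n, down int) []int {
	allUp := make([]bool, n)
	for i := range allUp {
		allUp[i] = true
	}
	withDown := make([]bool, n)
	copy(withDown, allUp)
	withDown[down] = false

	counts := make([]int, n)
	for s := 0; s < numSeries; s++ {
		before := placement(s, allUp)
		for node := range placement(s, withDown) {
			if !before[node] {
				counts[node]++
			}
		}
	}
	return counts
}

func main() {
	// 54 nodes, 1000 series per node, vmstorage-10 restarting.
	for node, c := range newSeriesPerNode(54*1000, 54, 10) {
		if c > 0 {
			fmt.Printf("vmstorage-%d: %d new series\n", node, c)
		}
	}
}
```

In this model vmstorage-11 picks up 1000 series it did not hold before (all of node 9's second copies), while every other node picks up at most a few dozen from rerouting. Whether real vminsert behaves like this is an assumption; registering series a node has not seen before (index updates) is generally more expensive than appending samples to known series, but the model itself says nothing about CPU.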

Also, we have collected CPU profiles and traces of the vmstorage pod experiencing the spike, from both before and after the spike. We are still getting approval to share them here, though I know the CPU profiles and traces do not contain confidential information.

We run the cluster version v1.107.0 and deploy it on Kubernetes with the official Helm charts.

Any help on this would be fantastic,

Thanks


Labels: enhancement (New feature or request), question (The question issue)