Packet Loss and TCP Stream Timeout Issues in Kubeshark Worker #1686

Open
nitindhiman314e opened this issue Jan 14, 2025 · 2 comments
@nitindhiman314e

We are experiencing the following issues with our Kubeshark worker deployment:

  1. High Packet Loss:
  • Traffic monitoring shows significant packet loss, impacting data accuracy.
  2. TCP Stream Timeout Problems:
  • TCP streams are truncated even after increasing TCP_STREAM_CHANNEL_TIMEOUT_MS to 10000.
  • Limited clarity on the purpose of TCP_STREAM_CHANNEL_TIMEOUT_SHOW.
  3. Worker Restart:
  • Found failed daemon pod kubeshark/kubeshark-worker-daemon-set-5q89h on node, will try to kill it

Actions Taken

  1. Increased TCP_STREAM_CHANNEL_TIMEOUT_MS to 10 seconds.
  2. Enabled TCP_STREAM_CHANNEL_TIMEOUT_SHOW for insights (behavior unclear; both settings are sketched after this list).
  3. Optimized resource configurations for worker pods.
  4. Updated health probes for better container monitoring.
  5. Enabled "-enable-resource-guard".

I've also attached the kubeshark-worker-daemon YAML and a Received Packets vs. Dropped Packets graph for more insight. Any pointers, advice, or resources would be super helpful!
[screenshot: Received Packets vs. Dropped Packets graph]
kubeshark-daemon.yaml.zip

Thanks.

@alongir
Member

alongir commented Jan 14, 2025

Hi @nitindhiman314e
Usually you shouldn't see any packet loss as long as the Worker has sufficient resources. In that situation you can either increase the Worker's resources or use backend filters to reduce the amount of traffic that gets processed.
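
As a rough sketch of both options expressed as Helm values (key names such as tap.namespaces and tap.resources are assumptions about the chart layout and vary between Kubeshark versions, so please check the docs for your release rather than copying this verbatim):

# Rough sketch only; exact keys depend on the Kubeshark chart version.
tap:
  # Capture only the namespaces you actually need, so each Worker
  # has less traffic to process.
  namespaces:
    - my-app-namespace              # placeholder namespace
  # Or give the Worker more headroom.
  resources:
    sniffer:                        # assumed key for the worker/sniffer container
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        memory: 5Gi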

@nitindhiman314e
Author

nitindhiman314e commented Jan 15, 2025


Hi @alongir,

Thank you for your reply. I currently have 4 nodes, with a corresponding Kubeshark worker running on each. Two of the workers have been running for the last 35 hours without any dropped packets, while the other two are restarting intermittently and showing a high packet-loss count. Below are the resource settings we are using for the worker DaemonSet:

resources:
  limits:
    memory: 5Gi
  requests:
    cpu: '1'
    memory: 50Mi
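
One thing that stands out while posting this: the memory request (50Mi) is far below the limit (5Gi), and since Kubernetes schedules pods based on requests, a busy worker can land on a node that can't actually supply the memory it ends up using. A variant we could try, with placeholder values rather than anything tested:

resources:
  limits:
    memory: 5Gi
  requests:
    cpu: '1'
    memory: 2Gi                     # placeholder: raised from 50Mi toward expected usage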

I have also attached the current usage of the Kubeshark workers. Could you please suggest any tuning recommendations?
Regarding the worker restarts, we encountered the following error:

Back-off restarting failed container sniffer in pod kubeshark-worker-daemon-set-8tk5l_kubeshark (ee32d21b-85be-49e6-8120-7fab7e986ad2)

[screenshot: Kubeshark worker resource usage]

Thanks!
