Packet Loss and TCP Stream Timeout Issues in Kubeshark Worker #1686

Open
nitindhiman314e opened this issue Jan 14, 2025 · 2 comments
@nitindhiman314e

We are experiencing the following issues with our Kubeshark worker deployment:

  1. High Packet Loss:
  • Traffic monitoring shows significant packet loss, impacting data accuracy.
  2. TCP Stream Timeout Problems:
  • TCP streams are truncated even after increasing TCP_STREAM_CHANNEL_TIMEOUT_MS to 10000.
  • Limited clarity on the purpose of TCP_STREAM_CHANNEL_TIMEOUT_SHOW.
  3. Worker Restart:
  • Found failed daemon pod kubeshark/kubeshark-worker-daemon-set-5q89h on node, will try to kill it

Actions Taken

  1. Increased TCP_STREAM_CHANNEL_TIMEOUT_MS to 10 seconds.
  2. Enabled TCP_STREAM_CHANNEL_TIMEOUT_SHOW for insights (behavior unclear; both settings are sketched after this list).
  3. Optimized resource configurations for worker pods.
  4. Updated health probes for better container monitoring.
  5. Enabled "-enable-resource-guard".

I've also attached the kubeshark-worker-daemon YAML and a Received Packets vs. Dropped Packets graph for more insight. Any pointers, advice, or resources would be super helpful!
[screenshot: Received Packets vs. Dropped Packets graph]
kubeshark-daemon.yaml.zip

Thanks.

@alongir
Member

alongir commented Jan 14, 2025

Hi @nitindhiman314e
Usually you shouldn't see any packet loss as long as the Worker has sufficient resources. In that situation you can either increase the Worker's resources or use backend filters to reduce the amount of traffic that gets processed.
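
As a rough sketch of both options expressed as Helm values (key names such as tap.namespaces and tap.resources are assumptions about the chart layout and vary between Kubeshark versions, so please check the docs for your release rather than copying this verbatim):

# Rough sketch only; exact keys depend on the Kubeshark chart version.
tap:
  # Capture only the namespaces you actually need, so each Worker
  # has less traffic to process.
  namespaces:
    - my-app-namespace              # placeholder namespace
  # Or give the Worker more headroom.
  resources:
    sniffer:                        # assumed key for the worker/sniffer container
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        memory: 5Gi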

@nitindhiman314e
Author

nitindhiman314e commented Jan 15, 2025


Hi @alongir,

Thank you for your reply. I currently have 4 nodes, with a corresponding Kubeshark worker running on each. Two of the workers have been running for the last 35 hours without any dropped packets, while the other two are restarting intermittently and showing a high packet-loss count. Below are the resource settings we are using for the worker DaemonSet:

resources:
  limits:
    memory: 5Gi
  requests:
    cpu: '1'
    memory: 50Mi
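
One thing that stands out while posting this: the memory request (50Mi) is far below the limit (5Gi), and since Kubernetes schedules pods based on requests, a busy worker can land on a node that can't actually supply the memory it ends up using. A variant we could try, with placeholder values rather than anything tested:

resources:
  limits:
    memory: 5Gi
  requests:
    cpu: '1'
    memory: 2Gi                     # placeholder: raised from 50Mi toward expected usage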

I have also attached the current usage of the Kubeshark workers. Could you please suggest any tuning recommendations?
Regarding the worker restarts, we encountered the following error:

Back-off restarting failed container sniffer in pod kubeshark-worker-daemon-set-8tk5l_kubeshark (ee32d21b-85be-49e6-8120-7fab7e986ad2)

[screenshot: Kubeshark worker resource usage]

Thanks!
