Getting unusual timeouts with LoadBalancer on Kubernetes 1.3.3 running on GCE #29759
Another interesting finding is that if I create a cluster with a master and one minion (previously a 4-minion cluster), I do not get this issue.
I have found so far that setting ... Any thoughts here? Could this be something to do with the iptables configuration, or am I looking in the wrong spot?
So I have not found a solution to this, but I have proved that it has something to do with ...
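For anyone else digging into the iptables angle, the rules kube-proxy programs for a service can be inspected on a node with something like the following. This is just a sketch; the service name nginx-alpine2 is taken from the repro steps below, and the grep relies on kube-proxy's comment annotations.
# On a minion, dump the NAT rules kube-proxy programs for the service
sudo iptables-save -t nat | grep nginx-alpine2
# Check which proxy mode kube-proxy is running in (userspace vs iptables)
ps aux | grep [k]ube-proxy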
I don't think this is a kube-proxy issue if you say that it works fine within the cluster.
There are several interesting datapoints here.
IIUC, the GCE load-balancer configuration is untouched when the RC is scaled, so we can eliminate that. Endpoint addition/removal will trigger conntrack flushes, but the next SYN should still work; 120 seconds is way too long. We definitely need more debug data: the output of describe service for both services, all endpoints for the 2 services, and the output of iptables-save from all minions while the problem is 'live'.
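Something along these lines would gather that output (a sketch only; the service names nginx-alpine1 and nginx-alpine2 are assumed here, as only nginx-alpine2 is named in the repro steps):
# Describe both LoadBalancer services
kubectl describe service nginx-alpine1
kubectl describe service nginx-alpine2
# List the endpoints behind each service
kubectl get endpoints nginx-alpine1 -o wide
kubectl get endpoints nginx-alpine2 -o wide
# Run on each minion while the timeouts are happening
sudo iptables-save > iptables-$(hostname)-$(date +%s).txt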
Very sorry for the late reply. Here is the data you asked for.
Service Describes
Endpoints
Iptables Save BEFORE SCALE
Iptables Save AFTER SCALE
Hi,
I am getting unusual timeouts with the 'LoadBalancer' service type on Kubernetes 1.3.3 running on GCE, and I don't know where to start troubleshooting.
My environment:
Cluster created via the cluster/kube-up.sh script.
Here are some repro steps to illustrate what I'm seeing.
1. Create two simple nginx RCs and two LoadBalancer services.
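Something like the following would set this up (a sketch; the name nginx-alpine1 and the image are assumptions, only nginx-alpine2 appears below):
kubectl run nginx-alpine1 --image=nginx:alpine --replicas=1 --generator=run/v1
kubectl run nginx-alpine2 --image=nginx:alpine --replicas=1 --generator=run/v1
kubectl expose rc nginx-alpine1 --port=80 --type=LoadBalancer
kubectl expose rc nginx-alpine2 --port=80 --type=LoadBalancer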
2. Curl the first nginx's LoadBalancer IP 10 times/sec, with ts for timestamps.
while true; do /usr/bin/curl -k -I http://104.155.xxx.xxx | ts ; sleep 0.1; done
3. Scale replicas for the second nginx RC.
kubectl scale rc nginx-alpine2 --replicas 4
4. Watch the curl commands against the first nginx time out for 2 minutes.
During the timeout, netstat says:
tcp 0 1 10.10.130.104:54848 104.155.xxx.xxx:80 SYN_SENT 17235/curl
Interestingly, a GCE Kubernetes cluster running 1.2.5 or GKE running 1.3.3 does not exhibit this timeout.
Is this normal or am I doing something wrong?
I should also point out that communication with the service from inside the cluster via k8s DNS works perfectly.
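A quick way to reproduce that in-cluster check is something like the following (a sketch; the pod name placeholder must be filled in, and it assumes the nginx image ships wget):
# Exec into one of the nginx pods and hit the other service by its cluster DNS name
kubectl exec -it <nginx-alpine1-pod> -- wget -qO- http://nginx-alpine2.default.svc.cluster.local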