cluster up fails with "getsockopt: connection refused ()" #20617

agajdosi · 2018-08-13T12:40:10Z

I am facing a problem with oc cluster up when I use it in nested virtualization environments, for example a RHEL7 VM in which I run a CentOS VM on which I deploy the cluster. Deployment sometimes goes well, but in about 90% of cases it fails with getsockopt: connection refused (). It is also reproducible with v3.9.0, although there the error looks a little different.

Version

v3.11.0
v3.10.0
v3.9.0

Steps To Reproduce

oc cluster up --public-hostname 192.168.42.18 --routing-suffix 192.168.42.18.nip.io --base-dir /var/lib/minishift/base

Current Result

v3.10.0:

-- Starting OpenShift cluster ..................................................................................Error during 'cluster up' execution: Error starting the cluster. ssh command error:
command : /var/lib/minishift/bin/oc cluster up --public-hostname 192.168.42.18 --routing-suffix 192.168.42.18.nip.io --base-dir /var/lib/minishift/base
err     : exit status 1
output  : Getting a Docker client ...
Checking if image openshift/origin-control-plane:v3.10 is available ...
Pulling image openshift/origin-control-plane:v3.10
Image pull complete
Pulling image openshift/origin-cli:v3.10
Pulled 1/4 layers, 32% complete
Pulled 2/4 layers, 51% complete
Pulled 3/4 layers, 94% complete
Pulled 4/4 layers, 100% complete
Extracting
Image pull complete
Pulling image openshift/origin-node:v3.10
Pulled 5/6 layers, 85% complete
Pulled 6/6 layers, 100% complete
Extracting
Image pull complete
Checking type of volume mount ...
Determining server IP ...
Using public hostname IP 192.168.42.18 as the host IP
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
Checking if OpenShift client is configured properly ...
Checking if image openshift/origin-control-plane:v3.10 is available ...
Starting OpenShift using openshift/origin-control-plane:v3.10 ...
I0809 07:34:08.277022    2024 config.go:42] Running "create-master-config"
I0809 07:34:34.990393    2024 config.go:46] Running "create-node-config"
I0809 07:34:38.233104    2024 flags.go:30] Running "create-kubelet-flags"
I0809 07:34:40.487997    2024 run_kubelet.go:48] Running "start-kubelet"
I0809 07:34:41.401100    2024 run_self_hosted.go:172] Waiting for the kube-apiserver to be ready ...
E0809 07:39:42.208988    2024 run_self_hosted.go:542] API server error: Get https://192.168.42.18:8443/healthz?timeout=32s: dial tcp 192.168.42.18:8443: getsockopt: connection refused ()
Error: timed out waiting for the condition

v3.9:

[hudson@agajdosi-test1 ~]$ minishift start
-- Starting profile 'minishift'
[...]
   Version: v3.9.0
-- Pulling the Openshift Container Image ........................................ OK
-- Copying oc binary from the OpenShift container image to VM ... OK
-- Starting OpenShift cluster ...........................Error during 'cluster up' execution: Error starting the cluster. ssh command error:
command : /var/lib/minishift/bin/oc cluster up --use-existing-config --host-config-dir /var/lib/minishift/openshift.local.config --host-data-dir /var/lib/minishift/hostdata --host-volumes-dir /var/lib/minishift/openshift.local.volumes --host-pv-dir /var/lib/minishift/openshift.local.pv --public-hostname 192.168.42.206 --routing-suffix 192.168.42.206.nip.io
err     : exit status 1
output  : Using nsenter mounter for OpenShift volumes
Using public hostname IP 192.168.42.206 as the host IP
Using 192.168.42.206 as the server IP
Starting OpenShift using openshift/origin:v3.9.0 ...
-- Starting OpenShift container ... 
   Creating initial OpenShift configuration
   Starting OpenShift using container 'origin'
   Waiting for API server to start listening
FAIL
   Error: timed out waiting for OpenShift container "origin" 
   WARNING: 192.168.42.206:8443 may be blocked by firewall rules
   Details:
     Last 10 lines of "origin" container log:
     E0807 13:04:13.932270    2468 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://127.0.0.1:8443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager: net/http: TLS handshake timeout
     E0807 13:04:15.511476    2468 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/cmd/kube-scheduler/app/server.go:594: Failed to list *v1.Pod: Get https://127.0.0.1:8443/api/v1/pods?fieldSelector=spec.schedulerName%3Ddefault-scheduler%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&limit=500&resourceVersion=0: net/http: TLS handshake timeout
     E0807 13:04:15.713451    2468 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1beta1.ReplicaSet: Get https://127.0.0.1:8443/apis/extensions/v1beta1/replicasets?limit=500&resourceVersion=0: net/http: TLS handshake timeout
     E0807 13:04:15.784421    2468 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Node: Get https://127.0.0.1:8443/api/v1/nodes?limit=500&resourceVersion=0: net/http: TLS handshake timeout
     E0807 13:04:15.787247    2468 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1beta1.StatefulSet: Get https://127.0.0.1:8443/apis/apps/v1beta1/statefulsets?limit=500&resourceVersion=0: net/http: TLS handshake timeout
     E0807 13:04:15.793474    2468 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1beta1.PodDisruptionBudget: Get https://127.0.0.1:8443/apis/policy/v1beta1/poddisruptionbudgets?limit=500&resourceVersion=0: net/http: TLS handshake timeout
     E0807 13:04:15.795902    2468 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.PersistentVolume: Get https://127.0.0.1:8443/api/v1/persistentvolumes?limit=500&resourceVersion=0: net/http: TLS handshake timeout
     E0807 13:04:15.798232    2468 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Service: Get https://127.0.0.1:8443/api/v1/services?limit=500&resourceVersion=0: net/http: TLS handshake timeout
     E0807 13:04:15.802930    2468 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.PersistentVolumeClaim: Get https://127.0.0.1:8443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0: net/http: TLS handshake timeout
     E0807 13:04:15.805170    2468 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.ReplicationController: Get https://127.0.0.1:8443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: net/http: TLS handshake timeout


   Solution:
     Ensure that you can access 192.168.42.206:8443 from your machine

Expected Result

Cluster should be up and running.

Additional Information

Minishift issue: minishift/minishift#2675


AIKiller · 2018-08-16T01:12:28Z

I also faced the same problem. Have you solved it? Thanks.

agajdosi · 2018-08-16T09:02:14Z

@AIKiller unfortunately I didn't 😢. The only lead I have is that this issue might be connected to the fact that the machine sits behind a corporate proxy. However, I cannot connect the affected machines outside of the current network, so I can't verify it. It sounds crazy, but that is the only attribute that all the affected machines share. Some more info at: minishift/minishift#2675.

If your machine is behind a proxy and you can connect it directly and verify, that would help.

jwforres · 2018-08-21T20:48:52Z

Alerting the team that owns oc cluster up, but nested virtualization and corporate proxies just sound like a recipe for problems.

@openshift/sig-master

jdbarfield · 2018-09-10T11:24:54Z

We see the same problem periodically with cluster up on a CentOS VM, although it succeeds 70% of the time or more.

We are running this in a virtualized lab environment using Ravello, so each instance is identical. Also, no proxies or firewalls. I had attributed it to our lab environment sometimes running slow, but I have no evidence of that.

If I can help with log files or anything else, let me know what you need.

Thanks!

pgfaller · 2018-09-12T06:29:46Z

After adding RAM and CPU to a CentOS 7 VM (now 5 GB RAM, 4 CPUs, running in VirtualBox on Ubuntu 18.04) that I am trying to get OpenShift working on, I get past the original issue (getsockopt: connection refused), but now I get:

...
I0912 08:16:29.125851 10203 apply_list.go:68] Installing "sample-templates/mongodb"
I0912 08:16:29.128923 10203 apply_list.go:68] Installing "centos-imagestreams"
I0912 08:16:29.143108 10203 apply_template.go:83] Installing "openshift-web-console-operator"
I0912 08:16:36.170868 10203 interface.go:41] Finished installing "sample-templates/django quickstart" "sample-templates/rails quickstart" "sample-templates/sample pipeline" "sample-templates/mongodb" "sample-templates/mysql" "sample-templates/postgresql" "sample-templates/nodejs quickstart" "sample-templates/jenkins pipeline ephemeral" "sample-templates/mariadb" "sample-templates/cakephp quickstart" "sample-templates/dancer quickstart"
E0912 08:21:36.320825 10203 interface.go:34] Failed to install "openshift-web-console-operator": timed out waiting for the condition
I0912 08:21:36.320885 10203 interface.go:41] Finished installing "openshift-router" "persistent-volumes" "openshift-web-console-operator" "centos-imagestreams" "openshift-image-registry" "sample-templates"
Error: timed out waiting for the condition

Watching with 'top', there is a 'hyperkube' process that gets very busy, but not for long periods. Is this maybe performance related?
I also noticed that once the 'getsockopt: connection refused' happens, I had to do a 'rm -rf' on the openshift server directory, and start from fresh.

moodboom · 2018-09-18T12:38:30Z

I'm seeing this as well on a fresh CentOS install after an initial successful install, when the next day I found the server down and could not restart it.

Starting OpenShift using openshift/origin-control-plane:v3.10 ...
I0918 08:16:54.363135   88883 flags.go:30] Running "create-kubelet-flags"
I0918 08:16:55.258282   88883 run_kubelet.go:48] Running "start-kubelet"
I0918 08:16:55.518150   88883 run_self_hosted.go:172] Waiting for the kube-apiserver to be ready ...
E0918 08:21:55.540121   88883 run_self_hosted.go:542] API server error: Get https://192.168.240.95:8443/healthz?timeout=32s: dial tcp 192.168.240.95:8443: getsockopt: connection refused ()
Error: timed out waiting for the condition

Are we not supposed to be running okd on a VM? I was hoping to use one well-provisioned corporate VM for all my containers.

bill0425 · 2018-09-20T15:37:39Z

I'm seeing this problem when I use Minishift. My system is a standalone CentOS box on my home network. The version of Minishift is 1.24.0, which I pulled down 2 days ago. It appears to be running OpenShift 3.10.0. Is there a workaround for this issue?

engkhun · 2018-09-29T15:26:11Z

Any solution for this issue? I'm getting the same error.

agajdosi · 2018-10-03T13:14:02Z

@khun83 Unfortunately I do not know of any solution yet. It might be caused by a slow network or computer, which could lead cluster up to give up after a while and throw a timeout error.

One thing that could help would be to have all the images loaded in a caching proxy, so that time on pulls is saved. Another option would be to go into the cluster up codebase, increase the timeouts, build oc, try it, and verify whether the slowness theory is right. We could then ask for a --timeout flag to be added to oc cluster up, so that anybody who hits the timeout problem could increase it.

I will try the steps above this or next week, but if you have more time than I do, you can try them on your setup and let us know. Ping @bill0425, as you might be interested in the above.

juanvallejo · 2018-10-03T14:15:17Z

cc @deads2k

agajdosi · 2018-10-15T09:56:16Z

This issue is also reproducible with OKD v3.11.0. It affects Minishift users and also any QE efforts that depend on cluster up (for example, the Minishift QE team and the DevStudio QE team) by making the tests quite unstable. Ping @deads2k

nstielau · 2018-10-15T17:22:29Z

Looking at the thread here, it is hard to tell whether a) some nodes are just slow to start and the 5 minute timeout is too short (i.e. 6 minutes would do it), or b) there is some race condition that actually prevents the cluster from loading (i.e. even a 30 minute timeout would not do it).

Timeouts are tricky to get right for all scenarios. @agajdosi I like the configurable timeout. It might be easy to set via ENV to check, rather than plumb through via oc, but maybe there isn't a good pattern for that. Perhaps even just bumping to 10 minutes hardcoded.

Looks like running with verbose logs would give a little more info as well, although if the last error after 5 minutes is still 'connection refused' that wouldn't add more info.

https://github.com/openshift/origin/blob/master/pkg/oc/clusterup/run_self_hosted.go#L231
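One way to tell case a) from case b) without rebuilding oc is to keep polling the health endpoint from outside with a much longer deadline and see whether it ever comes up. The following is a minimal sketch of that idea, not something oc cluster up provides: the function name `wait_until` and the environment variables `WAIT_TIMEOUT` and `WAIT_INTERVAL` are made up for illustration.

```shell
# Hypothetical helper: retry a command until it succeeds or a deadline passes.
# WAIT_TIMEOUT and WAIT_INTERVAL are illustrative names only; they are not
# options of oc cluster up.
wait_until() {
    timeout="${WAIT_TIMEOUT:-300}"   # seconds; 300 matches the 5 minute timeout discussed above
    interval="${WAIT_INTERVAL:-2}"   # seconds to sleep between attempts
    elapsed=0                        # counts sleep time only, so it is approximate
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${elapsed}s waiting for: $*" >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "ready after ${elapsed}s"
}
```

For example, `WAIT_TIMEOUT=1800 wait_until curl -ksf https://192.168.42.18:8443/healthz` would keep probing the address from the original report for 30 minutes. If the endpoint answers after, say, 7 minutes, that points to slowness; if it never answers, a longer timeout in oc would not help either.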

wopalka · 2018-10-15T18:22:45Z

What would help you folks diagnose this problem? If you let people know what you need and any changes that need to occur, I'm sure someone on the thread would be willing to help. Just give folks directions so we can help you.

/Bill

arnaud-deprez · 2018-10-20T21:31:41Z

Hi,

For me, it seems that using minishift config set image-caching true solves this issue.

Edit:
Well it solves it partially. It seems to work better but sometimes it is still failing.

nstielau · 2018-10-22T22:18:33Z

@arnaud-deprez thanks. Should we look into setting that value to true by default? (I don't know the implications)

moanrose · 2018-10-25T09:06:09Z

I see this problem too when running minishift.

The only solution so far is to run

minishift stop
minishift delete -f

And delete the folders ~/.kube and ~/.minishift

But it is rather time-consuming.

I tried to enable image caching, but without luck. I'm using Hyper-V on Windows 10.

lovoni · 2018-10-26T01:10:34Z

In fact, it is the livenessProbe of the apiserver pod that is failing (it times out after 32 seconds, as shown by the message: Get https://192.168.42.18:8443/healthz?timeout=32s).
As a dirty workaround, I did the following:

minishift ssh
sudo vi /var/lib/minishift/base/static-pod-manifests/apiserver.yaml
Updated the liveness probe as follows (while minishift is still starting):

      livenessProbe:
        initialDelaySeconds: 90
        httpGet:
          scheme: HTTPS
          port: 8443
          path: healthz

Note that the update may get overridden the next time minishift starts. Still, the workaround keeps you from getting stuck.

nstielau · 2018-11-01T21:15:05Z

@lovoni Good find. Do you know where those templates live in the code?

austincunningham · 2018-11-02T12:42:47Z

Had this issue. It only occurred when I attempted to upgrade the version of OpenShift on an existing profile, e.g. I had a profile with 3.10 and attempted to start it with 3.11:

minishift start --openshift-version v3.11.0

After that the profile was unusable.

amitkrout · 2018-11-13T09:47:18Z

@lovoni Thanks for the workaround. I kept an eye on /var/lib/minishift/base/static-pod-manifests/ and updated apiserver.yaml as soon as it became available during minishift start, but it did not work for me.

minishift version - v1.26.1+Win10+VirtualBox
Number of attempts - 4

agajdosi · 2018-11-14T09:26:49Z

@openshift/sig-master @mfojtik This issue started to affect more machines when we started to use OKD 3.11.0. As there has been no progress on this issue since August, the only answer for all the users of Minishift or CDK who face it is none other than "yeah, throw that laptop away and try another one", which is terrible.

It would be really great if you could find somebody who could take a look at this, as it is becoming a really painful issue for us.

openshift-ci-robot · 2018-11-14T09:26:56Z

@agajdosi: Reiterating the mentions to trigger a notification:
@openshift/sig-master


odockal · 2018-11-14T09:51:06Z

I would like to confirm the same issue as described above.

Error: timed out waiting for the condition

My case: VM machines: windows7/10 + rhel7 - 8cpu 16 GB ram, CDK 3.7.0-alpha-1.1 (oc v3.11.16).

Please, take a look at this issue, thanks.

stianst · 2018-11-15T08:47:17Z

Was facing this issue with oc cluster up. What I did to resolve it was:

Download latest oc client including accompanying kubectl
rm -rf ~/.kube ~/.minishift

After that oc cluster up worked fine. It seems there are cases when oc cluster up (I've observed this with Minishift as well) does not start properly when you have run a different version in the past.
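The reset steps mentioned in this thread (minishift stop, minishift delete -f, then removing ~/.kube and ~/.minishift) can be collected into one function. This is only a sketch under assumptions: the name `reset_cluster_state` and its optional home-directory argument are hypothetical, and the minishift commands are skipped when the binary is not on the PATH.

```shell
# Hypothetical helper: wipe local cluster state so the next start is clean.
# WARNING: removing .kube also deletes kubeconfig entries for any other
# clusters, so back it up first if you need them.
reset_cluster_state() {
    home_dir="${1:-$HOME}"           # directory that holds .kube and .minishift
    if command -v minishift >/dev/null 2>&1; then
        minishift stop               # may fail if no VM is running; carry on anyway
        minishift delete -f          # drop the VM without asking for confirmation
    fi
    rm -rf "$home_dir/.kube" "$home_dir/.minishift"
}
```

Calling `reset_cluster_state` with no argument operates on the current user's home directory. As noted elsewhere in the thread, it is not certain that this reset is what fixes the problem, so treat it as a way to rule out stale state rather than a cure.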

nstielau · 2018-11-15T22:11:20Z

@odockal Can you verify you have the latest version of oc?

jdbarfield · 2018-11-15T23:02:11Z

I don't think this is necessarily related to the version. I led a lab of over 100 people all starting up an oc cluster around the same time using exactly the same version, and fewer than 10% had this issue.

None of us has ever been able to duplicate this consistently, so it is very difficult to say whether or not one solution or another fixed the problem. One thing that always fixed the problem was time. Downloading the latest oc client might have worked because it added time between attempts.

menza · 2018-11-16T11:27:17Z

I have that problem (win 10, virtualbox 5.2.20, minishift 1.27) as well - my problem is:
I1116 05:08:35.196919 2512 run_kubelet.go:49] Running "start-kubelet"
I1116 05:08:35.716885 2512 run_self_hosted.go:181] Waiting for the kube-apiserver to be ready ...
E1116 05:13:51.731403 2512 run_self_hosted.go:571] API server error: Get https://192.168.99.102:8443/healthz?timeout=32s: net/http: TLS handshake timeout ()

I also had that problem with 1.26. The only solution so far was to go back to 1.23 with OpenShift version 3.9.0.
It would be nice if this could be fixed.

odockal · 2018-11-16T11:42:59Z

@nstielau I can tell what I am using:

$ ./oc version
oc v3.11.16
kubernetes v1.11.0+d4cacc0

How can I find the most recent version? Build from source?

menza12 · 2018-11-16T14:57:31Z

I spent some time debugging. It seems the root problem is around here:

[+]etcd ok\n[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok\n[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/ca-registration ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[-]poststarthook/authorization.openshift.io-bootstrapclusterroles failed: reason withheld
[+]poststarthook/authorization.openshift.io-ensureopenshift-infra ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok\n[+]poststarthook/openshift.io-AdmissionInit ok
[+]poststarthook/openshift.io-StartInformers ok
[+]poststarthook/oauth.openshift.io-StartOAuthClientsBootstrapping ok
healthz check failed") has prevented the request from succeeding

can you please help?
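In a long healthz listing like the one above, the single failing hook is easy to miss among the passing ones. As a small sketch, the filter below prints only the "[-]" entries from text read on standard input and treats the literal "\n" sequences seen in the paste as line breaks. The function name `failed_healthz_checks` is hypothetical.

```shell
# Hypothetical helper: print only the failing checks ("[-]" entries) from
# healthz output read on stdin. Literal "\n" sequences are split into
# separate entries before filtering.
failed_healthz_checks() {
    awk '{
        n = split($0, parts, /\\n/)
        for (i = 1; i <= n; i++)
            if (parts[i] ~ /^\[-\]/) print parts[i]
    }'
}
```

Fed with the output above, it would print only the poststarthook/authorization.openshift.io-bootstrapclusterroles line. It can also be used at the end of a pipe, for example after a curl call against the healthz URL of the affected host.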

anjannath · 2018-11-19T10:17:23Z

I am also facing the same issue with OKD 3.11.
One thing I noticed was that docker ps shows only the following containers running:

CREATED             STATUS              PORTS               NAMES
d8ed8a902910        docker.io/openshift/origin-hyperkube@sha256:83b6930bc60db72fe822ded1cf188f54928a6777de2ec0896e8425fae077d958   "hyperkube kube-co..."   28 minutes ago      Up 28 minutes                           k8s_controllers_kube-controller-manager-localhost_kube-system_dfcadfa6552711112062fbf1121a691c_2
2e9ddcc980a3        docker.io/openshift/origin-hyperkube@sha256:83b6930bc60db72fe822ded1cf188f54928a6777de2ec0896e8425fae077d958   "hyperkube kube-sc..."   28 minutes ago      Up 28 minutes                           k8s_scheduler_kube-scheduler-localhost_kube-system_f903f642800a02b87385310221ffe91f_2
2b6af3768927        openshift/origin-pod:v3.11.0                                                                                   "/usr/bin/pod"           28 minutes ago      Up 28 minutes                           k8s_POD_kube-controller-manager-localhost_kube-system_dfcadfa6552711112062fbf1121a691c_2
682f0bcb533a        openshift/origin-pod:v3.11.0                                                                                   "/usr/bin/pod"           28 minutes ago      Up 28 minutes                           k8s_POD_kube-scheduler-localhost_kube-system_f903f642800a02b87385310221ffe91f_2
5e896ddf6c80        openshift/origin-pod:v3.11.0                                                                                   "/usr/bin/pod"           28 minutes ago      Up 28 minutes                           k8s_POD_master-api-localhost_kube-system_29e68324ed097a2c36aa5709e9b67154_2
842c95111ab0        openshift/origin-pod:v3.11.0                                                                                   "/usr/bin/pod"           28 minutes ago      Up 28 minutes                           k8s_POD_master-etcd-localhost_kube-system_34b17db69b2b3877c9904b5340f1ae71_0
6f0725a02a9a        openshift/origin-node:v3.11.0                                                                                  "hyperkube kubelet..."   28 minutes ago      Up 28 minutes                           origin

The kube-apiserver container does not even start, and the base_dir/kube-apiserver/master-config.yaml file was also empty.

stuffandting · 2018-12-17T11:59:21Z

I recently came across this issue, having just started to use Minishift. Until a more stable fix is implemented upstream, I thought I'd leave the workaround I'm using in case it helps anyone in the meantime.

Once the Minishift VM is available (after "Starting Minishift VM ...." completes) but before "Starting OpenShift cluster ...", execute the following one-liner:

minishift ssh -- "F=/var/lib/minishift/base/static-pod-manifests/apiserver.yaml ; if [ -f $F ]; then rm $F ; fi ; while [ ! -f $F ]; do sleep 2 ; done ; sleep 2 ; cat $F | awk '{print}/livenessProbe:/{print \"      initialDelaySeconds: 900\"}' > /tmp/config.tmp ; mv /tmp/config.tmp $F ; cat $F"

This removes apiserver.yaml if it already exists, waits for it to be recreated, then adds the initialDelaySeconds configuration so the timeout issue isn't hit.

I'm using this on Windows 7/VirtualBox, but there is no reason it shouldn't work on any affected platform.
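One caveat with the one-liner: when it is launched from a shell that expands variables inside double quotes (bash, for instance), `$F` is substituted locally before the command ever reaches the VM. A more readable sketch of the patching step, meant to be run inside the VM after `minishift ssh`, is shown below. The function name `add_initial_delay` is hypothetical, and the check for an existing setting only looks at the line directly after `livenessProbe:`, which is an assumption about the manifest layout.

```shell
# Hypothetical helper: add "initialDelaySeconds: N" right after each
# "livenessProbe:" line of a manifest, unless the next line already sets it.
add_initial_delay() {
    manifest="$1"
    delay="${2:-90}"                 # seconds; 90 and 900 are the values tried in this thread
    tmp="$(mktemp)" || return 1
    if awk -v delay="$delay" '
        # The previous line opened a probe and this line is not the setting yet.
        pending && $0 !~ /initialDelaySeconds:/ {
            printf "%s  initialDelaySeconds: %s\n", indent, delay
        }
        { pending = 0; print }
        /livenessProbe:/ {
            match($0, /^ */)                 # measure leading spaces
            indent = substr($0, 1, RLENGTH)  # reuse them for the new key
            pending = 1
        }
    ' "$manifest" > "$tmp"; then
        cat "$tmp" > "$manifest"     # overwrite in place, keeping owner and mode
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

Inside the VM the manifest is likely to need root to modify, so the function would have to run in a root shell. Running it twice leaves the file unchanged the second time, and, as with the original workaround, the edit may be overwritten the next time the manifest is regenerated.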

imcsk8 · 2019-01-09T03:23:30Z

This problem manifests a little differently in 3.11: even with a 15 minute timeout (which, BTW, is hardcoded to 5 minutes) it still fails.

https://github.com/imcsk8/origin/blob/a871de40a85f04cba9e5cf4cd1ff7781db4cce04/pkg/oc/clusteradd/componentinstall/readiness_apigroup.go#L20-L22

I0108 19:42:05.619997   26010 readiness_apigroup.go:45] waiting for readiness: v1.user.openshift.io v1beta1.APIServiceCondition{Type:"Available", Status:"False", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63682581813, loc:(*time.Location)(0x49213c0)}}, Reason:"MissingEndpoints", Message:"endpoints for service/api in \"openshift-apiserver\" have no addresses"}
I0108 19:42:05.620038   26010 readiness_apigroup.go:54] waiting for readiness: []string{"v1.apps.openshift.io", "v1.authorization.openshift.io", "v1.build.openshift.io", "v1.image.openshift.io", "v1.network.openshift.io", "v1.oauth.openshift.io", "v1.project.openshift.io", "v1.quota.openshift.io", "v1.route.openshift.io", "v1.security.openshift.io", "v1.template.openshift.io", "v1.user.openshift.io"}
Error: timed out waiting for the condition

journalctl logs show that it appears to be an access problem to the API server:

ene 08 20:09:00 cloud dockerd-current[29032]: E0109 03:09:00.753723       1 reflector.go:136] github.com/openshift/origin/pkg/quota/generated/informers/internalversion/factory.go:101: Failed to list *quota.ClusterResourceQuota: the server is currently unable to handle the request (get clusterresourcequotas.quota.openshift.io)
ene 08 20:09:00 cloud dockerd-current[29032]: E0109 03:09:00.768997       1 reflector.go:136] github.com/openshift/origin/pkg/security/generated/informers/internalversion/factory.go:101: Failed to list *security.SecurityContextConstraints: the server is currently unable to handle the request (get securitycontextconstraints.security.openshift.io)
ene 08 20:09:00 cloud dockerd-current[29032]: E0109 03:09:00.770277       1 reflector.go:136] github.com/openshift/client-go/oauth/informers/externalversions/factory.go:101: Failed to list *v1.OAuthClient: the server is currently unable to handle the request (get oauthclients.oauth.openshift.io)
ene 08 20:09:00 cloud dockerd-current[29032]: E0109 03:09:00.808739       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.Service: Get https://172.30.0.1/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: no route to host

@knobunc could you give me a hand diagnosing this problem?

imcsk8 · 2019-01-10T07:27:45Z

After some testing I found that iptables rules can interfere with the oc cluster up execution, so I created a little script [1] that outlines part of the recommended best practices in the manual [2].

[1] https://github.com/imcsk8/origin-tools/blob/master/run-oc-cluster-up.sh
[2] https://docs.okd.io/latest/getting_started/administrators.html

agajdosi · 2019-01-14T09:31:11Z

@imcsk8 Thank you for investigating it. Unfortunately I use cluster up via Minishift, and the issue sometimes happens and sometimes does not, even though the OS image on which it starts is the same every time. So I am not sure whether the problem really lies in iptables.

agajdosi · 2019-01-14T09:41:03Z

iptables rules on Minishift/CDK images:

[docker@minishift ~]$ sudo iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-NODEPORT-NON-LOCAL  all  --  anywhere             anywhere             /* Ensure that non-local NodePort traffic can flow */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain FORWARD (policy DROP)
target     prot opt source               destination         
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
DOCKER-ISOLATION  all  --  anywhere             anywhere            
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere            

Chain KUBE-EXTERNAL-SERVICES (1 references)
target     prot opt source               destination         

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x1/0x1

Chain KUBE-NODEPORT-NON-LOCAL (1 references)
target     prot opt source               destination         

Chain KUBE-SERVICES (1 references)
target     prot opt source               destination

agajdosi · 2019-02-12T13:46:35Z

Just to mention: this issue is now a blocker for CDK 3.8.0 on Windows 10 (https://issues.jboss.org/browse/CDK-389). The suggested fix through iptables does not work.

co-de · 2019-05-02T15:03:48Z

I had done everything according to the described procedures, including setting up the firewall zone as described here: https://github.com/openshift/origin/blob/release-3.11/docs/cluster_up_down.md. I was still getting this error while trying to run "oc cluster up" on a CentOS7 VM on macOS:

API server error: Get https://XXX.XXX.XXX.XXX:8443/healthz?timeout=32s: dial tcp XXX.XXX.XXX.XXX:8443: getsockopt: connection refused ()
Error: timed out waiting for the condition

The solution for me was to allocate more RAM and CPU to the guest OS.

MaheshZ · 2019-08-29T09:00:39Z

I found that this was caused by having no connectivity to the internet, although I do not know why that would make it fail, or give such an error while failing. It probably tries to pull something from Docker Hub and fails.

Asgoret · 2019-12-06T08:02:48Z

Same issue in a full OKD installation, 3.11.156-1. In my case, the etcd instances couldn't connect to each other due to connection refused, but after I ran the same playbook several times all was good.

jwforres assigned mfojtik Aug 21, 2018

openshift-ci-robot added the sig/master label Aug 21, 2018

This was referenced Sep 5, 2018

Cluster up fails with getsockopt: connection refused () minishift/minishift#2675

Closed

"getsockopt: connection refused" error on mac minishift/minishift#1746

Closed

agajdosi mentioned this issue Oct 4, 2018

Minishift startup always times out on Windows 10/VirtualBox 5.2.18 minishift/minishift#2852

Closed

mfojtik added priority/P3 component/cluster-up labels Nov 19, 2018

amitkrout mentioned this issue Dec 24, 2018

oc cluster up: Error: timed out waiting for the condition #21420

Closed

imcsk8 self-assigned this Jan 8, 2019

imcsk8 closed this as completed Jan 10, 2019

amitkrout mentioned this issue Jan 30, 2019

Move e2e tests to ci.centos.org redhat-developer/odo#952

Closed

agajdosi mentioned this issue Feb 28, 2019

Cluster-up fails due to failed openshift3/ose-hypershift container #22194

Closed
