segfault in controllermanager #63172
not sure which SIG this should be labeled with. Seems like a bug for sig-azure and whoever owns the piece that's hitting the nil pointer/panic |
@JackQuincy what's the k8s version? |
@andyzhangx 1.9.1 |
@JackQuincy - I suspect that the error is here: https://github.com/kubernetes/kubernetes/blob/v1.9.1/pkg/cloudprovider/providers/azure/azure_loadbalancer.go#L326. This can be one of two things.
Can you dump the model for this internal LB here? Also -- did this error eventually recover? (it could be a race condition between setting the IP and getting the IP) |
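To make the suspected failure mode concrete, here is a minimal, hypothetical Go sketch of the kind of unguarded dereference that would panic at azure_loadbalancer.go:326. The types and names below are simplified stand-ins invented for illustration, not the provider's actual code or the vendored Azure SDK types.

package main

import "fmt"

// Hypothetical, simplified stand-ins for the SDK types the cloud provider
// works with; field names here are illustrative only.
type frontendIPProperties struct {
	// PrivateIPAddress can be nil until Azure has actually allocated an IP
	// for the internal load balancer's frontend.
	PrivateIPAddress *string
}

type frontendIPConfiguration struct {
	Properties *frontendIPProperties
}

// readPrivateIP mirrors the failure mode: it dereferences PrivateIPAddress
// without a nil check, which panics when the LB model has no IP yet (for
// example, when the PUT that should have allocated it failed with a 400).
func readPrivateIP(fe frontendIPConfiguration) string {
	return *fe.Properties.PrivateIPAddress // nil dereference -> SIGSEGV
}

func main() {
	fe := frontendIPConfiguration{Properties: &frontendIPProperties{}}
	// This line panics with "invalid memory address or nil pointer dereference".
	fmt.Println(readPrivateIP(fe))
}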
@khenidak this was actually a customer cluster and they deleted it before I got a chance to capture things like the model or the details of the 400 error. And it never recovered. |
@khenidak according to the events table it was an internal load balancer |
I was able to get more details from the logs. |
Thanks @JackQuincy - So there are two errors here. One, the user cannot use more than 10 IPs. Two, the fact that we panic on this. I believe the fix is to log and fail the operation. /assign @feiskyer |
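A minimal sketch of the "log and fail" guard described above, again using hypothetical stand-in types and names rather than the actual patch that later landed in #59083: returning an error lets the service controller record the failure and retry the sync instead of crash-looping the whole controller manager.

package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins again; not the real change from #59083, just a
// sketch of "check, fail the operation, and let the caller retry".
type frontendIPProperties struct{ PrivateIPAddress *string }
type frontendIPConfiguration struct{ Properties *frontendIPProperties }

// privateIPOrError guards every pointer before dereferencing and returns an
// error instead of panicking when the frontend has no allocated IP yet.
func privateIPOrError(fe frontendIPConfiguration) (string, error) {
	if fe.Properties == nil || fe.Properties.PrivateIPAddress == nil {
		return "", errors.New("frontend IP configuration has no private IP address yet")
	}
	return *fe.Properties.PrivateIPAddress, nil
}

func main() {
	ip := "10.240.0.4"
	withIP := frontendIPConfiguration{Properties: &frontendIPProperties{PrivateIPAddress: &ip}}
	withoutIP := frontendIPConfiguration{Properties: &frontendIPProperties{}}

	fmt.Println(privateIPOrError(withIP))    // "10.240.0.4 <nil>"
	fmt.Println(privateIPOrError(withoutIP)) // an error instead of a SIGSEGV
}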
@JackQuincy @khenidak The issue has been fixed in #59083 and has been included in v1.9.2. Could you upgrade your cluster (e.g. to the latest v1.9.7) and check whether it fixes the issue? |
I'm closing this issue since it's been resolved. /close |
I am seeing this on 1.10.x and 1.11.x clusters: 1.10.6
1.11.2
|
@feiskyer: Reopening this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Added this to v1.12 since this is a critical urgent bug. |
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
The controller manager crash-looped with a segfault after failing to create a load balancer for the Kubernetes dashboard. The calls to create the load balancer were failing with a 400.
What you expected to happen:
The controller manager to provision the resources, or to fail without a segfault and with a clearer error message saying what the error creating the resource was.
How to reproduce it (as minimally and precisely as possible):
Just created a cluster with an add-on for the dashboard.
Anything else we need to know?:
logs from controller manager:
I0423 12:50:28.532604 18125 range_allocator.go:157] Starting range CIDR allocator
I0423 12:50:28.532635 18125 controller_utils.go:1019] Waiting for caches to sync for cidrallocator controller
I0423 12:50:28.532669 18125 taint_controller.go:181] Starting NoExecuteTaintManager
I0423 12:50:28.532732 18125 node_controller.go:611] Initializing eviction metric for zone: westeurope: :0
W0423 12:50:28.532832 18125 node_controller.go:964] Missing timestamp for Node aks-nodepool1-38057350-0. Assuming now as a timestamp.
I0423 12:50:28.532938 18125 node_controller.go:880] Controller detected that zone westeurope: :0 is now in state Normal.
I0423 12:50:28.533081 18125 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"aks-nodepool1-38057350-0", UID:"9fbde49a-4476-11e8-9b1c-0a58ac1f0e15", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node aks-nodepool1-38057350-0 event: Registered Node aks-nodepool1-38057350-0 in Controller
I0423 12:50:28.533753 18125 controller_utils.go:1026] Caches are synced for ClusterRoleAggregator controller
I0423 12:50:28.535278 18125 controller_utils.go:1026] Caches are synced for endpoint controller
I0423 12:50:28.536974 18125 controller_utils.go:1026] Caches are synced for attach detach controller
E0423 12:50:28.537377 18125 attach_detach_controller.go:345] Error creating spec for volume "mssqldb", pod "feature-test3"/"mssqlserver-6d45f8cf58-r5r7l": error processing PVC "feature-test3"/"mssql-data": PVC feature-test3/mssql-data has non-bound phase ("Pending") or empty pvc.Spec.VolumeName ("")
I0423 12:50:28.538422 18125 controller_utils.go:1026] Caches are synced for deployment controller
I0423 12:50:28.541231 18125 controller_utils.go:1026] Caches are synced for persistent volume controller
I0423 12:50:28.553734 18125 controller_utils.go:1026] Caches are synced for ReplicationController controller
I0423 12:50:28.557127 18125 controller_utils.go:1026] Caches are synced for TTL controller
I0423 12:50:28.632899 18125 controller_utils.go:1026] Caches are synced for cidrallocator controller
E0423 12:50:28.765043 18125 service_controller.go:776] Failed to process service kube-system/kubernetes-dashboard. Retrying in 5s: error getting LB for service kube-system/kubernetes-dashboard: Service(kube-system/kubernetes-dashboard) - Loadbalancer not found
I0423 12:50:28.765606 18125 event.go:218] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"kubernetes-dashboard", UID:"bf653ab4-4476-11e8-9b1c-0a58ac1f0e15", APIVersion:"v1", ResourceVersion:"193", FieldPath:""}): type: 'Warning' reason: 'CreatingLoadBalancerFailed' Error creating load balancer (will retry): error getting LB for service kube-system/kubernetes-dashboard: Service(kube-system/kubernetes-dashboard) - Loadbalancer not found
I0423 12:50:28.765638 18125 event.go:218] Event(v1.ObjectReference{Kind:"Service", Namespace:"feature-initialproject", Name:"identityserver-service", UID:"2ec2df0f-4485-11e8-9b1c-0a58ac1f0e15", APIVersion:"v1", ResourceVersion:"7879", FieldPath:""}): type: 'Normal' reason: 'EnsuringLoadBalancer' Ensuring load balancer
I0423 12:50:28.806251 18125 controller_utils.go:1026] Caches are synced for namespace controller
I0423 12:50:28.809049 18125 controller_utils.go:1026] Caches are synced for service account controller
I0423 12:50:28.844247 18125 controller_utils.go:1026] Caches are synced for resource quota controller
I0423 12:50:28.852639 18125 controller_utils.go:1026] Caches are synced for resource quota controller
I0423 12:50:28.859159 18125 controller_utils.go:1026] Caches are synced for garbage collector controller
I0423 12:50:28.859180 18125 garbagecollector.go:144] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I0423 12:50:28.859261 18125 controller_utils.go:1026] Caches are synced for garbage collector controller
I0423 12:50:28.908094 18125 controller_utils.go:1026] Caches are synced for stateful set controller
I0423 12:50:28.919567 18125 controller_utils.go:1026] Caches are synced for disruption controller
I0423 12:50:28.919591 18125 disruption.go:296] Sending events to api server.
I0423 12:50:29.921277 18125 azure_loadbalancer.go:242] selectLoadBalancer: cluster(kubernetes) service(feature-initialproject/identityserver-service) isInternal(true) - availabilitysetsnames [nodepool1-availabilitySet-38057350]
E0423 12:50:30.504721 18125 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:509
/usr/local/go/src/runtime/panic.go:491
/usr/local/go/src/runtime/panic.go:63
/usr/local/go/src/runtime/signal_unix.go:367
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/cloudprovider/providers/azure/azure_loadbalancer.go:326
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/cloudprovider/providers/azure/azure_loadbalancer.go:113
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:374
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:306
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:249
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:771
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:213
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:217
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:195
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2337
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1d5772d]
goroutine 1338 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x111
panic(0x40b2e00, 0xa603590)
/usr/local/go/src/runtime/panic.go:491 +0x283
k8s.io/kubernetes/pkg/cloudprovider/providers/azure.(*Cloud).getServiceLoadBalancerStatus(0xc4206cdc00, 0xc422b401e0, 0xc4208f6b90, 0xc422b401e0, 0xc4210c0388, 0x1)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/cloudprovider/providers/azure/azure_loadbalancer.go:326 +0x30d
k8s.io/kubernetes/pkg/cloudprovider/providers/azure.(*Cloud).EnsureLoadBalancer(0xc4206cdc00, 0x496be8b, 0xa, 0xc422b401e0, 0xc4210c0388, 0x1, 0x1, 0xbeaf93852d9a680b, 0x2fe0fa757, 0xa9f0d40)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/cloudprovider/providers/azure/azure_loadbalancer.go:113 +0x2ea
k8s.io/kubernetes/pkg/controller/service.(*ServiceController).ensureLoadBalancer(0xc420f56000, 0xc422b401e0, 0xc422b401e0, 0x4961da9, 0x6)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:374 +0xcc
k8s.io/kubernetes/pkg/controller/service.(*ServiceController).createLoadBalancerIfNeeded(0xc420f56000, 0xc421fd81e0, 0x2d, 0xc422b401e0, 0xc422c8dc40, 0xc422c8dc78, 0x1265492)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:306 +0x20e
k8s.io/kubernetes/pkg/controller/service.(*ServiceController).processServiceUpdate(0xc420f56000, 0xc4217bcc60, 0xc422b401e0, 0xc421fd81e0, 0x2d, 0x0, 0x0, 0x0)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:249 +0xeb
k8s.io/kubernetes/pkg/controller/service.(*ServiceController).syncService(0xc420f56000, 0xc421fd81e0, 0x2d, 0x0, 0x0)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:771 +0x3aa
k8s.io/kubernetes/pkg/controller/service.(*ServiceController).worker.func1(0xc420f56000)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:213 +0xd9
k8s.io/kubernetes/pkg/controller/service.(*ServiceController).worker(0xc420f56000)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:217 +0x2b
k8s.io/kubernetes/pkg/controller/service.(*ServiceController).(k8s.io/kubernetes/pkg/controller/service.worker)-fm()
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:195 +0x2a
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc420a04570)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x5e
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc420a04570, 0x3b9aca00, 0x0, 0x1, 0xc4208fa0c0)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc420a04570, 0x3b9aca00, 0xc4208fa0c0)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by k8s.io/kubernetes/pkg/controller/service.(*ServiceController).Run
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/service/service_controller.go:195 +0x20c
Environment:
Kubernetes version (kubectl version): 1.9.1
OS (uname -a): xenial
@khenidak