Azure ARM client might segfault on empty responses #94077

Closed
bpineau opened this issue Aug 18, 2020 · 1 comment · Fixed by #94078
Labels
area/provider/azure Issues or PRs related to azure provider kind/bug Categorizes issue or PR as related to a bug. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider.

bpineau commented Aug 18, 2020

What happened:

We're seeing legacy-cloud-providers/azure/clients segfault under heavy load and ARM throttling (in our case, through the synchronised copy vendored in the latest cluster-autoscaler, 1.19):

I0809 17:11:56.963285      49 azure_cache.go:83] Invalidating unowned instance cache
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1b595b5]
cluster-autoscaler-all-79b9478bf5-cgkg8 cluster-autoscaler
goroutine 82 [running]:
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/clients/armclient.(*Client).Send(0xc00052f520, 0x3b6dc40, 0xc000937440, 0xc000933f00, 0x0, 0x0)
     /home/jb/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/clients/armclient/azure_armclient.go:122 +0xb5
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/clients/armclient.(*Client).GetResource(0xc00052f520, 0x3b6dc40, 0xc000937440, 0xc000998630, 0x82, 0x0, 0x0, 0xc000d0f8b0, 0xb)
     /home/jb/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/clients/armclient/azure_armclient.go:312 +0x3a0
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/clients/vmssclient.(*Client).listVMSS(0xc000758700, 0x3b6dc40, 0xc000937440, 0xc000d0f8b0, 0xb, 0x0, 0x0, 0x0, 0x0)
     /home/jb/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/clients/vmssclient/azure_vmssclient.go:181 +0x316
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/clients/vmssclient.(*Client).List(0xc000758700, 0x3b6dc40, 0xc000937440, 0xc000d0f8b0, 0xb, 0x57517d7, 0x57, 0x13d056b, 0x1c5b8c1)
     /home/jb/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/clients/vmssclient/azure_vmssclient.go:158 +0x331
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/azure.(*AzureManager).listScaleSets(0xc000a1f200, 0xc00091afe0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
     /home/jb/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/azure/azure_manager.go:646 +0xe0
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/azure.(*AzureManager).getFilteredAutoscalingGroups(0xc000a1f200, 0xc00091afe0, 0x1, 0x1, 0x8, 0x8199ee, 0x5834f80, 0x8, 0x0)
     /home/jb/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/azure/azure_manager.go:626 +0x194
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/azure.(*AzureManager).fetchAutoAsgs(0xc000a1f200, 0x2, 0x2)
     /home/jb/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/azure/azure_manager.go:549 +0x67
...

In this specific case the cluster-autoscaler triggered the crash, but it can likely affect any other ARM client built on that code, including the in-tree kubelet or controller-manager.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.18.8
  • Cloud provider or hardware configuration: Azure
  • OS (e.g: cat /etc/os-release): Ubuntu 20.04.1 LTS
  • Kernel (e.g. uname -a): 5.4.0-1022-azure #22-Ubuntu SMP
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

/kind bug
/sig cloud-provider
/area provider/azure

@bpineau bpineau added the kind/bug Categorizes issue or PR as related to a bug. label Aug 18, 2020
@k8s-ci-robot k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. area/provider/azure Issues or PRs related to azure provider labels Aug 18, 2020
@craiglpeters

/milestone v1.20

@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Aug 25, 2020