
Add an Azure CloudProvider Implementation #28821

Merged

Conversation

colemickens (Contributor)

This PR adds Azure as a cloudprovider for Kubernetes. It specifically adds support for native pod networking (via Azure User Defined Routes) and L4 load balancing (via Azure Load Balancers).

I did have to add clusterName as a parameter to the LoadBalancers methods. This is because Azure only allows one "LoadBalancer" object per set of backend machines. This means a single "LoadBalancer" object must be shared across the cluster. The "LoadBalancer" is named via the cluster-name parameter passed to kube-controller-manager so as to enable multiple clusters per resource group if the user desires such a configuration.
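
For context, here is a rough sketch of how the LoadBalancer interface reads with the new parameter. This is illustrative only: the parameter lists are abridged and may not match the PR exactly; the authoritative definitions live in pkg/cloudprovider in this PR.

```go
// Illustrative sketch, not the verbatim interface from this PR.
package cloudprovider

import "k8s.io/kubernetes/pkg/api"

type LoadBalancer interface {
	// GetLoadBalancer returns whether the specified load balancer exists and, if so, its status.
	GetLoadBalancer(clusterName string, service *api.Service) (status *api.LoadBalancerStatus, exists bool, err error)
	// EnsureLoadBalancer creates or updates the cluster's shared load balancer for this Service.
	EnsureLoadBalancer(clusterName string, service *api.Service, hosts []string) (*api.LoadBalancerStatus, error)
	// UpdateLoadBalancer updates the hosts behind the shared load balancer.
	UpdateLoadBalancer(clusterName string, service *api.Service, hosts []string) error
	// EnsureLoadBalancerDeleted removes this Service's configuration from the shared load balancer.
	EnsureLoadBalancerDeleted(clusterName string, service *api.Service) error
}
```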

There are a few things that I'm a bit unsure about:

  1. The implementation of the Instances interface. It's not extensively documented, it's not really clear what the different functions are used for, and my questions on the mailing list didn't get an answer.
  2. Counter to the comments on the LoadBalancers interface, I modify the api.Service object in EnsureLoadBalancerDeleted, but not with the intention of affecting Kube's view of the Service. I simply do it so that I can remove the Ports on the Service object and then reuse my reconciliation logic, which can handle removing stale/deleted Ports (see the sketch after this list).
  3. The logging is a bit verbose. I'm looking for guidance on the appropriate log level to use for the chattier bits.
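
To make point 2 concrete, here is a minimal sketch of the "delete is reconcile to zero ports" idea. The Cloud type and the reconcileLoadBalancer helper below are stand-ins, not the exact names used in the PR's azure package.

```go
package azure // illustrative sketch

import "k8s.io/kubernetes/pkg/api"

// Cloud is a stand-in for the provider struct; reconcileLoadBalancer is a stand-in for
// the shared reconciliation helper described above.
type Cloud struct{ /* config, Azure clients, ... */ }

func (az *Cloud) reconcileLoadBalancer(clusterName string, service *api.Service) error {
	// ...create/update/delete frontend configs, rules, and probes to match service.Spec.Ports...
	return nil
}

// EnsureLoadBalancerDeleted sketches the delete-as-reconcile approach from point 2.
func (az *Cloud) EnsureLoadBalancerDeleted(clusterName string, service *api.Service) error {
	// Emptying the Ports slice makes every rule/probe for this Service look stale,
	// so the normal reconciliation path removes them. The mutation stays local to this
	// code path; it is never written back to the API server.
	service.Spec.Ports = []api.ServicePort{}
	return az.reconcileLoadBalancer(clusterName, service)
}
```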

Due to the (current) lack of an Instance Metadata Service and of Virtual Machine Identity in Azure, the user is required to do a few things to opt in to this provider. These steps are called out because they differ from AWS/GCE:

  1. The user must provision an Azure Active Directory ServicePrincipal with Contributor-level access to the resource group that the cluster is deployed in. This creation process is documented by Hashicorp (https://www.packer.io/docs/builders/azure-setup.html) or on the MSDN Blog (https://blogs.msdn.microsoft.com/arsen/2016/05/11/how-to-create-and-test-azure-service-principal-using-azure-cli/).
  2. The user must place a JSON file somewhere on each Node that conforms to the AzureConfig struct defined in azure.go; a rough sketch of its shape follows this list. (This is automatically done in the Azure flavor of Kubernetes-Anywhere.)
  3. The user must specify --cloud-config=/path/to/azure.json as an option to kube-apiserver and kube-controller-manager similarly to how the user would need to pass --cloud-provider=azure.
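
As a rough illustration of step 2, the cloud-config file is just JSON matching a struct along these lines. The field names below are illustrative guesses; AzureConfig in azure.go from this PR is the authoritative definition.

```go
package azure // illustrative sketch

// AzureConfig sketches the approximate shape of the azure.json cloud-config.
// Field names here are assumptions; see azure.go for the real ones.
type AzureConfig struct {
	TenantID          string `json:"tenantId"`          // AAD tenant for the ServicePrincipal
	SubscriptionID    string `json:"subscriptionId"`    // subscription holding the resource group
	AADClientID       string `json:"aadClientId"`       // ServicePrincipal application ID
	AADClientSecret   string `json:"aadClientSecret"`   // ServicePrincipal secret
	ResourceGroup     string `json:"resourceGroup"`     // resource group the cluster is deployed in
	Location          string `json:"location"`          // Azure region, e.g. "westus"
	VnetName          string `json:"vnetName"`          // virtual network containing the nodes
	SubnetName        string `json:"subnetName"`        // subnet used by the nodes
	SecurityGroupName string `json:"securityGroupName"` // NSG updated for load-balancer rules
	RouteTableName    string `json:"routeTableName"`    // route table used for pod networking
}
```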

I've been running approximately this code for a month and a half. I only encountered one bug, which has since been fixed and covered by a unit test. I've just deployed a new cluster (and a Type=LoadBalancer nginx Service) using this code (via kubernetes-anywhere) and have posted the kube-controller-manager logs (https://gist.github.com/colemickens/1bf6a26e7ef9484a72a30b1fcf9fc3cb) for anyone who is interested in seeing the logic in action.

If you're interested in this PR, you can use the instructions in my azure-kubernetes-demo repository (https://github.com/colemickens/azure-kubernetes-demo) to deploy a cluster with minimal effort via kubernetes-anywhere (https://github.com/kubernetes/kubernetes-anywhere). (There is currently a pending PR in kubernetes-anywhere, kubernetes-retired/kubernetes-anywhere#172, that is needed in conjunction with this PR.) I also have a pre-built hyperkube image, docker.io/colemickens/hyperkube-amd64:v1.4.0-alpha.0-azure, which will be kept in sync with the branch this PR stems from.

I'm hoping this can land in the Kubernetes 1.4 timeframe.

CC (potential code reviewers from Azure): @ahmetalpbalkan @brendandixon @paulmey

CC (other interested Azure folk): @brendandburns @johngossman @anandramakrishna @jmspring @jimzim

CC (others who've expressed interest): @codefx9 @edevil @thockin @rootfs

@k8s-github-robot added the size/L (changes 100-499 lines, ignoring generated files) and release-note-label-needed labels on Jul 12, 2016
@brendandburns self-assigned this on Jul 12, 2016
@@ -81,21 +81,21 @@ type LoadBalancer interface {
// GetLoadBalancer returns whether the specified load balancer exists, and
// if so, what its status is.
// Implementations must treat the *api.Service parameter as read-only and not modify it.
Contributor (reviewer):

Document the new parameter please.

colemickens (Contributor, Author) replied:

I did document it on each method on the Interface, github just didn't notice properly here.
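
For reference, the per-method documentation reads roughly as follows; the wording is approximate, and the same note was added to each LoadBalancer method in this PR.

```go
// GetLoadBalancer returns whether the specified load balancer exists, and
// if so, what its status is.
// Implementations must treat the *api.Service parameter as read-only and not modify it.
// Parameter 'clusterName' is the name of the cluster as presented to kube-controller-manager.
GetLoadBalancer(clusterName string, service *api.Service) (status *api.LoadBalancerStatus, exists bool, err error)
```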

@colemickens (Contributor, Author)

  • Factored out the find{Probe,Rule,SecurityRule} functions as suggested.
  • Removed a couple other noisy log lines.
  • Fixed the other small unnecessary variable.
  • Implemented Instances List

Looks like the build is green as well.

@rootfs (Contributor) commented on Jul 26, 2016

@colemickens @brendandburns when do we expect this can be merged? thanks.

@brendandburns (Contributor)

LGTM. Thanks for the patience.

@brendandburns added the lgtm ("Looks good to me") label on Jul 26, 2016
@brendandburns changed the title from "Azure CloudProvider Implementation" to "Add an Azure CloudProvider Implementation" on Jul 26, 2016
@mikedanese (Member)

Can you squash fix-up commits @colemickens?

@colemickens force-pushed the azure-cloudprovider-pr branch from 797e30d to e1f30cc on July 26, 2016 at 21:49
@colemickens force-pushed the azure-cloudprovider-pr branch from e1f30cc to 2ebffb4 on July 26, 2016 at 21:50
@k8s-github-robot removed the lgtm label on Jul 26, 2016
@k8s-bot commented on Jul 26, 2016

GCE e2e build/test passed for commit e1f30cc71119089bd1eead9911ef5f35cf8af8eb.

@colemickens (Contributor, Author) commented on Jul 26, 2016

@brendandburns @mikedanese Rebased and squashed appropriately. The build looks green, but the LGTM label got dropped.

(Thanks for the patience as well, my turnaround on feedback was slower than I might've hoped.)

@brendandburns added the lgtm label on Jul 26, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot commented on Jul 27, 2016

GCE e2e build/test passed for commit 2ebffb4.

@k8s-github-robot

Automatic merge from submit-queue

@edevil (Contributor) commented on Jul 27, 2016

Great work!

@mikedanese (Member)

Thanks @colemickens, this looks great

@colemickens deleted the azure-cloudprovider-pr branch on July 27, 2016 at 20:59
@thockin (Member) commented on Aug 19, 2016

This has plumbed clusterName throughout, but what is the definition of cluster name? Is it guaranteed to be set? What format and length is it? How does it get set?

@colemickens (Contributor, Author)

@thockin: the clusterName is the user-specified --cluster-name given to kube-controller-manager. It defaults to "kubernetes". It is used for naming the load balancer in Azure so that two Kubernetes deployments can live side-by-side in a single Azure Resource Group. Since we don't create Azure Load Balancers per-Service, and instead share one for the cluster, I needed some sort of name that is unique per cluster and chose clusterName.

This clusterName was an existing parameter used in the Routes interface.
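
A minimal illustration of the idea, with hypothetical helper names (not the PR's actual functions): there is one Azure load balancer per cluster, keyed off clusterName, and each Kubernetes Service only contributes rules/probes to it rather than getting its own load balancer.

```go
package main

import "fmt"

// loadBalancerName returns the name of the cluster's single shared Azure LB (stand-in).
func loadBalancerName(clusterName string) string {
	return clusterName // --cluster-name defaults to "kubernetes"
}

// ruleName returns a per-Service rule name nested under the shared LB (stand-in).
func ruleName(clusterName, serviceName string, port int) string {
	return fmt.Sprintf("%s-%s-%d", clusterName, serviceName, port)
}

func main() {
	fmt.Println(loadBalancerName("cluster-a"))          // one LB for the whole cluster
	fmt.Println(ruleName("cluster-a", "nginx-svc", 80)) // one rule per Service port
}
```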

@thockin (Member) commented on Aug 19, 2016

huh, right you are. I didn't know we had such a flag. Thanks!

@jadbox commented on Aug 30, 2016

How does this interact with an Azure VMSS cluster that may sit behind the load balancer? E.g., will I be able to tell the cluster to scale VMs up/down from within Kubernetes?

@jadbox commented on Aug 30, 2016

Never mind, I read the scope doc that says VMSS will not be considered: https://github.com/colemickens/azure-kubernetes-status

dims pushed a commit to dims/kubernetes that referenced this pull request Feb 8, 2018

Labels: lgtm, release-note, size/XXL