Modify nodes to register directly with the master. #6949
Conversation
/cc @bgrant0607
@@ -165,12 +165,14 @@ func (h *EtcdHelper) ExtractObjToList(key string, listObj runtime.Object) error
	}

	nodes := make([]*etcd.Node, 0)
	nodes = append(nodes, response.Node)
	if !IsEtcdNotFound(err) {
@smarterclayton I found that if there is an etcd not found error here (type 100), then response will be nil, so response.Node or response.EtcdIndex (below) will cause this goroutine to panic.
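For reference, a minimal sketch of the kind of guard being discussed, assuming the go-etcd client (where a failed Get leaves the response nil) and a not-found check like the IsEtcdNotFound in the excerpt above; the function name and return shape are illustrative, not the actual etcd_helper code:

import "github.com/coreos/go-etcd/etcd"

// listChildren fetches key and returns its child nodes, guarding against the
// nil response that a "key not found" error (etcd error code 100) produces.
func listChildren(client *etcd.Client, key string) ([]*etcd.Node, uint64, error) {
	response, err := client.Get(key, false, true)
	if err != nil {
		if etcdErr, ok := err.(*etcd.EtcdError); ok && etcdErr.ErrorCode == 100 {
			// Key absent: return an empty result instead of dereferencing
			// response.Node or response.EtcdIndex on a nil response.
			return nil, 0, nil
		}
		return nil, 0, err
	}
	return []*etcd.Node{response.Node}, response.EtcdIndex, nil
}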
Derek just fixed this in another pull.
#6938 covers it
I've rebased on top of that PR.
@derekwaynecarr @pires @justinsb I haven't been able to test this on other cloud providers -- can you take a look and see if you think it will break anything?
What are the security implications of this? As an admin, how do I keep clients from interfering with other nodes? Does this require the cloud provider on the kubelet to have security credentials to the cluster (very scary)? Can I still disable the cloud provider on the kubelet?
Security implications: a node with credentials can add itself to the cluster. Right now, nodes can do this because they have bearer tokens, and we don't have any authz in the apiserver yet. Once we do, we should add policies that only allow nodes to update their own data (and not that of other nodes). Today any node has free rein to update anything in the apiserver, because it has full read/write access through the secured port with the node bearer token.

As a first cut I just moved the cloud provider bits from the nodecontroller to the kubelet, but I think we should be able to remove most/all of them. For GCE (and likely AWS) we may still want to make a call into the cloud provider to hit the local metadata server and get the external IP address of a node, but calling into the cloud provider to determine system resources seems silly.

Yes, you can still disable the cloud provider on the kubelet. When you run …
Force-pushed from fcae414 to d5da62c.
// syncNodeStatus periodically synchronizes node status to master.
func (kl *Kubelet) syncNodeStatus() {
	if kl.kubeClient == nil {
		return
	}
	if err := kl.registerNode(); err != nil {
		glog.Errorf("Failed to register node: %s. Giving up.", err)
We should use util.HandleError here
I didn't realize that existed -- cool. I've passed it a new error so that we keep the custom text on top of the error text returned from registerNode.
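For reference, a sketch of what that change to the snippet above might look like (the message wording is illustrative; it assumes fmt and the util package are imported):

	if err := kl.registerNode(); err != nil {
		// Wrap the error so the custom context text stays in front of the
		// message returned from registerNode, then hand it to the shared handler.
		util.HandleError(fmt.Errorf("unable to register node with the apiserver: %v", err))
		return
	}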
@roberthbailey right now I can only test on GCE since I no longer use AWS. Now, from what I understand of your changes, you're moving cloud-provider stuff from the nodecontroller into the kubelet. After reading @smarterclayton's comments, I too believe there are risks, but right now, as a sysadmin, I'm OK with it.
@pires I'm trying to move node creation from being done in a non-obvious way (a control loop that talks to the cloud provider and uses a regular expression to decide which nodes should be part of the cluster) to an explicit call from a node to join a specific master. Right now this call "just works" because the kubelet already has credentials to talk to the master, but in the future (with dynamic clustering) the node will first ask to join by sending a CSR to the master to get itself credentials to POST and create a node entry. This will give sysadmins the ability to set policies on whether nodes can join automatically or whether they need to be manually approved to join a cluster. But that's post-1.0. For now, if a node has credentials, it will automatically join the cluster.

Some of the fields for the node may still make sense to fill in from the node controller (which is why I cc'd @bgrant0607, as I thought he'd have an opinion here), but I think most of the ones we are setting are better known by the kubelet itself (e.g. node resources, hostname, IP addresses).
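A rough sketch of that explicit join, using the client/api/apierrors packages that appear later in this diff; the function name, label key, and error handling are illustrative rather than the PR's exact code:

// registerWithMaster creates a Node object for this host on the apiserver.
// If the node already exists, registration is treated as a success.
func registerWithMaster(c client.Interface, hostname string) error {
	node := &api.Node{
		ObjectMeta: api.ObjectMeta{
			Name:   hostname,
			Labels: map[string]string{"kubernetes.io/hostname": hostname},
		},
	}
	if _, err := c.Nodes().Create(node); err != nil && !apierrors.IsAlreadyExists(err) {
		return err
	}
	return nil
}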
Force-pushed from d5da62c to 75c6d7a.
@roberthbailey just to be sure, this would remove the need for tools such as kube-register, right?
@pires I think so. Would it be possible for you to test this CL on CoreOS and find out?
@roberthbailey given the nature of this, @kelseyhightower and his colleagues at CoreOS are best qualified to weigh in. Anyway, I may be able to test this CL early next week...
@roberthbailey sure. Will you be available on IRC for debugging?
I'll hop on right now.
Removed.
Nodes shouldn't need to call the cloud provider. We have discussed making it possible to contact the local metadata server for clouds that provide that, but I'd like to avoid it, since it's not universally available.

We need to make some changes to the Node object. The node name is currently overloaded for at least two purposes: object name and cloud provider ("external") ID. It's often set to the hostname or address, which adds to the confusion about the name requirements and usage. The only information currently required to create a node is the externalID, which is automatically defaulted to the node name if not provided -- that part I don't mind. However, I'd like the externalID to be used to look up the node with the cloud provider, rather than looking up the external ID using the node name.

Addresses are also looked up via the cloud provider. It's true that the node knows at least a subset of its addresses, though not necessarily external ones. The node controller could fill in additional addresses after creation (late initialization). We don't really have a good way of knowing which address is preferred for master-to-node communication (/validate, log access, exec, port forwarding, proxy, redirect), and we don't know which name/address is specified in the node cert, to the extent that's relevant -- #6985.

Some master component will allocate the PodCIDR via late initialization. Capacity can be populated after creation, also.
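A sketch of that late-initialization idea on the master side, using only fields that appear elsewhere in this PR (Spec.PodCIDR and Status.Capacity); the function and its signature are hypothetical:

// lateInitializeNode fills in fields a freshly self-registered node could not
// know or did not report, without overwriting anything it did report.
func lateInitializeNode(node *api.Node, podCIDR string, defaultCapacity api.ResourceList) {
	// PodCIDR is allocated centrally, so the master always assigns it.
	if node.Spec.PodCIDR == "" {
		node.Spec.PodCIDR = podCIDR
	}
	// Capacity (and, similarly, addresses) can be populated after creation
	// if the node has not reported them yet.
	if len(node.Status.Capacity) == 0 {
		node.Status.Capacity = defaultCapacity
	}
}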
This should also allow us to get rid of the --machines flag, which @eparis should also like. kube-register allows users to specify fleet node labels to select which nodes to add to Kubernetes. That's still useful, so I don't think this obsoletes kube-register. I wouldn't mind adding fleet/kube-register as a cloudprovider -- #2890 @kelseyhightower.
So maybe the one cloudprovider bit the node needs is the ability to get its externalID, which it should get from a local metadata server, or maybe just use the hostname if there is no cloudprovider.
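A sketch of that one bit, assuming the standard GCE metadata endpoint and a plain hostname fallback; the function name and timeout are illustrative:

import (
	"io/ioutil"
	"net/http"
	"os"
	"strings"
	"time"
)

// externalID asks the local metadata server for an instance ID (GCE endpoint
// shown) and falls back to the hostname when no metadata server answers.
func externalID() (string, error) {
	req, err := http.NewRequest("GET", "http://metadata.google.internal/computeMetadata/v1/instance/id", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Metadata-Flavor", "Google")
	resp, err := (&http.Client{Timeout: 2 * time.Second}).Do(req)
	if err != nil {
		// No metadata server reachable: not on a cloud, use the hostname.
		return os.Hostname()
	}
	defer resp.Body.Close()
	id, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(id)), nil
}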
ObjectMeta: api.ObjectMeta{Name: kl.hostname},
Status: api.NodeStatus{
	Capacity: api.ResourceList{
		api.ResourceCPU: *resource.NewMilliQuantity(1000, resource.DecimalSI),
Kubelet already has code for populating this with real values.
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/kubelet/kubelet.go#L1768
Why ask the cloud provider to determine the node resources when the node can just self-report them? How will that work for deployments without a cloud provider?
Also, see https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/cloudprovider/gce/gce.go#L512
No good reason -- legacy cruft. We must get the values from the node.
I think this code was in NodeController so there would be some capacity in NodeStatus between the time the node is created in the master, and the first time the node reports its status.
Fine with leaving it this way in this PR, but we should clean up at some point. Please leave a TODO if you don't think you can get to it soon.
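For completeness, a sketch of the self-reported capacity the kubelet code linked above derives from cadvisor's machine info (the parameters here stand in for cadvisor MachineInfo fields, so types may differ by version):

// capacityFromMachineInfo builds the node's resource list from values the
// node itself reports (CPU core count and memory bytes from cadvisor),
// rather than asking the cloud provider.
func capacityFromMachineInfo(numCores int, memoryCapacityBytes int64) api.ResourceList {
	return api.ResourceList{
		api.ResourceCPU:    *resource.NewMilliQuantity(int64(numCores)*1000, resource.DecimalSI),
		api.ResourceMemory: *resource.NewQuantity(memoryCapacityBytes, resource.BinarySI),
	}
}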
Could this be done instead by using fleet node labels to choose which --api_server argument we pass to the kubelet on each node? It doesn't make sense for a kubelet to connect to an apiserver and watch for pods to run (and send events) if it isn't actually part of the cluster.
"time" | ||
|
||
"github.com/GoogleCloudPlatform/kubernetes/pkg/api" | ||
apierrors "github.com/GoogleCloudPlatform/kubernetes/pkg/api/errors" | ||
"github.com/GoogleCloudPlatform/kubernetes/pkg/client" | ||
"github.com/GoogleCloudPlatform/kubernetes/pkg/client/record" | ||
"github.com/GoogleCloudPlatform/kubernetes/pkg/cloudprovider" |
You should be able to delete the cloudprovider here.
Now that I re-added the code to handle deleting nodes that no longer exist, we are still using the cloud provider.
Actually, I'm liking this more. I think we should eliminate cloudprovider stuff from the nodecontroller and move it out of the cloudprovider subtree, as discussed in #4851. The node should get any info it needs from the metadata server, if present, or from a local config file, fleet, whatever -- the mechanism should be pluggable. Calling the existing cloudprovider API could be a stopgap. We'll likely need to resurrect a cloudprovider-oriented nodecontroller in the future in order to automatically repair and provision nodes (cluster auto-scaling), but that's ok.
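A sketch of what such a pluggable source might look like; the interface name and methods are hypothetical, with implementations backed by a cloud metadata server, a local config file, or fleet:

// NodeInfoSource abstracts where a node learns about itself, so the kubelet
// and nodecontroller need not depend on the cloudprovider subtree directly.
type NodeInfoSource interface {
	// ExternalID returns the provider-specific ID for this node ("" if none).
	ExternalID() (string, error)
	// HostAddresses returns the addresses this node believes it is reachable on.
	HostAddresses() ([]string, error)
}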
Just finished running e2e tests.
(Needs rebase)
- Delete nodes when they are no longer ready and don't exist in the cloud provider.
- Label each node with its hostname.
- Add a flag to skip node registration.
- Add a test for registering an existing node.
Force-pushed from 394c419 to 8e356f8.
Rebased again.
Reviewing now.
Thanks @erictune
LGTM
Modify nodes to register directly with the master.
/xref should fix #8315 (linking for tracking)
availableCIDRs.Delete(node.Spec.PodCIDR)

// reconcilePodCIDRs looks at each node and assigns it a valid CIDR
// if it doesn't currently have one.
func (nc *NodeController) reconcilePodCIDRs(nodes *api.NodeList) {
Should it be called reconcileNodeCIDRs?
Yes, probably should have renamed it while I was in here...
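For context, a sketch of what the reconcile loop excerpted above might amount to; the pool handling here (a plain slice instead of the set used in the diff) is illustrative:

// reconcilePodCIDRs assigns a CIDR from the remaining pool to every node
// that does not already have one; already-assigned CIDRs are skipped.
func reconcilePodCIDRs(nodes *api.NodeList, availableCIDRs []string) {
	used := map[string]bool{}
	for i := range nodes.Items {
		used[nodes.Items[i].Spec.PodCIDR] = true
	}
	free := []string{}
	for _, cidr := range availableCIDRs {
		if !used[cidr] {
			free = append(free, cidr)
		}
	}
	for i := range nodes.Items {
		if nodes.Items[i].Spec.PodCIDR == "" && len(free) > 0 {
			nodes.Items[i].Spec.PodCIDR = free[0]
			free = free[1:]
		}
	}
}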
Ref #6087.
Fixes #7495.