[Federation] Federation should fully support clusters with a previous API version. #209
Comments
Comment by marun A naive attempt to support backwards compatible federation: #44131 |
Comment by quinton-hoole Maru, it's not entirely clear to me that this is not approximately "working as intended" from the client's point of view (other than the e2e tests failing, of course, but that's a separate matter; and presumably this leads to an infinite update loop in the controller, which should be fixed, but that's not very visible to the client). If this is the case, I'd suggest renaming this issue to something like: "Federation should fully support clusters with a previous API version." To make sure that I understand the root problem (and perhaps we should be having this discussion in an issue), by means of a concrete example:
So one ends up with an infinite update loop in the Federated ReplicaSet controller, but other than that the client sees approximately correct and expected behavior? i.e., they see a v1.5 ReplicaSet in their v1.5 cluster, consistent with the v1.6 Federated ReplicaSet except for the fields that don't exist in v1.5? Off the top of my head, the solution seems to be:
Does this make sense? Q |
Comment by derekwaynecarr @quinton-hoole -- if the goal of federation is to federate across each member cluster, it seems that, at minimum, the federation control plane must be <= the version of each member cluster. Let me provide an example: in k8s 1.6, the pod spec has a field "toleration". If I send a pod spec that uses a toleration to a federation 1.6 server, it will try to replicate that to each member cluster. If I have member clusters running 1.5, 1.4, etc., that field does not exist there, which could mean that the field is not supplied on the copy being federated to each member cluster. I suspect many users would consider that a bug, and the safer guidance is that your control plane must be <= the version of each member cluster. Agree? |
Comment by quinton-hoole From: Maru Newby marun@redhat.com
On Thu, Apr 6, 2017 at 10:32 AM, Quinton Hoole quinton@hoole.biz wrote:
Correct, the main problem is that the controllers would be constantly
Your intent is clear, but I don't think what you are proposing is |
Comment by quinton-hoole @marun I'm no K8s API versioning expert but @nikhiljindal is, and can hopefully confirm. Continuing our v1.5 vs v1.6 example, if a v1.5 client connects to a v1.6 cluster, all API responses are converted to v1.5 (internally by the API server) on the return path. That is the "standard mechanism" I referred to above. Some details in kubernetes/kubernetes#4874. @nikhiljindal, @derekwaynecarr, @caesarxuchao or others more familiar with API versioning should be able to clarify further if required. |
Comment by derekwaynecarr @quinton-hoole -- my takeaway is that federation supports skew in that member clusters can be later versions, but federation must be <=. I see no other way we can provide any other level of support without losing data. |
Comment by quinton-hoole @derekwaynecarr in response to your comment/question above: Yes, I agree, as general guidance that might be sensible, although I suspect that the situation is a little more nuanced than that in general, e.g.
Q |
Comment by quinton-hoole @derekwaynecarr @marun One potentially useful optional feature that we could consider supporting would be preventing (in the FCP) federation of later-versioned objects into earlier-versioned clusters. So for example we could support a flag on the scheduler (or an optional field/annotation on each object) specifying whether e.g. v1.6 ReplicaSets should be scheduled/federated to earlier-versioned clusters or not. That way users and/or administrators could decide whether they prefer stricter API semantics or more scheduling flexibility. This would be morally equivalent (using the example above) to allowing users/administrators to choose whether API calls to create e.g. LoadBalancer Services should fail in clusters that do not have external load balancers. That's not supported in Kubernetes today either. |
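The gate described above might look something like the following Go sketch. The annotation key and the version-comparison helper are hypothetical, not part of any existing federation API; it simply refuses placement into a cluster whose minor version is older than the control plane's unless the object opts in.

```go
package placement

import "strings"

// Hypothetical annotation a user could set on a federated object to allow
// placement into clusters older than the federation control plane.
const allowOlderClustersAnnotation = "federation.alpha.kubernetes.io/allow-older-clusters"

// minorVersion extracts the minor version from a "1.6"- or "v1.6.2"-style string.
func minorVersion(v string) int {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) < 2 {
		return 0
	}
	n := 0
	for _, c := range parts[1] {
		if c < '0' || c > '9' {
			break
		}
		n = n*10 + int(c-'0')
	}
	return n
}

// MayScheduleTo reports whether an object with the given annotations may be
// federated to a cluster running clusterVersion when the control plane runs
// fcpVersion.
func MayScheduleTo(annotations map[string]string, fcpVersion, clusterVersion string) bool {
	if minorVersion(clusterVersion) >= minorVersion(fcpVersion) {
		return true // cluster is at least as new as the control plane: always allowed
	}
	// Older cluster: only place the object if the user explicitly opted in.
	return annotations[allowOlderClustersAnnotation] == "true"
}
```

A stricter deployment could flip the default, so that placement into older clusters is opted into rather than out of.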
Comment by marun @quinton-hoole I think you may be confusing 'kube version' and 'api version'.
So kube 1.6 can add a new optional field to a resource without changing its API version. I certainly agree that finer-grained api versioning would be useful, but it looks like either federation is going to have to track changes across kube versions manually or forgo the possibility of allowing backwards-compatible federation. |
Comment by marun As per @smarterclayton's recent comment on the mailing-list thread, it may be possible to automate detection of field changes across kube versions, either by diff'ing the fields sent vs the fields returned for a given request or by diff'ing swagger docs. |
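A rough sketch of the first option, under the simplest possible interpretation: marshal the object that was sent and the object the cluster returned into generic JSON maps, and report any field paths that silently disappeared. Function names are made up, and defaulted or server-populated fields are not handled here.

```go
package fieldskew

import "encoding/json"

// DroppedFields returns the JSON field paths present in the object that was
// sent but absent from the object the API server returned, i.e. fields the
// (older) cluster appears not to support.
func DroppedFields(sent, returned interface{}) ([]string, error) {
	toMap := func(obj interface{}) (map[string]interface{}, error) {
		b, err := json.Marshal(obj)
		if err != nil {
			return nil, err
		}
		var m map[string]interface{}
		if err := json.Unmarshal(b, &m); err != nil {
			return nil, err
		}
		return m, nil
	}
	s, err := toMap(sent)
	if err != nil {
		return nil, err
	}
	r, err := toMap(returned)
	if err != nil {
		return nil, err
	}
	var dropped []string
	diffKeys(s, r, "", &dropped)
	return dropped, nil
}

// diffKeys walks the sent object and records keys missing from the returned one.
func diffKeys(sent, returned map[string]interface{}, prefix string, dropped *[]string) {
	for k, v := range sent {
		rv, ok := returned[k]
		if !ok {
			*dropped = append(*dropped, prefix+k)
			continue
		}
		if sm, ok := v.(map[string]interface{}); ok {
			if rm, ok := rv.(map[string]interface{}); ok {
				diffKeys(sm, rm, prefix+k+".", dropped)
			}
		}
	}
}
```

In the per-request flow discussed later in the thread, a controller would call something like DroppedFields right after the first update against a cluster and cache the result per cluster and type.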
Comment by quinton-hoole From: Clayton Coleman ccoleman@redhat.com
Fine grained version indications have been proposed, but have a long
What are the concrete affordances we could add today? Certainly |
Comment by quinton-hoole From: Maru Newby marun@redhat.com
Both options sound promising. Is there any existing machinery that
The approach of only comparing fields actually returned in a request |
Comment by quinton-hoole ---------- Forwarded message ----------
Probably that you can do it up front and potentially prevent end users
On the other hand, clusters are going to be upgraded underneath you, |
Comment by quinton-hoole An alternative idea I had was to do the version conversion automatically in the typed clientset, e.g.
import clientset_v1_5 "k8s.io/kubernetes/pkg/client/clientset_generated/release_1_5"
Then, when a version 1.5 cluster is being synced, the FCP uses the object fetched from the FCP API via the above clientset to compare against the object retrieved from that cluster, so that the object versions match and the comparison is automatically correct. Of course this means retrieving (and/or caching) one object per cluster version from the FCP API (in the FCP controller, in the FederatedInformer). But given that the number of cluster versions in a given federation should be relatively small (~3), that overhead (both in terms of CPU and cache memory) should be negligible. And it also automatically addresses @smarterclayton's concern above regarding clusters being upgraded under the FCP (as these would be automagically reregistered as their newer versions, and the above caching would adapt automatically). |
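A sketch of how that comparison could be wired up. VersionedGetter is an assumed abstraction over the release-versioned clientsets mentioned above (it is not an existing federation interface), and the deep-equality check stands in for whatever semantic comparison the controllers actually perform.

```go
package fedsync

import "reflect"

// VersionedGetter fetches an object expressed in a specific API version,
// e.g. backed by the release_1_5 clientset for a v1.5 member cluster.
type VersionedGetter interface {
	Get(namespace, name string) (interface{}, error)
}

// InSync compares the desired object, as seen through a client matching the
// member cluster's version, with the object currently stored in that cluster.
func InSync(fcp, cluster VersionedGetter, namespace, name string) (bool, error) {
	// Desired state, converted by the FCP apiserver to the cluster's version,
	// so fields unknown to that version have already been dropped on both sides.
	desired, err := fcp.Get(namespace, name)
	if err != nil {
		return false, err
	}
	actual, err := cluster.Get(namespace, name)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(desired, actual), nil
}
```

As lavalamp notes further down, field-by-field (or deep-equality) comparison is brittle in practice, so this sketch mainly illustrates the version-matching part of the idea.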
Comment by quinton-hoole I updated the title. Let me know if you disagree @marun |
Comment by marun
I'm not sure what you mean since the discussion was about propagating resources to member clusters. How would user submission come into play?
The 'per-request' approach could presumably be simplified to cache field differences for a cluster after the first update had been performed against that cluster for a given type. The cache could be invalidated if the watch for the cluster for that type was interrupted for any reason, to ensure the differences would be recomputed on the next update. That would cover the case of cluster upgrade. And it would be relatively cheap - if field differences existed between the federation control plane and a cluster, a single (potentially unnecessary) update would be enough to get the field differences into the cache. |
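A minimal sketch of that cache, with made-up type and method names; the wiring to the actual watch/informer machinery is omitted.

```go
package fieldskew

import "sync"

type cacheKey struct {
	cluster string // member cluster name
	kind    string // federated resource type, e.g. "ReplicaSet"
}

// DroppedFieldCache remembers which fields a member cluster is known to drop
// for a given type, so the diff only has to be computed on the first update.
type DroppedFieldCache struct {
	mu      sync.Mutex
	entries map[cacheKey][]string
}

func NewDroppedFieldCache() *DroppedFieldCache {
	return &DroppedFieldCache{entries: map[cacheKey][]string{}}
}

// Record stores the field paths observed to be dropped by a cluster for a kind.
func (c *DroppedFieldCache) Record(cluster, kind string, dropped []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[cacheKey{cluster, kind}] = dropped
}

// Lookup returns the cached dropped fields and whether an entry exists.
func (c *DroppedFieldCache) Lookup(cluster, kind string) ([]string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	d, ok := c.entries[cacheKey{cluster, kind}]
	return d, ok
}

// Invalidate clears the entry when the watch for cluster/kind is interrupted,
// e.g. across a cluster upgrade, so the differences are recomputed on the
// next update.
func (c *DroppedFieldCache) Invalidate(cluster, kind string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, cacheKey{cluster, kind})
}
```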
Comment by marun @quinton-hoole Are you suggesting having an informer for each of a fixed set of versions for any watched federated type? Or have the set of versions be configurable? Or determined dynamically from the observed versions of member clusters? How would the version of member clusters be determined? Is there something at the client layer that can provide that detail? Is there a potential for client version skew? Kube ensures backwards compatibility of the api by requiring added fields to be optional. Is there any policy around field addition in minor version releases? For example, if the federation control plane was upgraded to 1.6.1 which included a field addition, how would the control plane be able to compare resources against a member cluster still running 1.6.0? |
Comment by lavalamp (copy from email thread) This is an interesting problem. Note that our go client has the exact same issue, where it drops fields it fails to understand. I strongly recommend you do NOT do member-by-member comparison to check for equality, that is very brittle, even non-semantic formatting changes with the same wire format can break it (happened to daemonset, e.g. 0 length vs nil slices). One approach is to record the metadata.generation returned after a POST or PUT, and assume you do not need to sync again unless (a) the source object changes or (b) the destination generation changes. (a) itself should be detected by changes to the source object's metadata.generation. In fact I'm not sure that there are any reasonable alternatives to this approach. |
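A minimal sketch of that bookkeeping, assuming hypothetical tracker types; only the skip/re-sync decision is shown, not the actual create/update calls.

```go
package fedsync

// pushRecord remembers the generations observed after a successful POST/PUT
// of a federated object into a member cluster.
type pushRecord struct {
	sourceGeneration int64 // metadata.generation of the federated object when last pushed
	targetGeneration int64 // metadata.generation the member cluster returned after the push
}

// SyncTracker decides whether a member-cluster object needs another sync.
type SyncTracker struct {
	seen map[string]pushRecord // keyed by cluster/namespace/name
}

func NewSyncTracker() *SyncTracker {
	return &SyncTracker{seen: map[string]pushRecord{}}
}

// NeedsSync reports whether the object should be pushed to the cluster again:
// (a) the source object changed, or (b) the destination generation changed.
func (t *SyncTracker) NeedsSync(key string, sourceGen, targetGen int64) bool {
	last, ok := t.seen[key]
	if !ok {
		return true // never synced this object to this cluster
	}
	return last.sourceGeneration != sourceGen || last.targetGeneration != targetGen
}

// Record stores the generations observed after a successful POST or PUT.
func (t *SyncTracker) Record(key string, sourceGen, targetGen int64) {
	t.seen[key] = pushRecord{sourceGeneration: sourceGen, targetGeneration: targetGen}
}
```

As bgrant0607 points out below, not every resource sets metadata.generation, so this approach would only cover types that do.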
Comment by bgrant0607 We have similar problems even within a single cluster, particularly due to apiserver-kubelet skew: #4855. |
Comment by lavalamp
We shouldn't be adding fields in patch releases like that, but if we did, it would be extremely difficult to handle with this proposal, since you'd have to link in 1.6.0 client AND a 1.6.1 client. For single clusters, the update rule is that you have to update the control plane before the nodes, and you have to update apiserver before the rest of the control plane. We (back in the early days) decided that testing both back AND forwards compat was too much. I think federation needs a similar rule. Maybe 0 forward versions on delegate clusters isn't the right rule, but you at least need to bound this & test it. |
Comment by bgrant0607 Hmm. The appropriate sig-api-machinery teams don't exist, so I can't notify them. Before any new API mechanism is developed, it should be discussed in SIG API machinery. You may be interested in the field gating proposal: https://docs.google.com/document/d/1wuoSqHkeT51mQQ7dIFhUKrdi3-1wbKrNWeIL4cKb9zU/edit# |
Comment by bgrant0607 @lavalamp FYI, when I last checked, not all resources set metadata.generation. |
Comment by bgrant0607 @janetkuo, since it's similar-ish to the Deployment<->ReplicaSet sync'ing problem. |
Comment by czahedi I am making the API Machinery email lists, GitHub users, etc. now. |
Comment by bgrant0607 FYI, metadata.generation is not set for every resource yet. #7328 |
Comment by nikhiljindal From @quinton-hoole
Yes, I agree. Ideally, users should be able to indicate what should happen if the FCP cannot guarantee all the fields of a ReplicaSet in all clusters: is it fine for the FCP to do best effort (i.e., create the ReplicaSet with dropped fields if some clusters do not support a field set by the user, which is what we do today), or should it generate an error? |
Comment by nikhiljindal From @marun @smarterclayton
I assume @smarterclayton meant that with the swagger spec the FCP can know up front which fields are supported by each cluster. So if a user submits a ReplicaSet with a field that one of the clusters does not support, then potentially the FCP can reject that ReplicaSet before even trying to propagate it to the underlying clusters.
This can be solved by running the "figure out supported fields using the swagger spec" logic every time a cluster goes from Ready to NotReady. Or we can require admins to unjoin and rejoin the cluster if it is upgraded/downgraded.
For this to work, the first update that tries to figure out which fields are supported by the underlying cluster will have to have all fields set (so that it can find out which fields were dropped). |
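A sketch of the "figure out supported fields using the swagger spec" idea from this comment, assuming the cluster's swagger.json follows the usual definitions/properties layout; the definition name a caller would pass (e.g. "v1beta1.ReplicaSetSpec") is illustrative.

```go
package fieldskew

import (
	"encoding/json"
	"fmt"
)

// swaggerDoc models only the pieces of a swagger/OpenAPI document needed to
// look up the declared properties of a definition.
type swaggerDoc struct {
	Definitions map[string]struct {
		Properties map[string]json.RawMessage `json:"properties"`
	} `json:"definitions"`
}

// SupportedFields returns the property names a cluster's swagger spec
// declares for the named definition, which the FCP could compare against the
// fields set on a user-submitted object before propagating it.
func SupportedFields(swaggerJSON []byte, definition string) ([]string, error) {
	var doc swaggerDoc
	if err := json.Unmarshal(swaggerJSON, &doc); err != nil {
		return nil, err
	}
	def, ok := doc.Definitions[definition]
	if !ok {
		return nil, fmt.Errorf("definition %q not found in swagger spec", definition)
	}
	fields := make([]string, 0, len(def.Properties))
	for name := range def.Properties {
		fields = append(fields, name)
	}
	return fields, nil
}
```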
Comment by nikhiljindal In summary, we are discussing 4 issues here:
1. How to determine which fields are supported in each cluster. Proposed solutions: use the swagger spec, or send an update request and compare the request and response objects.
2. What to do if a field is not supported in a cluster: generate an error, or do best effort (create the object with the unsupported fields dropped).
3. How to compare objects. Today we do a field-by-field comparison of the desired state and the current state of a resource in a cluster.
4. How much version skew we can support between the FCP and clusters. I think if we do all of the above, i.e. we are able to detect when a cluster does not support a field set by the user and we give the user the choice of what to do in that case, then we would not have to limit the FCP version to the min of all cluster versions.
Let me know if I missed anything. |
Comment by quinton-hoole @nikhiljindal Yup, I think you got most of it. Two minor comments: Re 4, besides the above, I don't think it's at all feasible to mandate that the FCP always gets updated before its member clusters, because member clusters are administered by different people/organizations than the FCP. So clusters can be upgraded or downgraded at any time. Clusters and their administrators might be completely unaware that their cluster has been added to a federation, and that is the right thing to do, given the intentional decoupling of the FCP and the clusters. Regarding 1 above, I also suggested the option of FCP controllers using a correctly versioned client library for interacting with each cluster, and the same versioned client library to interact with the FCP API. I'm not sure, but I suspect that this is morally equivalent to using the swagger spec. |
cc @marun |
Issue by marun
Thursday Apr 06, 2017 at 14:42 GMT
Originally opened as kubernetes/kubernetes#44160
As per a thread on the mailing list, it appears that federation will only be possible with clusters whose API version is >= the API version of the federation control plane. This suggests that the control plane should be explicitly checking the API version indicated for a member cluster and refusing to federate to a cluster with an older API than is in use by the control plane.
cc: @kubernetes/sig-federation-bugs @kubernetes/sig-api-machinery-misc