Error syncing deployment, replica set already exists #29735
Comments
@gigaroby could you provide more detailed info, like what your resource file is and how to reproduce it?
@adohe yes, of course:
I am not sure how to narrow this down further; any suggestions?
@kubernetes/deployment can somebody triage this?
We just started seeing this as well: kube 1.3.6 on AWS. The cluster was running fine for two weeks, then the problem started on the last CI build yesterday and shows no signs of fixing itself. Our deploy process is a series of
#28684 should fix the hotloop by using a rate limiter in the deployment controller; it seems like it was included in 1.3.7. Also, the "already exists" case shouldn't be treated as an error - instead we should return the replica set from the cache at that point (since it already exists). I was handling this error in a recent PR of mine and will update this issue with the #
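For illustration, a minimal sketch of that idea (not the actual controller code; getOrCreateReplicaSet and the function-valued parameters are hypothetical stand-ins for the controller's client and cache lister, and the error helper is the apimachinery one):

```go
package deployment

import (
	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// getOrCreateReplicaSet is a hypothetical helper: if Create fails with
// AlreadyExists, the informer cache was simply stale, so fetch the existing
// replica set and continue the rollout with it instead of failing the sync.
func getOrCreateReplicaSet(
	create func(*appsv1.ReplicaSet) (*appsv1.ReplicaSet, error),
	get func(namespace, name string) (*appsv1.ReplicaSet, error),
	newRS *appsv1.ReplicaSet,
) (*appsv1.ReplicaSet, error) {
	rs, err := create(newRS)
	if err == nil {
		return rs, nil
	}
	if apierrors.IsAlreadyExists(err) {
		// The cache was behind the API server; return the live object.
		return get(newRS.Namespace, newRS.Name)
	}
	return nil, err
}
```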
@Kargakis Are you sure the fix is in 1.3.7? I just hit this problem in a build of 1.3.7, and it kills the master node as it pegs a CPU at 100% in this error loop. Edit: it doesn't seem to be on the release-1.3 branch, but it is in release-1.4.
Hm, git fooled me. The fix is included in 1.4.
From the code, it looks like deployment api generates an adler-32 checksum of the pod template, then uses that to create a new replicaset |
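Roughly, the mechanism being described works like this (a simplified sketch, not the actual controller code, which deep-hashes the full pod template spec; the template string here is a made-up stand-in):

```go
package main

import (
	"fmt"
	"hash/adler32"
)

func main() {
	// Simplified stand-in for the serialized pod template spec.
	podTemplate := `{"containers":[{"name":"cats","image":"registry/test/cats:v0.535.0"}]}`

	// The adler-32 checksum of the template becomes the pod-template-hash
	// label, and the new replica set is named <deployment-name>-<hash>.
	hash := adler32.Checksum([]byte(podTemplate))
	fmt.Printf("pod-template-hash: %d\n", hash)
	fmt.Printf("replica set name:  %s-%d\n", "cats", hash)
}
```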
@jsravn I don't think you would get a hash collision even with >100 deploys (not sure where Adler starts to fail, though). Even the slightest change in the pod template changes the hash. Can you provide the pod template of the existing replica set that blocks the deployment controller, and also the latest state of your deployment? I think this issue is simply us not handling AlreadyExists errors when we try to create new replica sets (the replica set cache may get stale).
@Kargakis sure... I admit I may be wrong - that was just my super quick analysis. :) Although, adler is pretty terrible as a hash function, and I was a little surprised to see it used - which is why I leapt to that conclusion. I need to try reproducing it to see whether that's actually the case - which I'll try now. Here's the replica set that it complains already exists:

```yaml
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
annotations:
deployment.kubernetes.io/revision: "25"
creationTimestamp: 2016-09-22T16:00:52Z
generation: 3
labels:
app: cats
pod-template-hash: "218308181"
name: cats-218308181
namespace: cats-stubbed-functional
resourceVersion: "3958038"
selfLink: /apis/extensions/v1beta1/namespaces/cats-stubbed-functional/replicasets/cats-218308181
uid: bcd1bacb-80dd-11e6-bfcc-0a8cd5340591
spec:
replicas: 0
selector:
matchLabels:
app: cats
pod-template-hash: "218308181"
template:
metadata:
creationTimestamp: null
labels:
app: cats
pod-template-hash: "218308181"
spec:
containers:
- env:
- name: DEPLOYMENT_ENVIRONMENT
value: cats-stubbed-functional
- name: APP_NAME
value: cats
image: registry**obfuscated**/test/cats:v0.535.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /private/status
port: 9077
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: cats
ports:
- containerPort: 9077
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /private/status
port: 9077
scheme: HTTP
initialDelaySeconds: 1
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: "1"
memory: 1Gi
requests:
cpu: "1"
memory: 1Gi
terminationMessagePath: /dev/termination-log
dnsPolicy: ClusterFirst
restartPolicy: Always
securityContext: {}
terminationGracePeriodSeconds: 30
status:
observedGeneration: 3
replicas: 0
```

And the current failing deployment (although I believe it started failing at generation 204, version 0.612.0):

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "100"
kubectl.kubernetes.io/last-applied-configuration: '{"kind":"Deployment","apiVersion":"extensions/v1beta1","metadata":{"name":"cats","creationTimestamp":null,"labels":{"app":"cats"}},"spec":{"replicas":1,"template":{"metadata":{"creationTimestamp":null,"labels":{"app":"cats"}},"spec":{"containers":[{"name":"cats","image":"registry**obfuscated**/test/cats:v0.617.0","ports":[{"name":"http","containerPort":9077,"protocol":"TCP"}],"env":[{"name":"DEPLOYMENT_ENVIRONMENT","value":"cats-stubbed-functional"},{"name":"APP_NAME","value":"cats"}],"resources":{"limits":{"cpu":"1","memory":"1Gi"},"requests":{"cpu":"1","memory":"1Gi"}},"livenessProbe":{"httpGet":{"path":"/private/status","port":9077},"initialDelaySeconds":30,"timeoutSeconds":1},"readinessProbe":{"httpGet":{"path":"/private/status","port":9077},"initialDelaySeconds":1,"timeoutSeconds":1}}]}},"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":0,"maxSurge":1}}},"status":{}}'
creationTimestamp: 2016-09-11T09:39:41Z
generation: 209
labels:
app: cats
name: cats
namespace: cats-stubbed-functional
resourceVersion: "12695928"
selfLink: /apis/extensions/v1beta1/namespaces/cats-stubbed-functional/deployments/cats
uid: a9ceb758-7803-11e6-9e34-0a8cd5340591
spec:
replicas: 1
rollbackTo: {}
selector:
matchLabels:
app: cats
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: cats
spec:
containers:
- env:
- name: DEPLOYMENT_ENVIRONMENT
value: cats-stubbed-functional
- name: APP_NAME
value: cats
image: registry**obfuscated**/test/cats:v0.617.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /private/status
port: 9077
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: cats
ports:
- containerPort: 9077
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /private/status
port: 9077
scheme: HTTP
initialDelaySeconds: 1
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: "1"
memory: 1Gi
requests:
cpu: "1"
memory: 1Gi
terminationMessagePath: /dev/termination-log
dnsPolicy: ClusterFirst
restartPolicy: Always
securityContext: {}
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
observedGeneration: 203
replicas: 1
updatedReplicas: 1
```
Hash collisions are definitely a problem. I wrote up a test and can get a collision within a couple hundred versions (only version changes). Here's pod_test.go:
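(The original pod_test.go isn't reproduced here. A minimal sketch of the kind of experiment it describes, assuming a simplified serialized pod template where only the image version changes between "deploys", might look like this:)

```go
package main

import (
	"fmt"
	"hash/adler32"
)

func main() {
	seen := map[uint32]string{} // checksum -> first version that produced it
	collisions := 0

	for i := 0; i < 1000; i++ {
		version := fmt.Sprintf("v0.%d.0", 500+i)
		// Simplified stand-in for the serialized pod template; only the
		// image tag changes between iterations, as in the report above.
		template := fmt.Sprintf(
			`{"containers":[{"name":"cats","image":"registry/test/cats:%s"}]}`, version)

		sum := adler32.Checksum([]byte(template))
		if prev, ok := seen[sum]; ok {
			collisions++
			fmt.Printf("collision: %s and %s both hash to %d\n", prev, version, sum)
		} else {
			seen[sum] = version
		}
	}
	fmt.Printf("%d collisions, %d unique checksums\n", collisions, len(seen))
}
```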
I'm guessing we can't improve the hash function at this point, to maintain backwards compatibility? We should at least handle the collision somehow - what do you suggest?
Further, in the example I gave above, the hash output only has about 200 unique values, after which all newer versions collide with previously hashed values.
I suppose the immediate workaround for users is to limit the deployment revision history to a small value (http://kubernetes.io/docs/user-guide/deployments/#revision-history-limit).
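For example, a hypothetical manifest fragment with that workaround applied (the field is revisionHistoryLimit in the same extensions/v1beta1 Deployment spec shown above):

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cats
spec:
  replicas: 1
  revisionHistoryLimit: 3   # keep only a few old replica sets around
  template:
    metadata:
      labels:
        app: cats
    spec:
      containers:
      - name: cats
        image: registry/test/cats:v0.617.0
```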
@jsravn thanks for the test! I ran it as well and I see the same results. @smarterclayton @bgrant0607 can you have a look? An immediate workaround is to use revisionHistoryLimit, yes. In reality most users won't care about the history of 200 replica sets, but it is concerning that Adler breaks that fast.
Adler was only used because we were originally overly concerned about the hash performance. We need a reasonably (but not excessively) fast hash where the chance of a collision is much less than 1 in 1e5, and preferably less than 1 in 1e10.
fnv would be an easy change
Results from benchmarking:
Fnv is a bit slower but much more stable.
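(The actual numbers aren't reproduced here. A sketch of the kind of micro-benchmark that comparison implies, using Go's testing package and a synthetic template; results will of course vary by machine:)

```go
package hashbench

import (
	"hash/adler32"
	"hash/fnv"
	"testing"
)

// Synthetic stand-in for a serialized pod template.
var template = []byte(`{"containers":[{"name":"cats","image":"registry/test/cats:v0.617.0"}]}`)

func BenchmarkAdler32(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = adler32.Checksum(template)
	}
}

func BenchmarkFNV32a(b *testing.B) {
	for i := 0; i < b.N; i++ {
		h := fnv.New32a()
		h.Write(template)
		_ = h.Sum32()
	}
}
```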
@smarterclayton so my understanding is that the migration of old replica sets will happen during normal operation of the cluster, is that correct? We can migrate all replica sets with zero replicas in the background, and migrate running sets once a new rollout happens. We may not even need a separate queue. I am still not sure how we can notify admins once all existing deployments have migrated.
We could expose a metric that reports how many old replication controllers we still have.
For 1.6, we moved Deployments into the apps API group [1] so we can change the defaults (revisionHistoryLimit is now set to 3 by default for newly created deployments), and we also made the cleanup policy run independently of a rollout (so a rollout that gets stuck won't block deletion of older replica sets) [2]. I've also opened an update to the Deployment proposal for moving away from hashing: kubernetes/community#384. Moving the milestone to 1.7.
Changing revision history doesn't seem to alleviate hash collisions, right? Looking at #43449 - are we using a poor hash for this?
Actually, it does. If all you have is 3 old replica sets, it's less likely that the controller will break you because of a hash collision. 200 seems to be the limit with adler.
Yes. There is #38714, which changes us to fnv, but there is also kubernetes/community#384, which moves us away from hashing. I am going to open an alternative proposal to 384 that covers the transition to fnv, and we need to decide which way we want to proceed.
Opened kubernetes/community#477 as an alternative to kubernetes/community#384.
Automatic merge from submit-queue

Switch Deployments to new hashing algo w/ collision avoidance mechanism

Implements kubernetes/community#477

@kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews

Fixes #29735
Fixes #43948

```release-note
Deployments are updated to use (1) a more stable hashing algorithm (fnv) than the previous one (adler) and (2) a hashing collision avoidance mechanism that will ensure new rollouts will not block on hashing collisions anymore.
```
We saw this in a related configuration. Setting --record=true on the deployment [1] and setting the revision history configuration [2] seemed to alleviate the stress. Is there a status on the hash collision bug?

[1] https://kubernetes.io/docs/concepts/workloads/controllers/deployment/

Thanks
@brugz the fix will be included in the 1.7 release.
Env: Kubernetes 1.3.3 on AWS
I ran into the same problem described in #26673, but now it's happening on every deploy.
I would have reopened the issue, but I was not the author so I could not.
The bug presents itself because I have a build that runs
kubectl rollout status deployment/application-unstable
just after a kubectl apply -f <new deployment manifest>
and it gets stuck forever because the rollout is never reported as finished (it does finish, though).