Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix expectations in Deployment. #21963

Merged
merged 1 commit into from
Feb 26, 2016
Merged

Conversation

bgrant0607
Copy link
Member

Working on a fix for #19299.

Probably still want eventual expiration, but if e2e doesn't pass without that, there's got to be a problem.

cc @janetkuo @Kargakis @mqliang @nikhiljindal @ironcladlou

@bgrant0607 bgrant0607 self-assigned this Feb 25, 2016
@bgrant0607 bgrant0607 added this to the v1.2 milestone Feb 25, 2016
@bgrant0607 bgrant0607 changed the title WIP: Fix expectations in Deployment. [WIP] Fix expectations in Deployment. Feb 25, 2016
@k8s-github-robot
Copy link

Labelling this PR as size/L

@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 25, 2016
@k8s-bot
Copy link

k8s-bot commented Feb 25, 2016

GCE e2e test build/test passed for commit 70df516d5311d509d981a44b70a2ab51b207f547.

@k8s-bot
Copy link

k8s-bot commented Feb 25, 2016

GCE e2e test build/test passed for commit 70d5ad43fd26d7631044f22fd52aec5ebf217b7b.

@k8s-bot
Copy link

k8s-bot commented Feb 25, 2016

GCE e2e test build/test passed for commit 225a477805dad4b713162f94da93f21cb5f7f5ff.

@k8s-bot
Copy link

k8s-bot commented Feb 25, 2016

GCE e2e build/test failed for commit e995d58286046e711a7065b67c1e88c7df58a7f6.

@bgrant0607
Copy link
Member Author

All tests passed except for:

Deployment
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:72
  paused deployment should be ignored by the controller [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:62

  Feb 25 02:07:21.882: Err : nginx
  . Failed to remove deployment &{{ } {/api/v1/namespaces/e2e-tests-deployment-t46ni/pods 2875} [{{ } {nginx-cifwz-j9rn2 nginx-cifwz- e2e-tests-deployment-t46ni /api/v1/namespaces/e2e-tests-deployment-t46ni/pods/nginx-cifwz-j9rn2 7aa22eb7-dba7-11e5-a631-42010af00002 1511 %!s(int64=0) 2016-02-25 02:06:46 -0800 PST <nil> %!s(*int64=<nil>) map[name:nginx pod-template-hash:1900829565] map[kubernetes.io/created-by:{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"e2e-tests-deployment-t46ni","name":"nginx-cifwz","uid":"7a62371a-dba7-11e5-a631-42010af00002","apiVersion":"extensions","resourceVersion":"1158"}}
  ]} {[{default-token-919tm {%!s(*api.HostPathVolumeSource=<nil>) %!s(*api.EmptyDirVolumeSource=<nil>) %!s(*api.GCEPersistentDiskVolumeSource=<nil>) %!s(*api.AWSElasticBlockStoreVolumeSource=<nil>) %!s(*api.GitRepoVolumeSource=<nil>) %!s(*api.SecretVolumeSource=&{default-token-919tm}) %!s(*api.NFSVolumeSource=<nil>) %!s(*api.ISCSIVolumeSource=<nil>) %!s(*api.GlusterfsVolumeSource=<nil>) %!s(*api.PersistentVolumeClaimVolumeSource=<nil>) %!s(*api.RBDVolumeSource=<nil>) %!s(*api.FlexVolumeSource=<nil>) %!s(*api.CinderVolumeSource=<nil>) %!s(*api.CephFSVolumeSource=<nil>) %!s(*api.FlockerVolumeSource=<nil>) %!s(*api.DownwardAPIVolumeSource=<nil>) %!s(*api.FCVolumeSource=<nil>) %!s(*api.AzureFileVolumeSource=<nil>) %!s(*api.ConfigMapVolumeSource=<nil>)}}] [{nginx nginx [] []  [] [] {map[] map[]} [{default-token-919tm %!s(bool=true) /var/run/secrets/kubernetes.io/serviceaccount}] %!s(*api.Probe=<nil>) %!s(*api.Probe=<nil>) %!s(*api.Lifecycle=<nil>) /dev/termination-log Always %!s(*api.SecurityContext=<nil>) %!s(bool=false) %!s(bool=false) %!s(bool=false)}] Always %!s(*int64=0xc2084f6128) %!s(*int64=<nil>) ClusterFirst map[] default e2e-gce-master-0-minion-6zpd %!s(*api.PodSecurityContext=&{false false false <nil> <nil> <nil> [] <nil>}) []} {Running [{Ready True 0001-01-01 00:00:00 +0000 UTC 2016-02-25 02:06:51 -0800 PST  }]   10.240.0.6 10.245.4.11 2016-02-25 02:06:47 -0800 PST [{nginx {%!s(*api.ContainerStateWaiting=<nil>) %!s(*api.ContainerStateRunning=&{{{63591991609 0 0x2cb8520}}}) %!s(*api.ContainerStateTerminated=<nil>)} {%!s(*api.ContainerStateWaiting=<nil>) %!s(*api.ContainerStateRunning=<nil>) %!s(*api.ContainerStateTerminated=<nil>)} %!s(bool=true) %!s(int=0) nginx docker://ae8e1e9c54b3fcaed87ff6ded395690df8dbd2bb4e9842485ab36a89d491c6f9 docker://1d36039453eb391f71cb86cc08b7107b9105063734c23317299e8679c7d98f16}]}}]} pods : %!v(MISSING)

  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:197

@bgrant0607
Copy link
Member Author

The paused test fails to delete all of the Deployment's pods at the very end.

@bgrant0607 bgrant0607 changed the title [WIP] Fix expectations in Deployment. Fix expectations in Deployment. Feb 25, 2016
@bgrant0607
Copy link
Member Author

Ok. I changed Deployment to do straightforward refcounting of pod additions and deletions. This made all deployment tests pass for me, except for pause.

I need someone else (or multiple someones) to look at pause.

And I would like someone to change the ExpirationCache to permit explicit timeout instead of expiration on Get. The timestamp should be reset every time syncDeployment is able to make progress (i.e., stores are synced and expectations are met) -- this should be when SetExpectations is called.

Without vectors of logical clocks, this is not going to be bulletproof, but it's at least not blatantly wrong.

@bgrant0607 bgrant0607 added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. kind/flake Categorizes issue or PR as related to a flaky test. labels Feb 25, 2016
@k8s-bot
Copy link

k8s-bot commented Feb 25, 2016

GCE e2e test build/test passed for commit 02565d84f4b0686424d21e089b18ee46d3c782a4.

dc.podExpectations.RaiseExpectations(dKey, newScale-rs.Spec.Replicas, 0)
} else {
scalingOperation = "down"
dc.podExpectations.LowerExpectations(dKey, 0, rs.Spec.Replicas-newScale)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be:

dc.podExpectations.RaiseExpectations(dKey, 0, rs.Spec.Replicas-newScale)

or

dc.podExpectations.LowerExpectations(dKey, 0, newScale-rs.Spec.Replicas)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

@madhusudancs
Copy link
Contributor

LGTM, other than one minor comment. Please feel free to apply the label after making that change.

@bgrant0607
Copy link
Member Author

Addressed feedback and did some more cleanup (conversion of Set/Delete to Raise/Lower). Enabled flaky tests in PR builder.

func (e *ControlleeExpectations) Seen(add, del int64) {
atomic.AddInt64(&e.add, -add)
atomic.AddInt64(&e.del, -del)
// Seen increments the add and del counters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the comment to Add

@k8s-bot
Copy link

k8s-bot commented Feb 25, 2016

GCE e2e build/test failed for commit 3ecc59850ffd48cafbf5eb4b5d0bfe74d13e9d77.

Please reference the list of currently known flakes when examining this failure.

@k8s-bot
Copy link

k8s-bot commented Feb 25, 2016

GCE e2e build/test failed for commit b90f631a6bb67ff84ab55e526d93e6e9499edca7.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@janetkuo
Copy link
Member

Tests failed due to timeout:

Kubernetes e2e suite.Deployment RollingUpdateDeployment should scale up and down in the right order
Kubernetes e2e suite.Deployment RecreateDeployment should delete old pods and create new ones
Kubernetes e2e suite.Deployment deployment should delete old replica sets
Kubernetes e2e suite.Deployment RollingUpdateDeployment should delete old pods and create new ones
Kubernetes e2e suite.Deployment deployment should support rollover

Can't find the controller manger log in the bucket.

@bgrant0607
Copy link
Member Author

@janetkuo Thanks. I'll take a look. Those are all the same tests that have been failing, so looks like there's still a problem.

@k8s-bot
Copy link

k8s-bot commented Feb 25, 2016

GCE e2e build/test failed for commit 6fd1c532b634914a8884177e55ba87bb0f97ceba.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@janetkuo
Copy link
Member

Tests seem to be still failing and the kube-controller-manager.log is huge (13G) so I didn't try to open it. Tailing the file all I see is this repeating error message:

E0225 14:50:09.087732  124308 deployment_controller.go:408] Error syncing deployment e2e-tests-deployment-hwi9u/redis-deployment: pod expectations not met yet for e2e-tests-deployment-hwi9u/redis-deployment in syncDeployment

@bgrant0607
Copy link
Member Author

Added expiration back so I can (hopefully) see what expectations aren't being satisfied.

@k8s-bot
Copy link

k8s-bot commented Feb 26, 2016

GCE e2e build/test passed for commit c9a1ce66d07fc3b1445d5414fc2423951f1937bd.

@k8s-bot
Copy link

k8s-bot commented Feb 26, 2016

GCE e2e build/test passed for commit e1362e384a2c4874122e495e9d04e93095da2d29.

@bgrant0607
Copy link
Member Author

Controller manager log was huge because I was always re-enqueueing on failure, only to fail again.

Pods from ReplicaSets created with non-zero replicas weren't accounted for.

It also appears that not all expected deletions happen.

Could be faulty error handling.

I0226 00:35:40.222253       6 controller_utils.go:188] Raising expectations &{add:0 del:3 key:e2e-tests-deployment-y6wmn/redis-deployment-3 timestamp:{sec:63592043740 nsec:81518825 loc:0x263dd00}}
I0226 00:35:40.224626       6 controller_utils.go:179] Lowering expectations &{add:0 del:3 key:e2e-tests-deployment-y6wmn/redis-deployment-3 timestamp:{sec:63592043740 nsec:81518825 loc:0x263dd00}}
I0226 00:35:40.224663       6 deployment_controller.go:419] Finished syncing deployment "e2e-tests-deployment-y6wmn/redis-deployment-3" (143.246792ms)
E0226 00:35:40.224690       6 deployment_controller.go:408] Error syncing deployment e2e-tests-deployment-y6wmn/redis-deployment-3: replicasets "nginx-controller" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
I0226 00:35:40.224749       6 controller_utils.go:133] Controller still waiting on expectations &controller.ControlleeExpectations{add:0, del:3, key:"e2e-tests-deployment-y6wmn/redis-deployment-3", timestamp:time.Time{sec:63592043740, nsec:81518825, loc:(*time.Location)(0x263dd00)}}
I0226 00:35:40.224789       6 deployment_controller.go:419] Finished syncing deployment "e2e-tests-deployment-y6wmn/redis-deployment-3" (52.129µs)

@k8s-bot
Copy link

k8s-bot commented Feb 26, 2016

GCE e2e build/test passed for commit 31f95ae64381a6f33f526f4bb018312d3647ad01.

@bgrant0607
Copy link
Member Author

It appears to be pretty common for deletions to be observed after expectations are determined to be satisfied and are reset.

@k8s-bot
Copy link

k8s-bot commented Feb 26, 2016

GCE e2e build/test failed for commit 5837987cf7ed2ff918641bae499abe0ef5098539.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@k8s-bot
Copy link

k8s-bot commented Feb 26, 2016

GCE e2e build/test passed for commit 554eee3babb02e1c273e7a81898688c02a0b92a9.

@bgrant0607
Copy link
Member Author

Something fishy is going on with deletes, but I don't have time to investigate more. This seems to make all the tests pass.

PTAL

@bgrant0607
Copy link
Member Author

BTW, this also makes the controller manager log comparable in size to the other components.

@k8s-bot
Copy link

k8s-bot commented Feb 26, 2016

GCE e2e build/test passed for commit 39f0edc.

func (e *ControlleeExpectations) Seen(add, del int64) {
atomic.AddInt64(&e.add, -add)
atomic.AddInt64(&e.del, -del)
// Add increments the add and del counters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much more readable and easier to comprehend than the Seen() language. Thanks!

@madhusudancs
Copy link
Contributor

LGTM. I will let @janetkuo to add the label after taking a look.

@0xmichalis
Copy link
Contributor

LGTM

@bgrant0607 bgrant0607 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 26, 2016
@bgrant0607
Copy link
Member Author

Merging to fix tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/app-lifecycle kind/flake Categorizes issue or PR as related to a flaky test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants