
Fix DownwardAPI refresh race. #59873

Merged: 1 commit into kubernetes:master on Feb 16, 2018

Conversation

jsafrane
Member

WaitForAttachAndMount should only mark the pod for reprocessing in the DesiredStateOfWorldPopulator (DSWP), and the DSWP should mark the pod's volumes for remount only after the new pod has been processed.

Otherwise the DSWP and the reconciler race over who gets the new pod first. If the reconciler wins, the pod's DownwardAPI and Projected volumes are not refreshed with the new content; they are only updated after the next periodic sync (60-90 seconds).
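
To make the race concrete, here is a minimal, self-contained Go sketch. All names and types are hypothetical simplifications; the real players are kubelet.syncPod, volumeManager.WaitForAttachAndMount, the DSWP, and the reconciler in kubelet's volume manager.

```go
package main

import "fmt"

// state is a stand-in for kubelet's desired/actual state of world.
type state struct {
	specSeenByDSW string // pod spec the populator has copied into the desired state
	remountDue    bool   // set by MarkRemountRequired, consumed by the reconciler
}

// Old behavior: WaitForAttachAndMount (called from kubelet.syncPod) marked the
// remount immediately, before the populator had re-read the updated pod.
func waitForAttachAndMountOld(s *state) {
	s.remountDue = true // asw.MarkRemountRequired(...), too early
}

// The reconciler re-mounts using whatever spec the desired state holds now.
func reconcile(s *state) {
	if s.remountDue {
		s.remountDue = false
		fmt.Println("re-mounted with:", s.specSeenByDSW)
	}
}

// The populator eventually copies the new pod spec into the desired state.
func populate(s *state, newSpec string) { s.specSeenByDSW = newSpec }

func main() {
	s := &state{specSeenByDSW: "old annotations"}
	waitForAttachAndMountOld(s) // the pod was just updated to "new annotations"
	reconcile(s)                // reconciler wins the race: re-mounts stale content
	populate(s, "new annotations")
	reconcile(s) // no remount pending; new content waits for the next periodic sync
}
```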

Fixes #59813

/assign @jingxu97 @saad-ali
/sig storage
/sig node

Release note: None

@k8s-ci-robot added the release-note-none, sig/storage, sig/node, size/S, and cncf-cla: yes labels on Feb 14, 2018
@errordeveloper
Member

/lgtm
/ok-to-test

@k8s-ci-robot added the lgtm label on Feb 14, 2018
@jsafrane
Member Author

/retest

@jingxu97
Contributor

/lgtm

@jsafrane
Member Author

/assign @derekwaynecarr
for approval. This should fix a lot of the flakes we see in DownwardAPI tests.

@derekwaynecarr
Member

Awesome to have a fix.

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, errordeveloper, jingxu97, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Feb 16, 2018
@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 59873, 59933, 59923, 59944, 59953). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot merged commit c7c5d89 into kubernetes:master on Feb 16, 2018
openshift-merge-robot added a commit to openshift/origin that referenced this pull request Feb 16, 2018
Automatic merge from submit-queue.

Picks for volume manager

Thanks to @jsafrane for these fixes

kubernetes/kubernetes#59873
kubernetes/kubernetes#59923

59923 is modified from upstream because some logging levels were already higher in 1.9

xref https://bugzilla.redhat.com/show_bug.cgi?id=1538216

Fixes #17605
Fixes #17556

@derekwaynecarr
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Feb 27, 2018
Origin-commit: e4f2115102c01124cc7f168b7f1ae4c65f190875
@@ -302,6 +305,9 @@ func (dswp *desiredStateOfWorldPopulator) processPodVolumes(pod *v1.Pod) {
 	// some of the volume additions may have failed, should not mark this pod as fully processed
 	if allVolumesAdded {
 		dswp.markPodProcessed(uniquePodName)
+		// New pod has been synced. Re-mount all volumes that need it
+		// (e.g. DownwardAPI)
+		dswp.actualStateOfWorld.MarkRemountRequired(uniquePodName)
Member

this commit showed up in a bisect of a scalability-test regression in pod startup time (#60589 (comment)), and this change looks odd... does this force a double mount setup for all pods?

Member Author

The flow that leads to the MarkRemountRequired call (sketched in the Go snippet below) is:

• kubelet.syncPod is called (on a pod update, or every 90s)
  • volumeManager.WaitForAttachAndMount is called and marks the pod for reprocessing
• desiredStateOfWorldPopulator.findAndAddNewPods is called periodically
  • desiredStateOfWorldPopulator.processPodVolumes is called (only when the pod was marked for reprocessing in syncPod)
    • actualStateOfWorld.MarkRemountRequired is called
      • the reconciler re-mounts the volumes (updating Secrets, DownwardAPI, ...); refreshing DownwardAPI is the reason syncPod is involved

Notice that before this PR, MarkRemountRequired was called directly by kubelet.syncPod, so the frequency of the calls did not change; with this PR, the calls simply happen in the right order. I don't think it could affect pod startup time.

On the other hand, syncPod is called on every pod update, and a pod changes quite often during startup. It is possible that I missed something there.
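
A condensed, self-contained Go sketch of that chain (hypothetical, simplified names and types; the real functions are the ones listed in the flow above):

```go
package main

import "fmt"

// pod carries the two flags the fix sequences: "reprocess" (set by syncPod,
// cleared by the populator) and "remountDue" (set by the populator, consumed
// by the reconciler).
type pod struct {
	name       string
	reprocess  bool
	remountDue bool
}

// syncPod -> WaitForAttachAndMount: only requests reprocessing now.
func waitForAttachAndMount(p *pod) { p.reprocess = true }

// findAndAddNewPods -> processPodVolumes: runs periodically; only after the
// pod's volumes were (re)added to the desired state does it ask for a remount.
func processPodVolumes(p *pod) {
	if !p.reprocess {
		return
	}
	// ... add the pod's volumes to the desired state of world ...
	p.reprocess = false // markPodProcessed
	p.remountDue = true // MarkRemountRequired
}

// Reconciler: refreshes DownwardAPI/Projected content, now guaranteed to see
// the new pod spec because the populator ran first.
func reconcile(p *pod) {
	if p.remountDue {
		p.remountDue = false
		fmt.Println("re-mounting volumes of", p.name)
	}
}

func main() {
	p := &pod{name: "ns/demo"}
	waitForAttachAndMount(p) // from kubelet.syncPod on a pod update
	reconcile(p)             // runs first: nothing pending, no stale refresh
	processPodVolumes(p)     // populator picks up the new spec
	reconcile(p)             // re-mounts with the new content
}
```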

KubeStacker added a commit to KubeStacker/kubernetes that referenced this pull request Jan 25, 2019

RelatedTo: kubernetes#59873