Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding lock files for kubeconfig updating #28034

Merged
merged 1 commit into from
Jun 24, 2016

Conversation

krousey
Copy link
Contributor

@krousey krousey commented Jun 24, 2016

This is to prevent concurrent executions from corrupting the kubeconfig file.

Also, release note?
#23964

@krousey krousey added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. team/ux labels Jun 24, 2016
@lavalamp lavalamp added this to the v1.3 milestone Jun 24, 2016
@@ -371,6 +371,12 @@ func Load(data []byte) (*clientcmdapi.Config, error) {
// WriteToFile serializes the config to yaml and writes it out to a file. If not present, it creates the file with the mode 0600. If it is present
// it stomps the contents
func WriteToFile(config clientcmdapi.Config, filename string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only place this file is written?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjcullen Told me so.

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Jun 24, 2016
@lavalamp lavalamp added release-note Denotes a PR that will be considered when it comes time to generate release notes. retest-not-required and removed release-note-label-needed labels Jun 24, 2016
@j3ffml
Copy link
Contributor

j3ffml commented Jun 24, 2016

Looks reasonable to me. Definitely mention in relase note.

@j3ffml
Copy link
Contributor

j3ffml commented Jun 24, 2016

cc @deads2k

@lavalamp
Copy link
Member

LGTM

@lavalamp lavalamp added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 24, 2016
@k8s-bot
Copy link

k8s-bot commented Jun 24, 2016

GCE e2e build/test passed for commit 2a72e4f7afdd2abaf2a7b3957c2cef7d2da08d9d.

func lockFile(filename string) error {
// TODO: find a way to do this with actual file locks. Will
// probably need seperate solution for windows and linux.
f, err := os.OpenFile(lockName(filename), os.O_CREATE|os.O_EXCL, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks. Looks like other folks are using windows API directly - https://github.com/pinterest/knox/blob/master/client/flock_windows.go

@krousey Thanks for the explanation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dims I would definitely consider that for a later release, but I didn't want to try to rush something that hairy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krousey yes of course.

@lavalamp
Copy link
Member

Unit test failure might be for real.

@krousey krousey force-pushed the kubeconfig_filelock branch from 2a72e4f to 88e2a31 Compare June 24, 2016 20:12
@krousey
Copy link
Contributor Author

krousey commented Jun 24, 2016

@lavalamp It kind of was. test-cmd.sh was running with kubeconfig in a non-existent tmpdir. I moved the file locking code to below the code that takes care of making sure the directory tree exists. Please take a look.

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 24, 2016
@k8s-bot
Copy link

k8s-bot commented Jun 24, 2016

GCE e2e build/test passed for commit 88e2a31.

@lavalamp
Copy link
Member

lgtm again

@lavalamp lavalamp added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 24, 2016
@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit cd422ad into kubernetes:master Jun 24, 2016
@erictune
Copy link
Member

This requires all editors to know about this lock file, and to remember to remove the lock file, and if something crashes for the user to know how to delete the lock file.

Another solution would be for anything that writes a kubectl to write it atomically, by writing the entire thing to a temp file, and then renaming the temp file over the original. That way it either suceeds or fails regardless of what other writers do.

At any rate, I don't understand why this is a P0, so I am removing the priority and the cherrypick-candidate labels.

@erictune erictune removed cherrypick-candidate priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jun 24, 2016
@lavalamp
Copy link
Member

@erictune This is breaking the gke-serial test, and any GKE user who runs multiple instances of kubectl.

I agree the write & rename approach is nicer but do we need to wait for that? gke-serial won't be green without some sort of fix.

@girishkalele
Copy link

The latest e2e-gke run failed with a large stream of errors like this. Could it be the corrupted gke cert issue ?

Error: Get https://104.197.35.156/api/v1/namespaces/e2e-tests-volume-provisioning-b5o4k/pods/pvc-volume-tester-fvqh3: x509: certificate signed by unknown authority

Full logs:
http://kubekins.dls.corp.google.com/view/Critical%20Builds/job/kubernetes-e2e-gke/9850/consoleFull

@alex-mohr alex-mohr added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. cherrypick-candidate labels Jun 25, 2016
@alex-mohr
Copy link
Contributor

re-flagging with priority bits and cherrypick-candidate so @erictune revisits per @lavalamp 's update to #23964

@alex-mohr
Copy link
Contributor

but fwiw, agree with his concerns re: what if kubectl crashes while lock file exists -- doesn't seem discoverable by users which lockfile to remove to clean it up. still, I think we're better off with this vs. without, though perhaps we can get something better in for 1.3.0 or 1.3.1?

@erictune
Copy link
Member

Multiple concurrent kubectl set in the same directory does not seem common at all.

@krousey
Copy link
Contributor Author

krousey commented Jun 25, 2016

@erictune @cjcullen says this was triggered by automatic IAM token refreshes, not explicit kubectl set calls. Hence why it was only showing up on GKE e2e tests.

@krousey
Copy link
Contributor Author

krousey commented Jun 25, 2016

@erictune Agreed that relinking would be better. It was an option, but we wanted to lock the config file itself. When there wasn't an easy cross-platform way to do that, we settled on a lock file. We should have gone back to re-linking.

If I can make that change, would it be cherry-picked?

@lavalamp
Copy link
Member

@erictune, as mentioned, it's not explicit writes, it's token refreshes,
which happen every 5 minutes or so.

On Sat, Jun 25, 2016 at 8:38 AM, krousey notifications@github.com wrote:

@erictune https://github.com/erictune Agreed that relinking would be
better. It was an option, but we wanted to lock the config file itself.
When there wasn't an easy cross-platform way to do that, we settled on a
lock file. We should have gone back to re-linking.

If I can make that change, would it be cherry-picked?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#28034 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAngln8kZ0gwmNuwHeNsU7VK_-NbYLiuks5qPUttgaJpZM4I9_5v
.

@erictune
Copy link
Member

Hmm. So, if a user does a kubectl set at the same time as a token refresh is happening, then they will lose their edits with the re-linking approach? Maybe re-linking is not the best solution.

@erictune erictune added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cherrypick-candidate and removed cherrypick-candidate labels Jun 27, 2016
@eparis eparis mentioned this pull request Jun 27, 2016
@erictune
Copy link
Member

Following up in #28072

eparis pushed a commit to eparis/kubernetes that referenced this pull request Jun 29, 2016
Automatic merge from submit-queue

Adding lock files for kubeconfig updating

This is to prevent concurrent executions from corrupting the kubeconfig file.

Also, release note?

kubernetes#23964

(cherry picked from commit cd422ad)
k8s-github-robot pushed a commit that referenced this pull request Jun 29, 2016
Automatic merge from submit-queue

Batch update for 1.3

#28030: Revert "Federation e2e supports aws"
#28026: Address outstanding review comments in #27999.
#28034: Adding lock files for kubeconfig updating
#28004: return nil from NewClientConfig instead of empty struct
#28032: Increase pod CPU/memory for fluentd, dns and kube-proxy.
#27208: Bump minimum API version for docker to 1.21
#28061: Remove extra double quotes in --federations.
#28060: rkt: Fix the 'privileged' check when stage1 annotation is provided.
#27996: Image GC logic should compensate for reserved blocks
#28044: rkt: Bump required rkt version to 1.9.1.
#28040: Tracked addition of federation, sed support in kube DNS
#28043: Set grace period to 0 when deleting namespaces after the test.
#28002: Fix startup type error in initializeCaches
#28087: Hotfix: Fixup the hyperkube dns manifest from a breaking federation PR
#28108: Fix initialization of volume controller caches.
#28056: Increase kube-dns requirements on CoreOS.
#28147: Fix error checks after cloning.
#28159: Use : as seccomp security option operator for Docker 1.10
#28165: Refactored, expanded and fixed federated-services e2e tests.
#28095: Kubelet should mark VolumeInUse before checking if it is Attached
#28172: Build: Add KUBE_GCS_RELEASE_BUCKET_MIRROR option to push-ci-build.sh
#28207: Bump cluster autoscaler to 0.2.2
#28160: Volume manager must verify containers terminated before deleting for ungracefully terminated pods
#28211: Fix federation e2e tests by correctly managing cluster clients
#27944: Fix pvc label selector validation error
#28186: federation: Upgrading the groupversion to v1beta1
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.3" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

rhcarvalho added a commit to rhcarvalho/openshift-devtools that referenced this pull request Aug 15, 2016
A PR merged in June 24 [1] introduced the use of lock files when
updating the cluster configuration.
We need to make the KUBECONFIG dir writable for development, otherwise
commands such as "oc project <name>" fail, for they cannot write the
lock file and update the config.

[1] kubernetes/kubernetes#28034
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.