-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding lock files for kubeconfig updating #28034
Adding lock files for kubeconfig updating #28034
Conversation
@@ -371,6 +371,12 @@ func Load(data []byte) (*clientcmdapi.Config, error) { | |||
// WriteToFile serializes the config to yaml and writes it out to a file. If not present, it creates the file with the mode 0600. If it is present | |||
// it stomps the contents | |||
func WriteToFile(config clientcmdapi.Config, filename string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the only place this file is written?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjcullen Told me so.
Looks reasonable to me. Definitely mention in relase note. |
cc @deads2k |
LGTM |
GCE e2e build/test passed for commit 2a72e4f7afdd2abaf2a7b3957c2cef7d2da08d9d. |
func lockFile(filename string) error { | ||
// TODO: find a way to do this with actual file locks. Will | ||
// probably need seperate solution for windows and linux. | ||
f, err := os.OpenFile(lockName(filename), os.O_CREATE|os.O_EXCL, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if we don't care about windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks. Looks like other folks are using windows API directly - https://github.com/pinterest/knox/blob/master/client/flock_windows.go
@krousey Thanks for the explanation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dims I would definitely consider that for a later release, but I didn't want to try to rush something that hairy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krousey yes of course.
Unit test failure might be for real. |
2a72e4f
to
88e2a31
Compare
@lavalamp It kind of was. test-cmd.sh was running with kubeconfig in a non-existent tmpdir. I moved the file locking code to below the code that takes care of making sure the directory tree exists. Please take a look. |
GCE e2e build/test passed for commit 88e2a31. |
lgtm again |
Automatic merge from submit-queue |
This requires all editors to know about this lock file, and to remember to remove the lock file, and if something crashes for the user to know how to delete the lock file. Another solution would be for anything that writes a kubectl to write it atomically, by writing the entire thing to a temp file, and then renaming the temp file over the original. That way it either suceeds or fails regardless of what other writers do. At any rate, I don't understand why this is a P0, so I am removing the priority and the cherrypick-candidate labels. |
@erictune This is breaking the gke-serial test, and any GKE user who runs multiple instances of kubectl. I agree the write & rename approach is nicer but do we need to wait for that? gke-serial won't be green without some sort of fix. |
The latest e2e-gke run failed with a large stream of errors like this. Could it be the corrupted gke cert issue ?
Full logs: |
but fwiw, agree with his concerns re: what if kubectl crashes while lock file exists -- doesn't seem discoverable by users which lockfile to remove to clean it up. still, I think we're better off with this vs. without, though perhaps we can get something better in for 1.3.0 or 1.3.1? |
Multiple concurrent |
@erictune Agreed that relinking would be better. It was an option, but we wanted to lock the config file itself. When there wasn't an easy cross-platform way to do that, we settled on a lock file. We should have gone back to re-linking. If I can make that change, would it be cherry-picked? |
@erictune, as mentioned, it's not explicit writes, it's token refreshes, On Sat, Jun 25, 2016 at 8:38 AM, krousey notifications@github.com wrote:
|
Hmm. So, if a user does a kubectl set at the same time as a token refresh is happening, then they will lose their edits with the re-linking approach? Maybe re-linking is not the best solution. |
Following up in #28072 |
Automatic merge from submit-queue Adding lock files for kubeconfig updating This is to prevent concurrent executions from corrupting the kubeconfig file. Also, release note? kubernetes#23964 (cherry picked from commit cd422ad)
Automatic merge from submit-queue Batch update for 1.3 #28030: Revert "Federation e2e supports aws" #28026: Address outstanding review comments in #27999. #28034: Adding lock files for kubeconfig updating #28004: return nil from NewClientConfig instead of empty struct #28032: Increase pod CPU/memory for fluentd, dns and kube-proxy. #27208: Bump minimum API version for docker to 1.21 #28061: Remove extra double quotes in --federations. #28060: rkt: Fix the 'privileged' check when stage1 annotation is provided. #27996: Image GC logic should compensate for reserved blocks #28044: rkt: Bump required rkt version to 1.9.1. #28040: Tracked addition of federation, sed support in kube DNS #28043: Set grace period to 0 when deleting namespaces after the test. #28002: Fix startup type error in initializeCaches #28087: Hotfix: Fixup the hyperkube dns manifest from a breaking federation PR #28108: Fix initialization of volume controller caches. #28056: Increase kube-dns requirements on CoreOS. #28147: Fix error checks after cloning. #28159: Use : as seccomp security option operator for Docker 1.10 #28165: Refactored, expanded and fixed federated-services e2e tests. #28095: Kubelet should mark VolumeInUse before checking if it is Attached #28172: Build: Add KUBE_GCS_RELEASE_BUCKET_MIRROR option to push-ci-build.sh #28207: Bump cluster autoscaler to 0.2.2 #28160: Volume manager must verify containers terminated before deleting for ungracefully terminated pods #28211: Fix federation e2e tests by correctly managing cluster clients #27944: Fix pvc label selector validation error #28186: federation: Upgrading the groupversion to v1beta1
Commit found in the "release-1.3" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
A PR merged in June 24 [1] introduced the use of lock files when updating the cluster configuration. We need to make the KUBECONFIG dir writable for development, otherwise commands such as "oc project <name>" fail, for they cannot write the lock file and update the config. [1] kubernetes/kubernetes#28034
This is to prevent concurrent executions from corrupting the kubeconfig file.
Also, release note?
#23964