Do not query the metadata server to find out if running on GCE. Retry metadata server query for GCR if running on GCE. #28871
Conversation
cc @Random-Liu
Was this related to the PR @derekwaynecarr had opened?
value, err := credentialprovider.ReadUrl(metadataScopes+"?alt=json", g.Client, metadataHeader)
if err == nil {
    break
}
Should this be a hot loop? Backing off or sleeping seems better.
Does this remove the need for #28539, or is the timeout still a concern?
Not sure; I think that was mostly to avoid hangs in non-GCE environments, so this might prevent that entirely?
for {
    value, err = credentialprovider.ReadUrl(metadataScopes+"?alt=json", g.Client, metadataHeader)
    if err == nil {
        break
    }
}
Don't think my previous comment was addressed: this shouldn't run in such a hot loop.
Wouldn't ReadUrl have an in-built timeout? I don't think this logic will busy-wait. I can add a delay if it will be useful.
Even if there is a timeout, retrying immediately may not be good. If the metadata server is already overwhelmed by requests, adding a delay will help.
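To make the suggestion concrete, a bounded retry with a sleep between attempts might look like the sketch below. This is illustrative only: fetchWithRetry, readURL, maxAttempts, and retryDelay are hypothetical names not taken from the PR, and readURL stands in for credentialprovider.ReadUrl, which is assumed to enforce its own per-request timeout.

package main

import (
    "errors"
    "fmt"
    "time"
)

// Hypothetical constants for illustration; the PR may use different values.
const (
    maxAttempts = 5
    retryDelay  = 3 * time.Second
)

// readURL stands in for credentialprovider.ReadUrl. This stub always
// fails so the retry path below is exercised.
func readURL(url string) ([]byte, error) {
    return nil, errors.New("metadata server unreachable")
}

// fetchWithRetry bounds the number of attempts and sleeps between them,
// so an overloaded metadata server is not hit in a hot loop.
func fetchWithRetry(url string) ([]byte, error) {
    var lastErr error
    for attempt := 1; attempt <= maxAttempts; attempt++ {
        value, err := readURL(url)
        if err == nil {
            return value, nil
        }
        lastErr = err
        time.Sleep(retryDelay)
    }
    return nil, fmt.Errorf("giving up after %d attempts: %v", maxAttempts, lastErr)
}

func main() {
    _, err := fetchWithRetry("http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/?alt=json")
    fmt.Println(err)
}

Compared with the unbounded loop under review, this both caps the number of requests and spaces them out, which addresses the reviewer's concern about hammering an already-overloaded metadata server.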
Failed on the containervm image in the node e2e tests.
I didn't dig into the failure, but FWIW, I checked and container registry registration always fails even in a green node e2e run.
Can you squash commits?
This looks fine, but I'm heading out soon. Squash and I'll defer to @roberthbailey for final.
I guess we don't have a service account on the node e2e VM instances. How about checking http://metadata.google.internal./computeMetadata/v1/instance/ first to see if a service account is there?
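As a sketch of that suggestion: the GCE metadata API requires the Metadata-Flavor: Google header, and a 200 response from the service-accounts endpoint indicates at least one attached service account. The function name hasServiceAccount and the 2-second timeout below are assumptions, not names from the PR.

package main

import (
    "fmt"
    "net/http"
    "time"
)

// hasServiceAccount probes the GCE metadata server for attached service
// accounts. The URL and the required Metadata-Flavor header are part of
// the documented GCE metadata API; everything else here is illustrative.
func hasServiceAccount() bool {
    client := &http.Client{Timeout: 2 * time.Second}
    req, err := http.NewRequest("GET", "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/", nil)
    if err != nil {
        return false
    }
    req.Header.Set("Metadata-Flavor", "Google")
    resp, err := client.Do(req)
    if err != nil {
        return false
    }
    defer resp.Body.Close()
    return resp.StatusCode == http.StatusOK
}

func main() {
    fmt.Println("service account present:", hasServiceAccount())
}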
It looks like the gcr credential provider was never enabled in the e2e tests before, because the metadata server was never reachable. Here is the log from the node e2e kubelet log:
However, now that we have changed to a file-based check, it always passes, and then the kubelet keeps failing to access the metadata server.
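For reference, a file-based GCE check typically reads the DMI product name, which Google VM images populate. A minimal sketch, assuming the conventional /sys/class/dmi/id/product_name path (the exact path and matching logic in the PR may differ):

package main

import (
    "fmt"
    "os"
    "strings"
)

// onGCE is a sketch of a file-based platform check: GCE VMs report a
// product name containing "Google" (e.g. "Google Compute Engine") via DMI.
// Unlike a metadata-server probe, this never blocks on the network, which
// is why it "always passes" on GCE even when the metadata server is down.
func onGCE() bool {
    data, err := os.ReadFile("/sys/class/dmi/id/product_name")
    if err != nil {
        return false
    }
    return strings.Contains(strings.TrimSpace(string(data)), "Google")
}

func main() {
    fmt.Println("on GCE:", onGCE())
}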
@yujuhong - if this looks good to you, can you apply the label?
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
Bumping up the priority since this is required for v1.3.
GCE e2e build/test passed for commit ea1a459.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
@k8s-bot test this github issue #IGNORE
GCE e2e build/test failed for commit ea1a459. Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.
@k8s-bot test this github issue #IGNORE
GCE e2e build/test failed for commit ea1a459. Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.
GCE e2e build/test passed for commit ea1a459.
@k8s-bot test this github issue #IGNORE
GCE e2e build/test passed for commit ea1a459.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e build/test passed for commit ea1a459.
Automatic merge from submit-queue
Commit found in the "release-1.3" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.
…of-#28871-upstream-release-1.3
Automatic merge from submit-queue
Automated cherry pick of kubernetes#28871
Cherry pick of kubernetes#28871 on release-1.3.
Retry the logic for determining if GCR is enabled, to work around metadata unavailability.
Note: This patch does not retry fetching registry credentials.