Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not query the metadata server to find out if running on GCE. Retry metadata server query for gcr if running on gce. #28871

Merged
merged 2 commits into from
Jul 18, 2016

Conversation

vishh
Copy link
Contributor

@vishh vishh commented Jul 13, 2016

Retry the logic for determining is gcr is enabled to workaround metadata unavailability.

Note: This patch does not retry fetching registry credentials.

@vishh
Copy link
Contributor Author

vishh commented Jul 13, 2016

cc @Random-Liu

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Jul 13, 2016
@vishh
Copy link
Contributor Author

vishh commented Jul 13, 2016

cc @zmerlynn @roberthbailey

@vishh vishh assigned roberthbailey and zmerlynn and unassigned liggitt Jul 13, 2016
@liggitt
Copy link
Member

liggitt commented Jul 13, 2016

Was this related to the PR @derekwaynecarr had opened?

value, err := credentialprovider.ReadUrl(metadataScopes+"?alt=json", g.Client, metadataHeader)
if err == nil {
break
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a hot loop? Backing off or sleeping seems better.

@vishh
Copy link
Contributor Author

vishh commented Jul 13, 2016

@liggitt #28539 seems orthogonal. WDYT?

@liggitt
Copy link
Member

liggitt commented Jul 13, 2016

Does this remove the need for #28539, or is the timeout still a concern?

@liggitt
Copy link
Member

liggitt commented Jul 13, 2016

Not sure, I think that was mostly to avoid hangs in non-GCE envs, so this might prevent that entirely?

@Random-Liu Random-Liu added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Jul 13, 2016
@Random-Liu Random-Liu added this to the v1.3 milestone Jul 13, 2016
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 13, 2016
for {
value, err = credentialprovider.ReadUrl(metadataScopes+"?alt=json", g.Client, metadataHeader)
if err == nil {
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think my previous comment was addressed: this shouldn't run in such a hot loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't ReadUrl have an in-built timeout? I don't think this logic will busy wait. I can add a delay if it will be useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if there is a timeout, trying immediately may not be good. If the metadata server is overwhelmed by requests already, adding a delay will help.

@yujuhong
Copy link
Contributor

Failed on the containervm image in the node e2e tests.

I0713 01:15:32.908763    2975 plugins.go:56] Registering credential provider: google-dockercfg-url
I0713 01:15:32.910406    2975 config.go:156] body of failing http response: &{0xc8207d6ac0 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.911834    2975 config.go:156] body of failing http response: &{0xc8207d6cc0 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.913305    2975 config.go:156] body of failing http response: &{0xc8207d6e80 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.914465    2975 config.go:156] body of failing http response: &{0xc8207d7040 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.915502    2975 config.go:156] body of failing http response: &{0xc8207d7200 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.916609    2975 config.go:156] body of failing http response: &{0xc8207d73c0 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.917703    2975 config.go:156] body of failing http response: &{0xc8207d7580 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.918938    2975 config.go:156] body of failing http response: &{0xc8207d7740 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.919959    2975 config.go:156] body of failing http response: &{0xc8207d7900 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.920924    2975 config.go:156] body of failing http response: &{0xc8207d7b40 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.921986    2975 config.go:156] body of failing http response: &{0xc8207d7d00 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.923190    2975 config.go:156] body of failing http response: &{0xc8207d7ec0 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.924502    2975 config.go:156] body of failing http response: &{0xc820790080 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.926943    2975 config.go:156] body of failing http response: &{0xc820790240 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.927894    2975 config.go:156] body of failing http response: &{0xc820790400 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.928942    2975 config.go:156] body of failing http response: &{0xc8207905c0 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.929990    2975 config.go:156] body of failing http response: &{0xc820790780 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.931136    2975 config.go:156] body of failing http response: &{0xc820790940 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.932207    2975 config.go:156] body of failing http response: &{0xc820790b00 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.933227    2975 config.go:156] body of failing http response: &{0xc820790cc0 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.934656    2975 config.go:156] body of failing http response: &{0xc820790e80 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.935695    2975 config.go:156] body of failing http response: &{0xc820791040 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.936742    2975 config.go:156] body of failing http response: &{0xc820791200 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.937971    2975 config.go:156] body of failing http response: &{0xc820791400 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.939060    2975 config.go:156] body of failing http response: &{0xc8207915c0 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.940642    2975 config.go:156] body of failing http response: &{0xc820791780 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.941890    2975 config.go:156] body of failing http response: &{0xc820791940 {0 0} false <nil> 0x67da70 0x67da10}
I0713 01:15:32.943356    2975 config.go:156] body of failing http response: &{0xc820504600 {0 0} false <nil> 0x67da70 0x67da10}

I didn't dig into failure, but FWIW, I checked and container registry registration always fails even in a green node e2e run.

@roberthbailey
Copy link
Contributor

can you squash commits?

@zmerlynn
Copy link
Member

This looks fine, but I'm heading out soon. Squash and I'll defer to @roberthbailey for final.

@Random-Liu
Copy link
Member

E0713 17:57:34.501324    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:34.503743    2935 config.go:156] body of failing http response: &{0xc8207a4500 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:34.503769    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:34.507202    2935 config.go:156] body of failing http response: &{0xc8207a4700 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:34.507238    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:34.512487    2935 config.go:156] body of failing http response: &{0xc8207a4900 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:34.512522    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:34.522237    2935 config.go:156] body of failing http response: &{0xc8207a4b00 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:34.522292    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:34.540172    2935 config.go:156] body of failing http response: &{0xc8207a4d40 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:34.540241    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:34.574401    2935 config.go:156] body of failing http response: &{0xc8207a4f40 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:34.574456    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:34.640219    2935 config.go:156] body of failing http response: &{0xc8207a5140 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:34.640286    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:34.770057    2935 config.go:156] body of failing http response: &{0xc8207a59c0 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:34.770135    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:35.028187    2935 config.go:156] body of failing http response: &{0xc8207a5d80 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:35.028258    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:35.542159    2935 config.go:156] body of failing http response: &{0xc8207a5f80 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:35.542240    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:36.568324    2935 config.go:156] body of failing http response: &{0xc8204fc2c0 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:36.568400    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:38.618818    2935 config.go:156] body of failing http response: &{0xc8204fca80 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:38.618909    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:42.717029    2935 config.go:156] body of failing http response: &{0xc8204fce80 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:42.717093    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:44.420624    2935 config.go:280] Setting pods for source file
I0713 17:57:50.912456    2935 config.go:156] body of failing http response: &{0xc8204fd280 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:57:50.912533    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json
I0713 17:57:54.420888    2935 config.go:280] Setting pods for source file
I0713 17:58:04.421878    2935 config.go:280] Setting pods for source file
I0713 17:58:07.300525    2935 config.go:156] body of failing http response: &{0xc8204fd880 {0 0} false <nil> 0x67db80 0x67db20}
E0713 17:58:07.300594    2935 metadata.go:183] failed to Get "http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json": http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/default/scopes?alt=json

@yujuhong
Copy link
Contributor

yujuhong commented Jul 13, 2016

I guess we don't have service account on the node e2e vm instances.

How about checking http://metadata.google.internal./computeMetadata/v1/instance/ first to see if service-account is there?

@Random-Liu
Copy link
Member

Random-Liu commented Jul 13, 2016

I guess It looks like the gcr credential provider is never enabled in e2e test before, because the metadata server is never reachable. Here is the log from the node e2e kubelet log:

I0713 21:39:43.320881    2963 plugins.go:56] Registering credential provider: google-dockercfg
I0713 21:39:43.321450    2963 plugins.go:56] Registering credential provider: google-dockercfg-url

However, now we change to use file based check, and it always passes, then kubelet keeps failing to access metadata server after that.

@roberthbailey
Copy link
Contributor

@yujuhong - if this looks good to you can you apply the label?

@yujuhong yujuhong added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 15, 2016
@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@vishh vishh added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jul 15, 2016
@vishh
Copy link
Contributor Author

vishh commented Jul 15, 2016

Bumping up the priority since this is required for v1.3

@k8s-bot
Copy link

k8s-bot commented Jul 15, 2016

GCE e2e build/test passed for commit ea1a459.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@vishh
Copy link
Contributor Author

vishh commented Jul 15, 2016

@k8s-bot test this github issue #IGNORE

@k8s-bot
Copy link

k8s-bot commented Jul 15, 2016

GCE e2e build/test failed for commit ea1a459.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@vishh
Copy link
Contributor Author

vishh commented Jul 15, 2016

@k8s-bot test this github issue #IGNORE

@k8s-bot
Copy link

k8s-bot commented Jul 15, 2016

GCE e2e build/test failed for commit ea1a459.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@k8s-bot
Copy link

k8s-bot commented Jul 15, 2016

GCE e2e build/test passed for commit ea1a459.

@vishh
Copy link
Contributor Author

vishh commented Jul 18, 2016

@k8s-bot test this github issue #IGNORE

@k8s-bot
Copy link

k8s-bot commented Jul 18, 2016

GCE e2e build/test passed for commit ea1a459.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Jul 18, 2016

GCE e2e build/test passed for commit ea1a459.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 8eb0cf5 into kubernetes:master Jul 18, 2016
@zmerlynn zmerlynn added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jul 18, 2016
@vishh vishh added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jul 18, 2016
k8s-github-robot pushed a commit that referenced this pull request Jul 19, 2016
…upstream-release-1.3

Automatic merge from submit-queue

Automated cherry pick of #28871

Cherry pick of #28871 on release-1.3.
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.3" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
…of-#28871-upstream-release-1.3

Automatic merge from submit-queue

Automated cherry pick of kubernetes#28871

Cherry pick of kubernetes#28871 on release-1.3.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants