Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix providerless kubelet startup #93607

Merged
merged 1 commit into from
Aug 2, 2020

Conversation

liggitt
Copy link
Member

@liggitt liggitt commented Jul 31, 2020

What type of PR is this?
/kind bug

What this PR does / why we need it:
Follow up from #93579

That PR made the registration of the azure flags conditional (excluded in providerless builds), but the kubelet assumed the flag would be available when registering credential provider flags.

Does this PR introduce a user-facing change?:

NONE

/cc @BenTheElder @aojea

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 31, 2020
@liggitt liggitt added this to the v1.19 milestone Jul 31, 2020
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 31, 2020
@BenTheElder
Copy link
Member

/lgtm
Thanks Jordan.
We should get up a smoke test periodic or something

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 31, 2020
@liggitt
Copy link
Member Author

liggitt commented Jul 31, 2020

node e2e flake on cpu resources

Node didn't have enough resource

/retest

@liggitt liggitt added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jul 31, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jul 31, 2020
@@ -84,9 +84,7 @@ func addCredentialProviderFlags(fs *pflag.FlagSet) {
global := pflag.CommandLine
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need to get global here and pass to addLegacyCloudProviderCredentialProviderFlags . It can be taken when needed.

@liggitt
Copy link
Member Author

liggitt commented Jul 31, 2020

/retest

@k8s-ci-robot k8s-ci-robot added sig/auth Categorizes an issue or PR as relevant to SIG Auth. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jul 31, 2020
@liggitt liggitt force-pushed the providerless-kubelet branch from c57adf5 to 93e2868 Compare July 31, 2020 19:50
@liggitt liggitt removed the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Jul 31, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liggitt
Copy link
Member Author

liggitt commented Jul 31, 2020

oops, pushed a commit meant for #93605 ... fixed

@BenTheElder
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 31, 2020
@aojea
Copy link
Member

aojea commented Jul 31, 2020

/retest

2 similar comments
@dims
Copy link
Member

dims commented Aug 1, 2020

/retest

@BenTheElder
Copy link
Member

/retest

@aojea
Copy link
Member

aojea commented Aug 1, 2020

hmm
/retest
this is the second time I see the same failure, publishing-bot`

@liggitt
Copy link
Member Author

liggitt commented Aug 1, 2020

verify is failing on #93623, wait until that is resolved to retest

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 1, 2020
@liggitt
Copy link
Member Author

liggitt commented Aug 1, 2020

/retest
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 1, 2020
@k8s-ci-robot k8s-ci-robot merged commit 8f835d5 into kubernetes:master Aug 2, 2020
@liggitt liggitt deleted the providerless-kubelet branch August 2, 2020 05:03
@andrewsykim
Copy link
Member

I'm not sure why the AKS e2es would start failing since they shouldn't use providerless builds but ran into this issue in another PR and this seems possibly related 🤔

 2020/08/02 14:22:39 main.go:312: Something went wrong: error creating deployer: failed to get azure credentials: error parsing credentials file /etc/azure-cred/credentials (1, 59): parsing error: no value can start with C
+ EXIT_VALUE=1
+ set +o xtrace
Cleaning up after docker in docker. 

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/93582/pull-kubernetes-e2e-aks-engine-azure/1289928857326981121#1:build-log.txt%3A15

@BenTheElder
Copy link
Member

that tag is not set by default so they'd have to be opting into it somehow

it also doesn't relate to how credentials are parsed, only how a flag is registered (and only if the tag is set).

@BenTheElder
Copy link
Member

@andrewsykim that log message is from kubetest, not kubelet:

+ kubetest --test --up --down --build=quick --dump=/logs/artifacts --provider=skeleton --deployment=aksengine --aksengine-agentpoolcount=2 --aksengine-admin-username=azureuser --aksengine-creds=/etc/azure-cred/credentials --aksengine-orchestratorRelease=1.18 --aksengine-mastervmsize=Standard_DS2_v2 --aksengine-agentvmsize=Standard_DS2_v2 --aksengine-deploy-custom-k8s --aksengine-location=westus2 --aksengine-public-key=/etc/azure-ssh/azure-ssh-pub --aksengine-template-url=https://raw.githubusercontent.com/Azure/aks-engine/master/examples/kubernetes.json --aksengine-download-url=https://github.com/Azure/aks-engine/releases/download/v0.53.0/aks-engine-v0.53.0-linux-amd64.tar.gz '--test_args=--ginkgo.focus=\[Conformance\]|\[NodeConformance\] --ginkgo.skip=\[Slow\]|\[Serial\]|\[Flaky\]' --ginkgo-parallel=30 --timeout=420m
2020/08/02 14:22:39 Warning: Couldn't find directory src/sigs.k8s.io/cloud-provider-azure under any of GOPATH /home/prow/go, defaulting to /home/prow/go/src/k8s.io/cloud-provider-azure
2020/08/02 14:22:39 main.go:325: Limiting testing to 7h0m0s
2020/08/02 14:22:39 aksengine_helpers.go:353: Reading credentials file /etc/azure-cred/credentials
2020/08/02 14:22:39 process.go:96: Saved XML output to /logs/artifacts/junit_runner.xml.
2020/08/02 14:22:39 process.go:153: Running: bash -c . hack/lib/version.sh && KUBE_ROOT=. kube::version::get_version_vars && echo "${KUBE_GIT_VERSION-}"
2020/08/02 14:22:39 process.go:155: Step 'bash -c . hack/lib/version.sh && KUBE_ROOT=. kube::version::get_version_vars && echo "${KUBE_GIT_VERSION-}"' finished in 414.971586ms
2020/08/02 14:22:39 main.go:312: Something went wrong: error creating deployer: failed to get azure credentials: error parsing credentials file /etc/azure-cred/credentials (1, 59): parsing error: no value can start with C

@BenTheElder
Copy link
Member

kubernetes/test-infra#17869 bumped a bunch of dependencies including some azure related stuff, and some later PR caused us to actually upgrade to a new image containing these in kubetest.

https://github.com/kubernetes/test-infra/commits/master/kubetest

@andrewsykim
Copy link
Member

Ah, thanks for clarifying @BenTheElder!

@andyzhangx @feiskyer can you look at this failure when you get a chance please?

@feiskyer
Copy link
Member

feiskyer commented Aug 3, 2020

Thanks @BenTheElder @andrewsykim. The error is error parsing credentials file /etc/azure-cred/credentials (1, 59): parsing error: no value can start with C, do we know anyone changed the credential file at that time?

@BenTheElder
Copy link
Member

BenTheElder commented Aug 3, 2020

Nobody would have changed the credential without being asked to.

@feiskyer
Copy link
Member

feiskyer commented Aug 4, 2020

@BenTheElder no worries, the credential issues have been fixed by test-infra-oncall.

@BenTheElder
Copy link
Member

Circling back, if anyone's curious this was the issue: kubernetes/test-infra#18618 (comment)

basically the format was invalid but the older go-toml version allowed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants