-
Notifications
You must be signed in to change notification settings - Fork 39.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix issue that pull image failed from a cross-subscription Azure Container Registry by SP access #66429
Conversation
/kind bug |
@@ -177,6 +170,16 @@ func (a *acrProvider) Provide() credentialprovider.DockerConfig { | |||
defer cancel() | |||
|
|||
glog.V(4).Infof("listing registries") | |||
|
|||
var err error | |||
a.servicePrincipalToken, err = auth.GetServicePrincipalToken(a.config, a.environment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will call AAD API every time to get token every time 'Provide' is called.
Can it cache 'servicePrincipalToken', and 'registryClient', and only do a re-fetch if later 'registryClient.List' fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to do this only when not getting the expected result.
Since the list is actually successful but without the expected result, you can add a check of the list, e.g. len(result) == 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karataliu @feiskyer The Provide
func call happens when there is pull image requirement, and if pull image fails, it will try to rerun Provide
by lookup
func every 5 min, so not every kubelet will call this func only when there is pull image failure, related invocation code is here:
creds, withCredentials := keyring.Lookup(repoToPull) |
Seems the config key can be a glob pattern, so we might just remove the list acr part. kubernetes/pkg/credentialprovider/keyring.go Line 184 in af60963
|
/hold |
/hold cancel |
@karataliu code changed, and test pass, PTAL |
for ix := range result { | ||
loginServer := getLoginServer(result[ix]) | ||
var cred *credentialprovider.DockerConfigEntry | ||
if a.config.UseManagedIdentityExtension { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note, we keep the original logic for UseManagedIdentityExtension, while for sp, we always return loginServer as *.azurecr.io
, *.azurecr.cn
and that would save ARM calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to the new method
/test pull-kubernetes-integration |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andyzhangx, feiskyer The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here. |
I'm sorry but maybe I am missing something but I'm not sure how the code solves the cross-sub acr access issue for MSI?
|
@keyoke, this PR fix cross-sub acr access issue for sp, not MSI |
@andyzhangx - Ok thanks any plans for us to achieve the same with MSI? |
I filed another issue here: #67892, I will dig into the code and find a way. You could also comment if you have any better idea, thanks. |
What this PR does / why we need it:
after granting sp access to azure ACR , pull image from ACR would fail, and after wait about 15-30min(or restart kubelet directly), pull image would succeed. Root cause is that
servicePrincipalToken
needs to be refreshed when doingregistryClient.List
, otherwise it will always return empty registry list. Pull image error would be like following:Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #65225
Special notes for your reviewer:
After discuss with dong,
registryClient.List
won't be necessary, instead we return{"*.azurecr.io", "*.azurecr.cn", "*.azurecr.de", "*.azurecr.us"}
like aws, gce code logic, it will do the url matching.I will cherry pick this PR to all supported version, every version has this issue.
Neet to mention that this PR would also make image pull from a cross-subscription ACR work after we grant SP access to an ACR in a different subs. Original code won’t work because it’s using the SubscriptionID of current AKS cluster, since ACR is in a different subs, registryClient.List will never get cross subs ACR repos.
Release note:
/sig azure
/assign @feiskyer @khenidak @brendandburns @karataliu