
Keda 2.9.1 on AKS with Pod-Identity Looks for AzureCLICredential- #4026

Closed
tshaiman opened this issue Dec 18, 2022 · 20 comments · Fixed by #4030
Labels
bug Something isn't working

Comments

@tshaiman

When running KEDA 2.9.1 with Pod Identity (not Workload Identity), the DefaultAzureCredential chain looks for AzureCLICredential but fails with "/bin/sh azurecli file not found".

The AzureCLICredential should be removed from the chain.

Expected Behavior

DefaultAzureCredential has options to opt out of several providers in the chain, such as VisualStudioCredential, AzureCLICredential, etc.
Since this setup uses Pod Identity with a distroless image, the Azure CLI credential should not be part of the chain.
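A minimal Go sketch of the opt-out idea, assuming hypothetical `TokenSource` and `namedSource` types that only stand in for Azure credential types. They are not the `azidentity` API, and this thread does not establish whether the Go SDK exposes such an opt-out. The chain is assembled from candidate providers while skipping any excluded by name.

```go
package main

import (
	"errors"
	"fmt"
)

// TokenSource is a hypothetical stand-in for an Azure credential type;
// it is NOT the azidentity API.
type TokenSource interface {
	Name() string
	GetToken() (string, error)
}

// namedSource is a hypothetical credential used only for illustration.
type namedSource struct {
	name  string
	token string
	err   error
}

func (s namedSource) Name() string              { return s.name }
func (s namedSource) GetToken() (string, error) { return s.token, s.err }

// buildChain keeps the candidates in order, skipping any whose name is in
// exclude (e.g. "AzureCLICredential" in a distroless image without /bin/sh).
func buildChain(candidates []TokenSource, exclude map[string]bool) []TokenSource {
	chain := make([]TokenSource, 0, len(candidates))
	for _, c := range candidates {
		if exclude[c.Name()] {
			continue
		}
		chain = append(chain, c)
	}
	return chain
}

func main() {
	candidates := []TokenSource{
		namedSource{name: "AzureCLICredential", err: errors.New("fork/exec /bin/sh: no such file or directory")},
		namedSource{name: "ManagedIdentityCredential", token: "mi-token"},
	}
	chain := buildChain(candidates, map[string]bool{"AzureCLICredential": true})
	for _, c := range chain {
		fmt.Println(c.Name())
	}
}
```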

Actual Behavior

Many errors like "AzureCLICredential: fork/exec /bin/sh: no such file or directory\n\terror reading service account token".

Steps to Reproduce the Problem

  1. Use AKS 1.24 with Mariner node pools
  2. Use KEDA 2.9.1 with WorkloadIdentity=False
  3. Deploy KEDA with AAD Pod Identity and add a ScaledObject

Logs from KEDA operator


keda-operator-d5464cdd6-zvdw2 keda-operator 2022-12-18T15:03:48Z        ERROR   azure_servicebus_scaler error getting service bus entity length {"type": "ScaledObject", "namespace": "vi-be-map-dev11", "name": "rc-visolo", "error": "ChainedTokenCredential: failed to acquire a token.\nAttempted credentials:\n\tAzureCLICredential: fork/exec /bin/sh: no such file or directory\n\terror reading service account token - open : no such file or directory"}
keda-operator-d5464cdd6-zvdw2 keda-operator github.com/kedacore/keda/v2/pkg/scalers.(*azureServiceBusScaler).GetMetricsAndActivity
keda-operator-d5464cdd6-zvdw2 keda-operator     /workspace/pkg/scalers/azure_servicebus_scaler.go:266
keda-operator-d5464cdd6-zvdw2 keda-operator github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetScaledObjectState
keda-operator-d5464cdd6-zvdw2 keda-operator     /workspace/pkg/scaling/cache/scalers_cache.go:136
keda-operator-d5464cdd6-zvdw2 keda-operator github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
keda-operator-d5464cdd6-zvdw2 keda-operator     /workspace/pkg/scaling/scale_handler.go:360
keda-operator-d5464cdd6-zvdw2 keda-operator github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
keda-operator-d5464cdd6-zvdw2 keda-operator     /workspace/pkg/scaling/scale_handler.go:162
keda-operator-d5464cdd6-zvdw2 keda-operator 2022-12-18T15:03:48Z        ERROR   azure_servicebus_scaler error getting service bus entity length {"type": "ScaledObject", "namespace": "vi-be-map-dev11", "name": "celebs", "error": "ChainedTokenCredential: failed to acquire a token.\nAttempted credentials:\n\tAzureCLICredential: fork/exec /bin/sh: no such file or directory\n\terror reading service account token - open : no such file or directory"}
keda-operator-d5464cdd6-zvdw2 keda-operator github.com/kedacore/keda/v2/pkg/scalers.(*azureServiceBusScaler).GetMetricsAndActivity
keda-operator-d5464cdd6-zvdw2 keda-operator     /workspace/pkg/scalers/azure_servicebus_scaler.go:266
keda-operator-d5464cdd6-zvdw2 keda-operator github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetScaledObjectState
keda-operator-d5464cdd6-zvdw2 keda-operator     /workspace/pkg/scaling/cache/scalers_cache.go:136

KEDA Version

2.9.1

Kubernetes Version

1.24

Platform

Microsoft Azure

Scaler Details

Azure Service Bus

Anything else?

No response

@tshaiman tshaiman added the bug Something isn't working label Dec 18, 2022
@tomkerkhove tomkerkhove moved this to Proposed in Roadmap - KEDA Core Dec 18, 2022
@JorTurFer
Member

Hi,
AzureCLICredential is useful for local troubleshooting, but this is weird, because it shouldn't fail: ChainedTokenCredential falls back to the next provider in the chain when one fails (that's its default behaviour, indeed).
If this is a problem, we have to remove it from the chain, but it's weird because the e2e tests cover the scenario where AzureCLICredential is tried as part of the chain, and there isn't any error; it works.

@JorTurFer
Member

JorTurFer commented Dec 18, 2022

Could the problem be that no provider in the chain can get the token, and you only see the error from the first one? I have to try it.

@tshaiman
Author

No, there is an MI pod identity that works fine.
When I take the same code and revert to KEDA 2.8.1, it works.

@barclayadam

I've experienced the same issue upgrading from (Helm Chart) 2.8.2. Using AAD pod identity works perfectly with that version. The only change I make is to upgrade to 2.9.2

We have a single TriggerAuthentication that does not specify identityId as part of podIdentity spec, which should mean it falls back to aadpodidentity label binding. Adding explicit identityId makes no difference.

Azure SDK is supposed to return the last error. Is it possible that the credentials are not being added to the chain? Is there an error at https://github.com/kedacore/keda/blob/main/pkg/scalers/azure_servicebus_scaler.go#L330 that means it's not added?

@JorTurFer
Member

Interesting point... Maybe it isn't added, IDK why 🤔
Let me try your scenario in my local cluster to be sure about that

@JorTurFer
Member

JorTurFer commented Dec 20, 2022

I have added logging for errors that prevent it from being added. The tags are jorturfer/keda:msi-logger and jorturfer/keda-metrics-apiserver:msi-logger (both are built from main, so they are 2.9.1 plus the logging change). Could you deploy that version and see if there is any error? In parallel, I'm adding aad-pod-identity to my local cluster to check it on my side.

@JorTurFer
Member

I have reproduced the issue, but it doesn't seem related to AzureCLICredential, because it fails even after removing it. I suspect it's related to some wrong configuration after the SDK migration (we upgraded the Service Bus SDK), with the bad luck that AAD Pod Identity doesn't have an e2e test :(

@JorTurFer
Member

JorTurFer commented Dec 20, 2022

I have found the issue, and sadly it's really complex to solve properly because the SDK is totally closed for extension... For the moment, I'm going to undo the change that unified both identity providers (which I hope is the future) while we try to get our requirements included.
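A minimal sketch of what selecting one credential per provider, rather than one unified chain, could look like. The `PodIdentityProvider` values mirror KEDA's `azure` and `azure-workload` podIdentity providers, but `credentialFor` and the returned names are hypothetical labels, not KEDA's actual code.

```go
package main

import "fmt"

// PodIdentityProvider mirrors the idea of a podIdentity provider setting;
// the constant values are assumptions for illustration.
type PodIdentityProvider string

const (
	ProviderAzure         PodIdentityProvider = "azure"
	ProviderAzureWorkload PodIdentityProvider = "azure-workload"
)

// credentialFor picks exactly one (hypothetically named) credential type per
// provider, instead of a single chain that tries every identity mechanism.
func credentialFor(p PodIdentityProvider) (string, error) {
	switch p {
	case ProviderAzure:
		return "ManagedIdentityCredential", nil // AAD Pod Identity
	case ProviderAzureWorkload:
		return "WorkloadIdentityCredential", nil // federated service account token
	default:
		return "", fmt.Errorf("unsupported pod identity provider %q", p)
	}
}

func main() {
	for _, p := range []PodIdentityProvider{ProviderAzure, ProviderAzureWorkload, "none"} {
		cred, err := credentialFor(p)
		fmt.Println(p, cred, err)
	}
}
```

Keeping the two paths separate means a misconfiguration in one mechanism cannot be masked by errors from the other.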

@tshaiman
Author

Thanks for the update @JorTurFer

@jmos5156

So what options do we have, for those of us experiencing this particular issue who are on:
AKS - version 1.23.5
aad-pod - chart version 4.1.15
keda - chart version: 2.9.1

TIA

@JorTurFer
Member

The only options at the moment are to downgrade KEDA to 2.8.1 or to switch from aad-pod-identity to Azure Workload Identity. This PR solves the issue, but we don't plan (not yet, at least) to do a hotfix release. You could use the main tag instead of a pinned version once this is merged, but the main tag also includes the admission webhooks, so for now it's a bit more complicated: you need to deploy all the resources needed for the webhooks manually because the Helm chart isn't ready yet (I plan to have it ready this week).

My suggestion is to switch to Azure AD Workload Identity, because aad-pod-identity is deprecated and will be unmaintained at the end of this year (I don't remember the exact date), but I understand that this could require more effort than the other options.

@jmos5156

I could move to workload identity, however, that too exhibits the same sort of issue (documented here #3977). So it feels like whichever way I go I'm going to hit an issue.

@JorTurFer
Member

Are you using the Mariner distro too? That issue only occurs with the Mariner distro.

@tshaiman
Author

@jmos5156 I would say the most stable
path is:
KEDA 2.8.1
+ pod identity

We see no errors with this configuration,
and we also run on Mariner with AKS 1.24,
but I don't think it's an issue if you stay on 1.23.

@jmos5156

We're using Ubuntu 18.04.6 LTS and see the issue.

2023-01-10T14:41:33Z	ERROR	azure_servicebus_scaler	error getting service bus entity length	{"type": "ScaledObject", "namespace": "zd", "name": "azure-servicebus-queue-scaled-superslurper", "error": "ChainedTokenCredential: failed to acquire a token.\nAttempted credentials:\n\tAzureCLICredential: fork/exec /bin/sh: no such file or directory\n\terror reading service account token - open : no such file or directory"}
github.com/kedacore/keda/v2/pkg/scalers.(*azureServiceBusScaler).GetMetricsAndActivity
	/workspace/pkg/scalers/azure_servicebus_scaler.go:266
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsForScaler
	/workspace/pkg/scaling/cache/scalers_cache.go:77
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics
	/workspace/pkg/scaling/scale_handler.go:439
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics
	/workspace/pkg/metricsservice/server.go:45
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler
	/workspace/pkg/metricsservice/api/metrics_grpc.pb.go:79
google.golang.org/grpc.(*Server).processUnaryRPC
	/workspace/vendor/google.golang.org/grpc/server.go:1340
google.golang.org/grpc.(*Server).handleStream
	/workspace/vendor/google.golang.org/grpc/server.go:1713
google.golang.org/grpc.(*Server).serveStreams.func1.2
	/workspace/vendor/google.golang.org/grpc/server.go:965

@JorTurFer
Member

That distro isn't affected by the workload identity error. As I said, that issue is between the distro and workload identity, not KEDA itself; indeed, I have production workloads using workload identity and I haven't seen that problem.
The error you are seeing is related to a bug introduced in that version when we migrated from the old identity SDK to the new one. That change is reverted here: #4030

@juanpgarces

I'm not sure if anyone is experiencing this same issue after upgrading to version 2.9.2 or above: instead of working as expected, it now just displays a different error. Thought it would be worth asking if anyone has experienced the same. @JorTurFer

2023-09-12T18:11:14Z ERROR azure_servicebus_scaler error getting service bus entity length {"type": "ScaledObject", "namespace": "staging", "name": "functionx", "error": "ChainedTokenCredential: failed to acquire a token.\nAttempted credentials:\n\tmanaged identity timed out"}
github.com/kedacore/keda/v2/pkg/scalers.(*azureServiceBusScaler).GetMetricsAndActivity
/workspace/pkg/scalers/azure_servicebus_scaler.go:262
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsForScaler
/workspace/pkg/scaling/cache/scalers_cache.go:78
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics
/workspace/pkg/scaling/scale_handler.go:443
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics
/workspace/pkg/metricsservice/server.go:45
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler
/workspace/pkg/metricsservice/api/metrics_grpc.pb.go:79
google.golang.org/grpc.(*Server).processUnaryRPC
/workspace/vendor/google.golang.org/grpc/server.go:1340
google.golang.org/grpc.(*Server).handleStream
/workspace/vendor/google.golang.org/grpc/server.go:1713
google.golang.org/grpc.(*Server).serveStreams.func1.2
/workspace/vendor/google.golang.org/grpc/server.go:965

@JorTurFer
Member

The bug was solved by this commit: #4030
It was released as part of v2.10, so you have to downgrade to v2.8 or upgrade to at least v2.10
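A minimal Go sketch of a version check based on that advice, treating every 2.9.x release as affected. That range is an assumption drawn from "downgrade to v2.8 or upgrade to at least v2.10"; the thread does not list the exact affected patch versions, and `affectedByIssue4026` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// affectedByIssue4026 reports whether a KEDA version such as "2.9.1" or
// "v2.9.2" is in the 2.9.x line (assumed affected; the fix shipped in v2.10).
func affectedByIssue4026(version string) (bool, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	return major == 2 && minor == 9, nil
}

func main() {
	for _, v := range []string{"2.8.1", "2.9.1", "v2.10.1"} {
		affected, err := affectedByIssue4026(v)
		fmt.Println(v, affected, err)
	}
}
```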

@juanpgarces

I'm afraid I tried upgrading to v2.10.1 and v2.11.2, and I keep getting the same error I described above. I wish it had more information on what the cause could be, but I'm left in the dark with a 'managed identity timed out' message, without changing anything between v2.8 and the versions above. @JorTurFer Thanks for answering so quickly; I was hoping someone with the same issue could shed some light.

2023-09-12T18:42:37Z    ERROR   azure_servicebus_scaler error getting service bus entity length {"type": "ScaledObject", "namespace": "staging", "name": "nidcustomerbackgroundservices", "error": "ChainedTokenCredential: failed to acquire a token.\nAttempted credentials:\n\tmanaged identity timed out"}
github.com/kedacore/keda/v2/pkg/scalers.(*azureServiceBusScaler).GetMetricsAndActivity
        /workspace/pkg/scalers/azure_servicebus_scaler.go:262
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
        /workspace/pkg/scaling/cache/scalers_cache.go:130
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics
        /workspace/pkg/scaling/scale_handler.go:471
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics
        /workspace/pkg/metricsservice/server.go:47
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler
        /workspace/pkg/metricsservice/api/metrics_grpc.pb.go:99
google.golang.org/grpc.(*Server).processUnaryRPC
        /workspace/vendor/google.golang.org/grpc/server.go:1337
google.golang.org/grpc.(*Server).handleStream
        /workspace/vendor/google.golang.org/grpc/server.go:1714
google.golang.org/grpc.(*Server).serveStreams.func1.1
        /workspace/vendor/google.golang.org/grpc/server.go:959

@JorTurFer
Member

JorTurFer commented Sep 12, 2023

If you have tried v2.10.1 and v2.11.2 and it fails, I'd suggest creating a new issue for it, because this issue was for a specific problem that has already been fixed.

Projects
Archived in project

5 participants