[BUG] datadog-cluster-agent auto-detect failure when using EKS pod identity #32493
Description
Agent Environment
datadog:
  agent: Cluster Agent 7.58.0 - Commit: cf39839 - Serialization version: v5.0.130 - Go version: go1.22.7
Describe what happened:
The cluster agent is unable to use EKS Pod Identity credentials. As a result, cluster name auto-detection fails on pod startup with the following warning in the datadog-cluster-agent pod logs:
...
cluster-agent 2024-12-18 23:36:06 UTC | CLUSTER | WARN | (subcommands/start/command.go:335 in start) | Failed to auto-detect a Kubernetes cluster name. We recommend you set it manually via the cluster_name config option
...
However, from an ad hoc container in the same namespace that uses the same EKS Pod Identity (the datadog-cluster-agent service account), I can successfully run aws ec2 describe-instances with the AWS CLI:
$ cat << EOF > pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: aws-debug
  namespace: datadog
spec:
  containers:
    - name: amazoncli
      image: amazon/aws-cli:2.22.2
      # Just spin & wait forever
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
  dnsPolicy: Default
  serviceAccountName: datadog-cluster-agent
EOF
Apply the manifest to run the test from the datadog namespace:
$ kubectl apply -f ./pod.yaml
Confirm that the pod identity association works for the namespace and service account:
$ kubectl exec -n datadog --stdin --tty aws-debug -- /bin/sh
sh-4.2# aws sts get-caller-identity
... # Confirm that the returned identity is the IAM role assumed through the pod identity association
sh-4.2# aws ec2 describe-instances
... # Returns JSON describing all EC2 instances in the region
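To compare the aws-debug pod with the cluster agent pod, it can help to check which credential variables each one actually received. A minimal sketch, assuming a POSIX shell on the workstation; classify_aws_creds and has_env_var are hypothetical helpers, not part of the Datadog agent or the AWS CLI:

```shell
# Hypothetical helpers; not part of the Datadog agent or the AWS CLI.

# has_env_var NAME DUMP: succeed if DUMP (output of `env`) defines NAME.
has_env_var() {
  printf '%s\n' "$2" | grep -q "^$1="
}

# classify_aws_creds: read `env` output on stdin and report which pod-level
# AWS credential mechanism it exposes.
#   EKS Pod Identity injects AWS_CONTAINER_CREDENTIALS_FULL_URI and
#   AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE.
#   IRSA injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE.
classify_aws_creds() {
  env_dump=$(cat)
  pod_id=no
  irsa=no
  if has_env_var AWS_CONTAINER_CREDENTIALS_FULL_URI "$env_dump" &&
     has_env_var AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE "$env_dump"; then
    pod_id=yes
  fi
  if has_env_var AWS_ROLE_ARN "$env_dump" &&
     has_env_var AWS_WEB_IDENTITY_TOKEN_FILE "$env_dump"; then
    irsa=yes
  fi
  case "$pod_id/$irsa" in
    yes/yes) echo both ;;
    yes/no) echo pod-identity ;;
    no/yes) echo irsa ;;
    *) echo none ;; # no pod-level credentials were injected
  esac
}
```

For example, kubectl exec -n datadog aws-debug -- env | classify_aws_creds should print pod-identity. Piping the env output of a cluster agent pod (for instance kubectl exec -n datadog deploy/datadog-cluster-agent -- env, assuming the Deployment has that name) shows whether the agent received the same injection. EKS Pod Identity only mutates pods created after the association exists, so a pod started earlier would print none.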
Describe what you expected:
Auto-detection of the EKS cluster name requires the ec2:DescribeInstances permission: https://docs.datadoghq.com/containers/guide/kubernetes-cluster-name-detection/
EKS Pod Identity provides an alternative way of authenticating with IAM at the pod boundary; it is an alternative to the IRSA approach for using IAM roles inside pods. I would expect the datadog-cluster-agent to pick up IAM credentials from the environment variables and token file injected by EKS Pod Identity.
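For context, a minimal sketch of the exchange an AWS SDK container credential provider performs once EKS Pod Identity has injected its variables: read the token file, send its contents as the Authorization header in a GET to the credentials endpoint, and receive temporary credentials as JSON. This is illustrative shell rather than the agent's Go code; fetch_pod_identity_creds is a hypothetical name and curl is assumed to be installed in the container:

```shell
# Hypothetical helper illustrating what an AWS SDK container credential
# provider does with the variables EKS Pod Identity injects. Run in a pod.
fetch_pod_identity_creds() {
  if [ -z "${AWS_CONTAINER_CREDENTIALS_FULL_URI:-}" ] ||
     [ -z "${AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE:-}" ]; then
    echo "EKS Pod Identity variables are not set in this pod" >&2
    return 1
  fi
  # The token file holds a projected service account token for the pod.
  token=$(cat "$AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE") || return 1
  # GET the credentials endpoint (served by the EKS Pod Identity Agent),
  # sending the token as the Authorization header. The JSON reply carries
  # AccessKeyId, SecretAccessKey, Token and Expiration.
  curl --silent --show-error --fail \
    -H "Authorization: $token" \
    "$AWS_CONTAINER_CREDENTIALS_FULL_URI"
}
```

If this returns credentials inside the aws-debug pod but the cluster agent still fails, the problem is more likely in how the agent builds its EC2 client than in the association itself.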
Steps to reproduce the issue:
- Create an EKS cluster with EKS Pod Identity enabled: https://docs.aws.amazon.com/eks/latest/userguide/pod-id-agent-setup.html
- Create an IAM role with the following pod identity association:
  namespace: datadog
  serviceAccount: datadog-cluster-agent
IAM trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "pods.eks.amazonaws.com"
      },
      "Action": [
        "sts:TagSession",
        "sts:AssumeRole"
      ]
    }
  ]
}
IAM permission policies:
{
  "Statement": [
    {
      "Action": [
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceStatus"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ],
  "Version": "2012-10-17"
}
- Deploy the Datadog Helm chart (version 3.83.0) with the following values.yml:
datadog:
  apiKeyExistingSecret: datadog
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    portEnabled: true
    instrumentation:
      skipKPITelemetry: true
  processAgent:
    enabled: true
  orchestratorExplorer:
    enabled: true
  confd:
    disk.yaml: |-
      init_config:
      instances:
        - use_mount: false
          file_system_exclude:
            - autofs$
          mount_point_exclude:
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc
clusterAgent:
  replicas: 2
  createPodDisruptionBudget: true
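The pod identity association from the second step can also be created with the AWS CLI. A minimal sketch, assuming the EKS Pod Identity Agent add-on is already installed and that the chart runs the cluster agent as a Deployment named datadog-cluster-agent; associate_cluster_agent_role and its arguments are hypothetical placeholders:

```shell
# Hypothetical helper: create the pod identity association from step 2 with
# the AWS CLI. Arguments are placeholders for your cluster and role.
associate_cluster_agent_role() {
  cluster=$1
  role_arn=$2
  # Bind the datadog/datadog-cluster-agent service account to the IAM role.
  aws eks create-pod-identity-association \
    --cluster-name "$cluster" \
    --namespace datadog \
    --service-account datadog-cluster-agent \
    --role-arn "$role_arn" || return 1
  # Confirm the association is listed for the namespace.
  aws eks list-pod-identity-associations \
    --cluster-name "$cluster" \
    --namespace datadog || return 1
  # Credentials are injected only into pods created after the association
  # exists, so restart the cluster agent (Deployment name is an assumption).
  kubectl -n datadog rollout restart deployment/datadog-cluster-agent
}
```

The restart matters when reproducing: a cluster agent pod that predates the association would fail auto-detection regardless of any agent bug.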
Additional environment details (Operating System, Cloud provider, etc):
Datadog:
  helm chart: 3.83.0
  agent: Cluster Agent 7.58.0 - Commit: cf39839 - Serialization version: v5.0.130 - Go version: go1.22.7
EKS:
  Client Version: v1.31.0
  Kustomize Version: v5.4.2
  Server Version: v1.31.2-eks-7f9249a
This may be similar to #29916, where the EC2 client ignored the local environment variables that IRSA relies on during initialization.
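Until auto-detection works with Pod Identity, the warning's own recommendation can be followed by setting the cluster name explicitly. A minimal sketch, assuming the Helm release is named datadog, the chart repository alias is datadog, and that the chart's datadog.clusterName value feeds the agent's cluster_name option; set_cluster_name is a hypothetical helper:

```shell
# Hypothetical workaround helper: bypass auto-detection by setting the
# cluster name through the Helm chart. Release and repo names are assumptions.
set_cluster_name() {
  [ -n "$1" ] || { echo "usage: set_cluster_name NAME" >&2; return 1; }
  # Keep the existing values and only add the explicit cluster name.
  helm upgrade datadog datadog/datadog \
    --namespace datadog \
    --version 3.83.0 \
    --reuse-values \
    --set datadog.clusterName="$1"
}
```

This only sidesteps the symptom; the cluster agent would still not be using the Pod Identity credentials for its EC2 calls.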