
[BUG] datadog-cluster-agent auto-detect failure when using EKS pod identity #32493

Open
@sarcasticadmin

Description

Agent Environment

datadog:

agent: Cluster Agent 7.58.0 - Commit: cf39839 - Serialization version: v5.0.130 - Go version: go1.22.7

Describe what happened:

The cluster agent is unable to use credentials from EKS Pod Identity.

This results in the following warning in the datadog-cluster-agent pod logs when it tries to auto-detect the cluster name at startup:

...
cluster-agent 2024-12-18 23:36:06 UTC | CLUSTER | WARN | (subcommands/start/command.go:335 in start) | Failed to auto-detect a Kubernetes cluster name. We recommend you set it manually via the cluster_name config option
...

However, from an ad hoc container in the same namespace, running under the same service account (and therefore the same EKS Pod Identity association), I can successfully run aws ec2 describe-instances with the AWS CLI:

$ cat << EOF > pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: aws-debug
  namespace: datadog
spec:
  containers:
  - name: amazoncli
    image: amazon/aws-cli:2.22.2
    # Just spin & wait forever
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  dnsPolicy: Default
  serviceAccountName: datadog-cluster-agent
EOF

Apply the manifest to create the test pod in the datadog namespace:

$ kubectl apply -f ./pod.yaml

Confirm that the pod identity association works for this namespace and service account:

$ kubectl exec -n datadog --stdin --tty aws-debug -- /bin/sh
sh-4.2# aws sts get-caller-identity
... # Output shows the assumed role from the pod identity association
sh-4.2# aws ec2 describe-instances
... # JSON describing all EC2 instances in the region
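To compare the debug pod with the cluster agent directly, each pod's environment can be checked for the two variables that EKS Pod Identity injects: AWS_CONTAINER_CREDENTIALS_FULL_URI (http://169.254.170.23/v1/credentials) and AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE. The helper below is a minimal sketch; the function name pod_identity_vars is hypothetical, and it only parses env output piped into it.

```shell
# Hypothetical helper: reads `env` output on stdin and reports whether the
# variables injected by EKS Pod Identity are present.
pod_identity_vars() {
  full_uri=""
  token_file=""
  while IFS= read -r line; do
    case "$line" in
      AWS_CONTAINER_CREDENTIALS_FULL_URI=*) full_uri="${line#*=}" ;;
      AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE=*) token_file="${line#*=}" ;;
    esac
  done
  # Both variables are needed: the credentials endpoint and the token file.
  if [ -n "$full_uri" ] && [ -n "$token_file" ]; then
    echo "pod identity: uri=$full_uri token_file=$token_file"
    return 0
  fi
  echo "pod identity: variables not injected"
  return 1
}
```

For example, pipe kubectl exec -n datadog aws-debug -- env into pod_identity_vars, then do the same for the cluster agent with kubectl exec -n datadog deploy/datadog-cluster-agent -c cluster-agent -- env. The deployment and container names assume the Helm release is called datadog and that the agent image ships an env binary. If both pods report the variables, the association is being applied to the cluster agent, which would point at how the agent builds its AWS client.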

Describe what you expected:

Auto-detection of the EKS cluster name requires the ec2:DescribeInstances permission: https://docs.datadoghq.com/containers/guide/kubernetes-cluster-name-detection/

EKS Pod Identity provides a way to authenticate with IAM at the pod boundary; it is an alternative to IRSA (IAM Roles for Service Accounts) for using IAM roles inside pods. I would expect the datadog-cluster-agent to pick up IAM credentials from the environment variables and token that EKS Pod Identity injects into the pod.
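The expected behavior can be made concrete with a rough model of a default AWS credential chain. The sketch below is a simplified, hypothetical approximation that looks only at environment variables (it skips the shared config and credentials files and makes no network calls); it is not the logic of the AWS SDK or of the cluster agent. It illustrates that a pod carrying Pod Identity variables but no IRSA variables should resolve to container credentials rather than falling back to the node role.

```shell
# Hypothetical, simplified approximation of a default AWS credential chain,
# based only on environment variables. Not the SDK's or the agent's logic.
credential_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "static-env-keys"
  elif [ -n "${AWS_ROLE_ARN:-}" ] && [ -n "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]; then
    # IRSA injects these two variables.
    echo "web-identity (IRSA)"
  elif [ -n "${AWS_CONTAINER_CREDENTIALS_RELATIVE_URI:-}" ]; then
    # ECS task roles use the relative URI form.
    echo "container-credentials (ECS)"
  elif [ -n "${AWS_CONTAINER_CREDENTIALS_FULL_URI:-}" ]; then
    # EKS Pod Identity injects this, plus AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE.
    echo "container-credentials (EKS Pod Identity)"
  else
    # Nothing in the environment: fall back to the node's instance role.
    echo "instance-metadata (node role via IMDS)"
  fi
}
```

Run inside the aws-debug pod, this should print the EKS Pod Identity case; an agent that only checks the IRSA variables before falling back to IMDS would end up on the last branch instead.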

Steps to reproduce the issue:

  1. Create an EKS cluster with EKS Pod Identity enabled: https://docs.aws.amazon.com/eks/latest/userguide/pod-id-agent-setup.html
  2. Create an IAM role with the following pod identity association:

namespace: datadog
serviceAccount: datadog-cluster-agent
IAM trust relationship:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "pods.eks.amazonaws.com"
            },
            "Action": [
                "sts:TagSession",
                "sts:AssumeRole"
            ]
        }
    ]
}

IAM permissions policy:

{
    "Statement": [
        {
            "Action": [
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceStatus"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ],
    "Version": "2012-10-17"
}
  3. Deploy the datadog Helm chart 3.83.0 with the following values.yml:
datadog:
  apiKeyExistingSecret: datadog
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    portEnabled: true
    instrumentation:
      skipKPITelemetry: true
  processAgent:
    enabled: true
  orchestratorExplorer:
    enabled: true
  confd:
    disk.yaml: |-
      init_config:
      instances:
        - use_mount: false
          file_system_exclude:
            - autofs$
          mount_point_exclude:
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc
clusterAgent:
  replicas: 2
  createPodDisruptionBudget: true
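Step 2 can be scripted with the AWS CLI. This is a sketch: the function names, the default role name, the inline policy name, and the trust.json / policy.json file names (holding the two JSON documents from step 2) are hypothetical. By default it only prints the commands (DRY_RUN=1); DRY_RUN=0 executes them.

```shell
# Print commands when DRY_RUN=1 (default); execute them when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: create the role, attach the EC2 read policy, and
# associate the role with the datadog/datadog-cluster-agent service account.
setup_pod_identity() {
  cluster="$1"
  account_id="$2"
  role_name="${3:-datadog-cluster-agent}"

  # trust.json holds the trust relationship; policy.json the permissions.
  run aws iam create-role --role-name "$role_name" \
    --assume-role-policy-document file://trust.json
  run aws iam put-role-policy --role-name "$role_name" \
    --policy-name datadog-ec2-describe --policy-document file://policy.json
  run aws eks create-pod-identity-association --cluster-name "$cluster" \
    --namespace datadog --service-account datadog-cluster-agent \
    --role-arn "arn:aws:iam::${account_id}:role/${role_name}"
}
```

For example, setup_pod_identity my-cluster 123456789012 prints the three commands. Afterwards, aws eks list-pod-identity-associations --cluster-name my-cluster --namespace datadog should list the association. The role ARN assumes the standard aws partition.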

Additional environment details (Operating System, Cloud provider, etc.):

Datadog:

helm chart: 3.83.0
agent: Cluster Agent 7.58.0 - Commit: cf39839 - Serialization version: v5.0.130 - Go version: go1.22.7

EKS:

Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.31.2-eks-7f9249a

This may be similar to #29916, where the local environment variables that IRSA relies on were ignored when the EC2 client was initialized.
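Until auto-detection works with Pod Identity, the warning's own suggestion applies: set the cluster name explicitly, which in the Helm chart is the datadog.clusterName value, so the cluster agent should no longer need to look it up. The helpers below are hypothetical. They assume the kubeconfig context was created by aws eks update-kubeconfig, whose default context name is the cluster ARN, and they only print the helm command; the release name datadog and the repo alias datadog are also assumptions.

```shell
# Hypothetical helper: derive the cluster name from an EKS context named after
# the cluster ARN (arn:aws:eks:<region>:<account>:cluster/<name>).
cluster_name_from_context() {
  ctx="${1:-$(kubectl config current-context)}"
  case "$ctx" in
    arn:aws:eks:*:cluster/*) printf '%s\n' "${ctx##*/}" ;;
    *) echo "cannot derive cluster name from context: $ctx" >&2; return 1 ;;
  esac
}

# Print (not run) a helm command that pins datadog.clusterName explicitly.
print_helm_upgrade() {
  name="$(cluster_name_from_context "$1")" || return 1
  printf 'helm upgrade --install datadog datadog/datadog -f values.yml --set datadog.clusterName=%s\n' "$name"
}
```

If the context uses another naming scheme (eksctl, or a custom alias), the helper fails rather than guessing; in that case, pass the cluster name to --set datadog.clusterName directly.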

