
kubelet could reject pods with NodeAffinity error due to incomplete informer cache on the node object #129463

@bcho

Description

What happened?

On our end (AKS), we observed a case where kubelet can reject pods with a NodeAffinity status if the node watch call is closed for an unknown reason during kubelet startup.

The sequence of events is illustrated below (using Tx to denote points in time):

  • T1: kubelet starts and spawns, in parallel, the process for registering the node (creating the node object) and the node informer for syncing the node object

  • T2: the node informer successfully lists an empty node list from kube-apiserver and proceeds to the watch call:

    if fallbackToList {
        err = r.list(ctx)
        if err != nil {
            return err
        }
    }
    logger.V(2).Info("Caches populated", "type", r.typeDescription, "reflector", r.name)
    return r.watchWithResync(ctx, w)

  • T3: the watch call stops due to an unknown issue, for instance the network connection being closed. In this case, the watch call exits with an error like:

    very short watch: k8s.io/client-go/informers/factory.go:160: Unexpected watch close - watch lasted less than a second and no items received

    The node informer then backs off and retries with a fresh list-watch cycle:

    wait.BackoffUntil(func() {
        if err := r.ListAndWatchWithContext(ctx); err != nil {
            r.watchErrorHandler(ctx, r, err)
        }
    }, r.backoffManager, true, ctx.Done())

  • T4: kubelet successfully creates the node object on the kube-apiserver side. kube-scheduler kicks in and assigns pods to the node

  • T5: because the list call at T2 succeeded with empty results, the node informer has already been marked as synced:

nodeHasSynced = func() bool {
    return kubeInformers.Core().V1().Nodes().Informer().HasSynced()
}

func (f *DeltaFIFO) HasSynced() bool {
    f.lock.Lock()
    defer f.lock.Unlock()
    return f.hasSynced_locked()
}

func (f *DeltaFIFO) hasSynced_locked() bool {
    return f.populated && f.initialPopulationCount == 0
}

In this case, kubelet starts syncing pods and attempts to run them. If these pods carry node affinity settings, the kubelet node admission check fails them with a "NodeAffinity" status because the node informer cache is empty: kubelet falls back to a minimal node object without the expected node labels when performing the admission check (see the sketch right after this timeline):
func (kl *Kubelet) getNodeAnyWay() (*v1.Node, error) {
    if kl.kubeClient != nil {
        if n, err := kl.nodeLister.Get(string(kl.nodeName)); err == nil {
            return n, nil
        }
    }
    return kl.initialNode(context.TODO())
}

  • T6: the node informer retry kicks in and this time successfully syncs the node object from kube-apiserver. Pods retried against this node now pass admission because the expected node labels are present
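
To make the T2-T5 window concrete, here is a minimal, self-contained sketch using client-go's fake clientset (not the real kubelet; the node name "my-node" is illustrative). It shows that HasSynced() reports true after the initial empty LIST, even though the node object is not yet in the lister cache:

    package main

    import (
        "context"
        "fmt"
        "time"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes/fake"
        "k8s.io/client-go/tools/cache"
    )

    func main() {
        // The apiserver has no Node objects yet (kubelet has not registered
        // the node at this point in the timeline).
        client := fake.NewSimpleClientset()

        factory := informers.NewSharedInformerFactory(client, 0)
        nodeInformer := factory.Core().V1().Nodes()
        nodeLister := nodeInformer.Lister()

        stopCh := make(chan struct{})
        defer close(stopCh)
        factory.Start(stopCh)
        cache.WaitForCacheSync(stopCh, nodeInformer.Informer().HasSynced)

        // HasSynced is true because the initial (empty) LIST completed ...
        fmt.Println("HasSynced:", nodeInformer.Informer().HasSynced()) // true

        // ... but the node is not in the cache, which is the state
        // getNodeAnyWay hits at T5 before falling back to initialNode.
        _, err := nodeLister.Get("my-node")
        fmt.Println("lister Get before registration:", err) // not found

        // T4/T6: the node gets registered and the watch eventually delivers it.
        node := &v1.Node{ObjectMeta: metav1.ObjectMeta{Name: "my-node"}}
        if _, err := client.CoreV1().Nodes().Create(context.TODO(), node, metav1.CreateOptions{}); err != nil {
            panic(err)
        }
        time.Sleep(100 * time.Millisecond) // allow the watch event to propagate in this toy example
        n, err := nodeLister.Get("my-node")
        fmt.Println("lister Get after registration:", n != nil, err)
    }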

In our reproduction, we captured the following logs:

T1: kubelet[3538]: I1225 01:02:59.216231    3538 kubelet.go:2346] "Starting kubelet main sync loop"
T2: kubelet[3538]: I1225 01:04:04.030571    3538 reflector.go:359] Caches populated for *v1.Node from k8s.io/client-go/informers/factory.go:160
T3: kubelet[3538]: W1225 01:04:04.170608    3538 reflector.go:470] k8s.io/client-go/informers/factory.go:160: watch of *v1.Node ended with: very short watch: k8s.io/client-go/informers/factory.go:160: Unexpected watch close - watch lasted less than a second and no items received
T4: kubelet[3538]: I1225 01:04:08.309280    3538 kubelet_node_status.go:76] "Successfully registered node" node="<node-name>"
T5 (1): kubelet[3538]: I1225 01:04:29.348027    3538 predicate.go:164] "Predicate failed on Pod" pod="<namespace>/<pod>" err="Predicate NodeAffinity failed"
T5 (2): kubelet[3538]: E1225 01:04:29.351153    3538 kubelet_node_status.go:462] "Error getting the current node from lister" err="node \"<node-name>\" not found"
T6: kubelet[3538]: I1225 01:04:56.591752    3538 reflector.go:359] Caches populated for *v1.Node from k8s.io/client-go/informers/factory.go:160

In the normal case, the retry back-off (the gap between T5 and T6) should be short, and kubelet should not leave many pods failed. However, in rare scenarios the node list-watch may already have been failing for other reasons before its first successful call (while waiting for TLS bootstrapping, for example), which grows the back-off towards its 30s cap. In theory the T5-to-T6 gap can then be as long as 30s plus jitter, which aligns with the log timestamps observed above.
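
For intuition on how the gap can grow to roughly 30s, here is a rough sketch of an exponential back-off with a 30s cap. The parameters below only approximate client-go's reflector back-off (assumed ~800ms initial delay, factor 2, 30s cap; jitter omitted) and are not values copied from the source:

    package main

    import (
        "fmt"
        "time"

        "k8s.io/apimachinery/pkg/util/wait"
    )

    func main() {
        // Assumed, reflector-like back-off parameters.
        b := wait.Backoff{
            Duration: 800 * time.Millisecond,
            Factor:   2.0,
            Steps:    10,
            Cap:      30 * time.Second,
        }
        for attempt := 1; attempt <= 8; attempt++ {
            // Step() returns the next delay and advances the back-off state.
            fmt.Printf("failure %d -> wait %v before retrying\n", attempt, b.Step())
        }
        // The delay grows 800ms, 1.6s, 3.2s, ... and saturates at the 30s cap,
        // consistent with the ~27s T5-to-T6 gap in the logs (plus jitter).
    }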

What did you expect to happen?

Kubelet should keep backing off the node list-watch calls until the node object has been populated in the node informer cache. Alternatively, kubelet could invalidate the node informer cache after it successfully registers the node, so that the in-memory copy of the node object is correct.
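
As a rough illustration of the first suggestion (not an actual kubelet patch; the function name waitForOwnNode and its parameters are made up for this sketch), the readiness gate could wait for the registered node to actually appear in the lister cache instead of trusting HasSynced() alone:

    package nodesync

    import (
        "context"
        "time"

        apierrors "k8s.io/apimachinery/pkg/api/errors"
        "k8s.io/apimachinery/pkg/util/wait"
        corelisters "k8s.io/client-go/listers/core/v1"
    )

    // waitForOwnNode polls the node informer cache until this kubelet's node
    // object is actually present, rather than returning as soon as the initial
    // (possibly empty) LIST has completed.
    func waitForOwnNode(ctx context.Context, nodeLister corelisters.NodeLister, nodeName string) error {
        return wait.PollUntilContextTimeout(ctx, 500*time.Millisecond, 2*time.Minute, true,
            func(ctx context.Context) (bool, error) {
                _, err := nodeLister.Get(nodeName)
                if err == nil {
                    return true, nil // node object is in the cache; safe to admit pods
                }
                if apierrors.IsNotFound(err) {
                    return false, nil // cache "synced" but empty; keep waiting
                }
                return false, err
            })
    }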

How can we reproduce it (as minimally and precisely as possible)?

Simulate a watch-close during kubelet startup as described above; one possible way to do this against a fake clientset is sketched below.
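
One way to simulate the short-lived watch without touching the network is a watch reactor on a fake clientset that closes the very first node watch immediately. This is a hedged sketch against client-go's fake/testing machinery, not a change to kubelet:

    package main

    import (
        "fmt"
        "sync/atomic"
        "time"

        "k8s.io/apimachinery/pkg/watch"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes/fake"
        clienttesting "k8s.io/client-go/testing"
        "k8s.io/client-go/tools/cache"
    )

    func main() {
        client := fake.NewSimpleClientset()

        var watchCalls int32
        client.PrependWatchReactor("nodes", func(action clienttesting.Action) (bool, watch.Interface, error) {
            if atomic.AddInt32(&watchCalls, 1) == 1 {
                // First watch: hand back a watcher and stop it right away to mimic
                // "Unexpected watch close - watch lasted less than a second".
                w := watch.NewFake()
                go w.Stop()
                return true, w, nil
            }
            // Later watches fall through to the default fake watch reactor.
            return false, nil, nil
        })

        factory := informers.NewSharedInformerFactory(client, 0)
        nodeInformer := factory.Core().V1().Nodes().Informer()

        stopCh := make(chan struct{})
        defer close(stopCh)
        factory.Start(stopCh)
        cache.WaitForCacheSync(stopCh, nodeInformer.HasSynced)

        // The informer is "synced" even though its first watch died immediately;
        // the reflector backs off and re-lists/re-watches in the background.
        fmt.Println("HasSynced after short watch:", nodeInformer.HasSynced())
        time.Sleep(200 * time.Millisecond)
    }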

Anything else we need to know?

Related issues/PRs:

Kubernetes version

1.30, but we believe this behavior is reproducible in all versions.

Cloud provider

AKS

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Labels: kind/bug, priority/backlog, sig/node, triage/accepted
