The endpoint is lost when the APIServer is restored. #124547
Description
What happened?
When the apiserver goes down and comes back up a few minutes later, some endpoints are left with notReadyAddresses and never recover. The problem is intermittent.
The cause is that the endpoint obtained from the informer is not the latest version. In the syncService method of endpoints_controller.go:
currentEndpoints, err := e.endpointsLister.Endpoints(service.Namespace).Get(service.Name)
Because the endpoint returned by the lister is stale, the controller compares the desired state against it, concludes that the endpoints are the same, and skips the update.
I added logging to print the endpoint and confirmed this.
This is what the log shows.
I0425 08:25:57.715142 11 endpoints_controller.go:423] "About to update endpoints for service" service="manager/service-mchiroer"
I0425 08:25:57.715216 11 endpoints_controller.go:516] "endpoints are equal, skipping update" service="manager/service-mchiroer"
I0425 08:25:57.715225 11 endpoints_controller.go:389] "Finished syncing service endpoints" service="manager/service-mchiroer" startTime="83.332µs"
So I think the informer cache is not holding the latest data, which is a bug.
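The skip described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the controller's code: the `Endpoints` struct, `needsUpdate`, the IP address, and the resource versions are all hypothetical stand-ins, and the real comparison in endpoints_controller.go looks at more fields. It shows how comparing the desired state with a stale cached copy reports "equal" even though the object stored in the apiserver still has the address under notReadyAddresses.

```go
package main

import "fmt"

// Endpoints is a simplified, hypothetical stand-in for the real Endpoints object.
type Endpoints struct {
	ResourceVersion   string
	Addresses         []string
	NotReadyAddresses []string
}

// equalStrings reports whether two slices hold the same strings in the same order.
func equalStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// needsUpdate mimics the "are the endpoints equal?" check: it only sees the
// copy it was handed, so a stale copy can hide a real difference.
func needsUpdate(current, desired Endpoints) bool {
	return !equalStrings(current.Addresses, desired.Addresses) ||
		!equalStrings(current.NotReadyAddresses, desired.NotReadyAddresses)
}

func main() {
	// What is actually stored in the apiserver (assumed for illustration).
	stored := Endpoints{ResourceVersion: "2", NotReadyAddresses: []string{"10.0.0.5"}}
	// What a stale cache might still return (assumed for illustration).
	cached := Endpoints{ResourceVersion: "1", Addresses: []string{"10.0.0.5"}}
	// What the controller wants now that the pod is ready.
	desired := Endpoints{Addresses: []string{"10.0.0.5"}}

	fmt.Println("update needed against stale cache:", needsUpdate(cached, desired))
	fmt.Println("update needed against stored object:", needsUpdate(stored, desired))
}
```

Against the stale copy the check finds no difference, which matches the "endpoints are equal, skipping update" log line; against the stored object the same check would have triggered an update.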
What did you expect to happen?
The notReadyAddresses of the endpoint should be changed to addresses when the pod status is updated.
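As a rough illustration of the expected behavior, the sketch below partitions pod IPs by readiness. The `Pod` and `Subset` types and `buildSubset` are hypothetical simplifications introduced here; the real controller takes more into account than a single ready flag.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal view of a pod: its IP and readiness.
type Pod struct {
	IP    string
	Ready bool
}

// Subset is a simplified stand-in for an endpoint subset.
type Subset struct {
	Addresses         []string
	NotReadyAddresses []string
}

// buildSubset places each pod IP under Addresses when the pod is ready,
// and under NotReadyAddresses otherwise.
func buildSubset(pods []Pod) Subset {
	var s Subset
	for _, p := range pods {
		if p.Ready {
			s.Addresses = append(s.Addresses, p.IP)
		} else {
			s.NotReadyAddresses = append(s.NotReadyAddresses, p.IP)
		}
	}
	return s
}

func main() {
	// Before the pod status update: the pod is not ready.
	before := buildSubset([]Pod{{IP: "10.0.0.5", Ready: false}})
	// After the pod status update: the same pod is ready.
	after := buildSubset([]Pod{{IP: "10.0.0.5", Ready: true}})

	fmt.Printf("before: %+v\n", before)
	fmt.Printf("after:  %+v\n", after)
}
```

Once the pod reports ready, its IP should appear under addresses only; in the failure described here the stored endpoint stays in the "before" shape.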
How can we reproduce it (as minimally and precisely as possible)?
1. Stop the apiserver service of the cluster.
2. Bring the apiserver service back up after a few minutes.
Repeat these steps until the problem recurs.
Anything else we need to know?
No response
Kubernetes version
1.28
Cloud provider
OS version