The endpoint is lost when the APIServer is restored. #124547

Open
@Black-max12138

Description

What happened?

This happens when the apiserver goes down and comes back up a few minutes later: some endpoints are left with notReadyAddresses and never recover. It occurs intermittently.
The cause is that the Endpoints object obtained from the informer is not the latest version. In the syncService method of endpoints_controller.go:
currentEndpoints, err := e.endpointsLister.Endpoints(service.Namespace).Get(service.Name)
Because the obtained object is stale, the controller concludes that the current and desired endpoints are equal. As a result, the endpoint is not updated.
I added logging to print the endpoint and confirmed this.
This is what the log shows:
I0425 08:25:57.715142 11 endpoints_controller.go:423] "About to update endpoints for service" service="manager/service-mchiroer"
I0425 08:25:57.715216 11 endpoints_controller.go:516] "endpoints are equal, skipping update" service="manager/service-mchiroer"
I0425 08:25:57.715225 11 endpoints_controller.go:389] "Finished syncing service endpoints" service="manager/service-mchiroer" startTime="83.332µs"
So I think the informer cache is not holding the latest data, which is a bug.
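
To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It is not the controller's actual code: the `Endpoints` struct, `sameAddresses`, and `syncOnce` are hypothetical stand-ins, and the scenario (the server holds a newer object with a not-ready address while the cache still holds an older, ready one) is an assumption about how the stale read could produce the "endpoints are equal, skipping update" log line.

```go
package main

import (
	"fmt"
	"reflect"
)

// Endpoints is a simplified, hypothetical stand-in for the real Endpoints object.
type Endpoints struct {
	ResourceVersion   string
	Addresses         []string
	NotReadyAddresses []string
}

// sameAddresses reports whether two objects carry identical ready and
// not-ready address lists (order-sensitive, for brevity).
func sameAddresses(a, b Endpoints) bool {
	return reflect.DeepEqual(a.Addresses, b.Addresses) &&
		reflect.DeepEqual(a.NotReadyAddresses, b.NotReadyAddresses)
}

// syncOnce compares the desired state with whatever the cache returned and
// writes to the server copy only when they differ. It returns true if it wrote.
func syncOnce(cached, desired Endpoints, server *Endpoints) bool {
	if sameAddresses(cached, desired) {
		return false // "endpoints are equal, skipping update"
	}
	server.Addresses = desired.Addresses
	server.NotReadyAddresses = desired.NotReadyAddresses
	return true
}

func main() {
	// Assumed state after the outage: the server object marks the pod not ready,
	// the cache still holds an older version, and the pod is ready again.
	server := Endpoints{ResourceVersion: "2", NotReadyAddresses: []string{"10.0.0.5"}}
	staleCache := Endpoints{ResourceVersion: "1", Addresses: []string{"10.0.0.5"}}
	desired := Endpoints{Addresses: []string{"10.0.0.5"}}

	// Stale read: desired matches the cache, so the server is never corrected.
	fmt.Println("updated with stale cache:", syncOnce(staleCache, desired, &server))
	fmt.Println("server notReadyAddresses:", server.NotReadyAddresses)

	// Fresh read: the difference is detected and the server is updated.
	fresh := server
	fmt.Println("updated with fresh cache:", syncOnce(fresh, desired, &server))
	fmt.Println("server addresses:", server.Addresses)
}
```

In this model the sync does nothing wrong given its input. The skipped update follows entirely from comparing against an outdated cached object, which matches the behavior described above.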

What did you expect to happen?

The notReadyAddresses of the endpoint should be changed to addresses when the pod status is updated.

How can we reproduce it (as minimally and precisely as possible)?

1. Stop the apiserver service of the cluster.
2. Restore the apiserver service after a few minutes.
Repeat the preceding steps. The problem will recur.

Anything else we need to know?

No response

Kubernetes version

1.28

Cloud provider

OS version


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Metadata

Assignees

No one assigned

    Labels

    area/controller-manager
    kind/bug: Categorizes issue or PR as related to a bug.
    sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
    triage/accepted: Indicates an issue or PR is ready to be actively worked on.
