-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce an EndpointsWatcher cache structure #11190
Conversation
* EndpointsWatcher cache will be populated with one EndpointsWatcher per cluster (including the local cluster) * Using a namespace scoped informer, it will add/remove watchers based on secrets read from the namespace. Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
direction looks good here. we'll want to be able to stop the informers when the secret is deleted, which I think we can do with the channel passed into Sync
. we may also want to add a Stop
function to EndpointsWatcher to unregister the event listeners to avoid memory leaks, but maybe stopping the informers and removing the EndpointsWatcher from the cache is sufficient to getting it all eligible for gc.
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
@@ -221,8 +221,10 @@ func (cs *ClusterStore) addCluster(clusterName string, secret *v1.Secret) error | |||
stopCh, | |||
} | |||
|
|||
remoteAPI.Sync(stopCh) | |||
metadataAPI.Sync(stopCh) | |||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could even go a step further and sync each of these in their own goroutine to sync them in parallel.
remoteAPI, err := k8s.NewFakeAPI([]string{}...) | ||
if err != nil { | ||
return nil, nil, err | ||
} | ||
|
||
metadataAPI, err := k8s.NewFakeMetadataAPI(nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up on our off-github convo, I think the race occurs between these two calls NewFakeAPI
and NewFakeMetadataAPI
, because they rely on a call to ToRuntimeObject()
which uses this global struct instance from scheme/register.go
, which holds the map that is contended over:
var Scheme = runtime.NewScheme()
Maybe a simple fix could be to have ToRuntimeObject()
receive as an argument the scheme instance, which could be instantiated separately for these two API clients.
} | ||
|
||
// Handle delete events | ||
if len(tt.deleteClusters) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is not needed.
decodeFn: decodeFn, | ||
} | ||
|
||
_, err := cs.k8sAPI.Secret().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that we're not interested in Update events? Can you add a comment about that?
@@ -57,6 +58,26 @@ func InitializeMetadataAPI(kubeConfig string, resources ...APIResource) (*Metada | |||
return api, nil | |||
} | |||
|
|||
func InitializeMetadataAPIForConfig(kubeConfig *rest.Config, resources ...APIResource) (*MetadataAPI, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can slim down InitializeMetadataAPI
body now:
func InitializeMetadataAPI(kubeConfig string, resources ...APIResource) (*MetadataAPI, error) {
config, err := k8s.GetConfig(kubeConfig, "")
if err != nil {
return nil, fmt.Errorf("error configuring Kubernetes API client: %w", err)
}
return InitializeMetadataAPIForConfig(config, resources...)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting!
Left some non-critical comments above. The one thing that might be annoying are the sporadic test failures though.
Also, fyi, EndpointSlices entered GA in k8s 1.21, which is our oldest supported version, at which point the EndpointSlices feature gate defaults to true. And in more recent versions the feature gate is gone altogether. So you could just assume that's enabled to simplify things, but leaving that at your discretion 😉
## edge-23.8.2 This edge release adds improvements to Linkerd's multi-cluster features as part of the [flat network support] planned for Linkerd stable-2.14.0. In addition, it fixes an issue ([#10764]) where warnings about an invalid metric were logged frequently by the Destination controller. * Added a new `remoteDiscoverySelector` field to the multicluster `Link` CRD, which enables a service mirroring mod where the control plane performs discovery for the mirrored service from the remote cluster, rather than creating Endpoints for the mirrored service in the source cluster ([#11190], [#11201], [#11220], and [#11224]) * Fixed missing "Services" menu item in the Spanish localization for the `linkerd-viz` web dashboard ([#11229]) (thanks @mclavel!) * Replaced `server_port_subscribers` Destination controller gauge metric with `server_port_subscribes` and `server_port_unsubscribes` counter metrics ([#11206]; fixes [#10764]) * Replaced deprecated `failure-domain.beta.kubernetes.io` labels in Helm charts with `topology.kubernetes.io` labels ([#11148]; fixes [#11114]) (thanks @piyushsingariya!) [#10764]: #10764 [#11114]: #11114 [#11148]: #11148 [#11190]: #11190 [#11201]: #11201 [#11206]: #11206 [#11220]: #11220 [#11224]: #11224 [#11229]: #11229 [flat network support]: https://linkerd.io/2023/07/20/enterprise-multi-cluster-at-scale-supporting-flat-networks-in-linkerd/
## edge-23.8.2 This edge release adds improvements to Linkerd's multi-cluster features as part of the [flat network support] planned for Linkerd stable-2.14.0. In addition, it fixes an issue ([#10764]) where warnings about an invalid metric were logged frequently by the Destination controller. * Added a new `remoteDiscoverySelector` field to the multicluster `Link` CRD, which enables a service mirroring mode where the control plane performs discovery for the mirrored service from the remote cluster, rather than creating Endpoints for the mirrored service in the source cluster ([#11190], [#11201], [#11220], and [#11224]) * Fixed missing "Services" menu item in the Spanish localization for the `linkerd-viz` web dashboard ([#11229]) (thanks @mclavel!) * Replaced `server_port_subscribers` Destination controller gauge metric with `server_port_subscribes` and `server_port_unsubscribes` counter metrics ([#11206]; fixes [#10764]) * Replaced deprecated `failure-domain.beta.kubernetes.io` labels in Helm charts with `topology.kubernetes.io` labels ([#11148]; fixes [#11114]) (thanks @piyushsingariya!) [#10764]: #10764 [#11114]: #11114 [#11148]: #11148 [#11190]: #11190 [#11201]: #11201 [#11206]: #11206 [#11220]: #11220 [#11224]: #11224 [#11229]: #11229 [flat network support]: https://linkerd.io/2023/07/20/enterprise-multi-cluster-at-scale-supporting-flat-networks-in-linkerd/
Currently, the control plane does not support indexing and discovering resources across cluster boundaries. In a multicluster set-up, it is desirable to have access to endpoint data (and by extension, any configuration associated with that endpoint) to support pod-to-pod communication. Linkerd's destination service should be extended to support reading discovery information from multiple sources (i.e. API Servers).
This change introduces an
EndpointsWatcher
cache. On creation, the cache registers a pair of event handlers:multicluster
secrets that embed kubeconfig files. For each such secret, the cache creates a correspondingEndpointsWatcher
to read remote discovery information.multicluster
secret is deleted.Additionally, a set of tests have been added that assert the behaviour of the cache when secrets are created and/or deleted. The cache will be used by the destination service to do look-ups for services that belong to another cluster, and that are running in a "remote discovery" mode.
Signed-off-by: Matei David matei@buoyant.io