Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed service-mirror metrics warning #11246

Merged
merged 1 commit into from
Aug 16, 2023
Merged

Fixed service-mirror metrics warning #11246

merged 1 commit into from
Aug 16, 2023

Conversation

alpeb
Copy link
Member

@alpeb alpeb commented Aug 14, 2023

Whenever the service mirror's main loop was triggered again, the following warnings were generated:

time="2023-08-14T20:16:29Z" level=warning msg="failed to register Prometheus gauge Desc{fqName: \"service_cache_size\", help: \"Number of items in the client-go service cache\", constLabels: {cluster=\"remote\"}, variableLabels: []}: duplicate metrics collector registration attempted"
time="2023-08-14T20:16:29Z" level=warning msg="failed to register Prometheus gauge Desc{fqName: \"endpoints_cache_size\", help: \"Number of items in the client-go endpoints cache\", constLabels: {cluster=\"remote\"}, variableLabels: []}: duplicate metrics collector registration attempted"

To fix, this adds into the cluster watcher's Stop() method a directive to unregister the prometheus cache metrics associated to the cluster's client API.

Whenever the service mirror's main loop was triggered again, the following warnings were generated:

```
time="2023-08-14T20:16:29Z" level=warning msg="failed to register Prometheus gauge Desc{fqName: \"service_cache_size\", help: \"Number of items in the client-go service cache\", constLabels: {cluster=\"remote\"}, variableLabels: []}: duplicate metrics collector registration attempted"
time="2023-08-14T20:16:29Z" level=warning msg="failed to register Prometheus gauge Desc{fqName: \"endpoints_cache_size\", help: \"Number of items in the client-go endpoints cache\", constLabels: {cluster=\"remote\"}, variableLabels: []}: duplicate metrics collector registration attempted"
```

To fix, this adds into the cluster watcher's `Stop()` method a directive to unregister the prometheus cache metrics associated to the cluster's client API.
@alpeb alpeb requested a review from a team as a code owner August 14, 2023 20:47
alpeb added a commit that referenced this pull request Aug 15, 2023
Similar to #11246, we were getting warnings in the Destination
controller whenever a new remote cluster was linked, complaining about
the attempt to register dupe api cache metrics. So we lower here to
debug level such messages.
alpeb added a commit that referenced this pull request Aug 15, 2023
Similar to #11246, the destination controller was complaning above
trying to register dupe metrics for the api cache gauges, when a given
target cluster got re-linked. This change unregisters the gauges for the
target cluster when said cluster is removed.

Supersedes #11252
@alpeb alpeb merged commit d823ad7 into main Aug 16, 2023
@alpeb alpeb deleted the alpeb/cleanup-cache-gauges branch August 16, 2023 18:34
adleong pushed a commit that referenced this pull request Aug 16, 2023
Similar to #11246, the destination controller was complaning above
trying to register dupe metrics for the api cache gauges, when a given
target cluster got re-linked. This change unregisters the gauges for the
target cluster when said cluster is removed.

Supersedes #11252
alpeb added a commit that referenced this pull request Aug 16, 2023
This is a release candidate for stable-2.14.0; we encourage you to help trying
it out!

This edge release contains a number of improvements over the multi-cluster
features introduced in the last edge release supporting flat networks. It also
hardens the containers security stance by removing write access to the root
filesystem.

* Enhanced `linkerd multicluster link` to allow clusters to be linked without a
  gateway ([#11226])
* Added cluster store size gauge metric ([#11256])
* Disabled local traffic policy for remote discovery ([#11257])
* Fixed various innocuous multi-cluster warnings ([#11251], [#11246], [#11253])
* Set `readOnlyRootFilesystem: true` in all the containers, as they don't
  require write permissions ([#11221]; fixes [#11142]) (thanks @mikutas!)
@alpeb alpeb mentioned this pull request Aug 16, 2023
alpeb added a commit that referenced this pull request Aug 16, 2023
This is a release candidate for stable-2.14.0; we encourage you to help trying
it out!

This edge release contains a number of improvements over the multi-cluster
features introduced in the last edge release supporting flat networks. It also
hardens the containers security stance by removing write access to the root
filesystem.

* Enhanced `linkerd multicluster link` to allow clusters to be linked without a
  gateway ([#11226])
* Added cluster store size gauge metric ([#11256])
* Disabled local traffic policy for remote discovery ([#11257])
* Fixed various innocuous multi-cluster warnings ([#11251], [#11246], [#11253])
* Set `readOnlyRootFilesystem: true` in all the containers, as they don't
  require write permissions ([#11221]; fixes [#11142]) (thanks @mikutas!)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants