Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/cvo/metrics: Add a cluster_version_capability metric #755

Merged
merged 1 commit into from
Mar 23, 2022

Conversation

wking
Copy link
Member

@wking wking commented Mar 23, 2022

Fleet capability choices should make their way up to Red Hat via Insights uploads. But some clusters have Telemetry enabled but Insights disabled. For these clusters, we'll want this data in Telemetry, and getting it into metrics (this commit) is the first step. After this lands, we'll need to update the monitoring operator like this to get these shipped up off of cluster.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 23, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 23, 2022
wking added a commit to wking/cluster-monitoring-operator that referenced this pull request Mar 23, 2022
The cluster-version operator is growing this new metric [1], and Ben
Parees wants it included in Telemetry so we collect it for clusters
that have Telemetry enabled, even if Insights is disabled [2].
Cardinality is expected to be very small, with a single label ('name'
[1]) and a restricted value set (only three registered capabilities
expected for 4.11).

[1]: openshift/cluster-version-operator#755
[2]: openshift/enhancements#922 (comment)
@wking wking force-pushed the capabilities-metrics branch from 7bb68e3 to 60ee931 Compare March 23, 2022 06:14
Fleet capability choices should make their way up to Red Hat via
Insights uploads.  But some clusters have Telemetry enabled but
Insights disabled [1].  For these clusters, we'll want this data in
Telemetry, and getting it into metrics (this commit) is the first
step.  After this lands, we'll need to update the monitoring operator
like [2] to get these shipped up off of cluster.

[1]: openshift/enhancements#922 (comment)
[2]: openshift/cluster-monitoring-operator#1477
@wking wking force-pushed the capabilities-metrics branch from 60ee931 to dabc1a6 Compare March 23, 2022 06:28
wking added a commit to wking/cluster-monitoring-operator that referenced this pull request Mar 23, 2022
The cluster-version operator is growing this new metric [1], and Ben
Parees wants it included in Telemetry so we collect it for clusters
that have Telemetry enabled, even if Insights is disabled [2].
Cardinality is expected to be very small, with a single label ('name'
[1]) and a restricted value set (only three registered capabilities
expected for 4.11).

[1]: openshift/cluster-version-operator#755
[2]: openshift/enhancements#922 (comment)
@wking
Copy link
Member Author

wking commented Mar 23, 2022

PromeCIeus for the e2e-operator run has:

cluster_version_capability{endpoint="metrics", instance="10.0.0.7:9099", job="cluster-version-operator", name="baremetal", namespace="openshift-cluster-version", pod="cluster-version-operator-86f88db7f6-xbw5n", service="cluster-version-operator"} 1
cluster_version_capability{endpoint="metrics", instance="10.0.0.7:9099", job="cluster-version-operator", name="openshift-samples", namespace="openshift-cluster-version", pod="cluster-version-operator-86f88db7f6-xbw5n", service="cluster-version-operator"}  1

So hurray :)

@jottofar
Copy link
Contributor

/lgtm

@jottofar
Copy link
Contributor

/test e2e-agnostic-upgrade

@wking
Copy link
Member Author

wking commented Mar 23, 2022

Azure install failure is already tracked in an installer bug, and I'm confident that there's no update-specific impact here.

/override ci/prow/e2e-agnostic-upgrade

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 23, 2022

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade

In response to this:

Azure install failure is already tracked in an installer bug, and I'm confident that there's no update-specific impact here.

/override ci/prow/e2e-agnostic-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 23, 2022

@wking: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wking wking added the lgtm Indicates that a PR is ready to be merged. label Mar 23, 2022
@wking
Copy link
Member Author

wking commented Mar 23, 2022

Manually labeling per Jack's earlier comment, because Prow and GitHub are fighting.

@openshift-merge-robot openshift-merge-robot merged commit aeb4ba0 into openshift:master Mar 23, 2022
@wking wking deleted the capabilities-metrics branch March 23, 2022 20:02
ch <- g
enabledCapabilities[capability] = struct{}{}
}
for _, capability := range cv.Status.Capabilities.KnownCapabilities {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not clear on why we need to send known capabilities.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two use cases:

  • conditional updates using local PromQL can distinguish between "yeah, that cap is not enabled here" (value 0) and "err, something's broken with scraping this metrics" (no match)
  • Telemetry can gauge the install-fraction for a capability (countOfClustersWithCapEnabled / countOfClustersWithCapKnown). with count by (name) (cluster_version_capability == 1) / count by (name) (cluster_version_capability). If cluster_version_capability did not include known-but-not-enabled caps, you'd need some added filtering to get at countOfClustersWithCapKnown.

wking added a commit to wking/cluster-monitoring-operator that referenced this pull request Mar 30, 2022
The cluster-version operator is growing this new metric [1], and Ben
Parees wants it included in Telemetry so we collect it for clusters
that have Telemetry enabled, even if Insights is disabled [2].
Cardinality is expected to be very small, with a single label ('name'
[1]) and a restricted value set (only three registered capabilities
expected for 4.11).

[1]: openshift/cluster-version-operator#755
[2]: openshift/enhancements#922 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants