Fix kubelet PVC stale metrics #59170
Conversation
/ok-to-test
@zhangxiaoyu-zidif Thanks!
The branch was force-pushed from 05aab55 to ff8f139.
return fmt.Errorf(`metric output does not match expectation; want:
%s
run 'gofmt' =)
@zhangxiaoyu-zidif fixed
@jingxu97 Just checked the other metrics; they don't have the same problem.
Volumes on each node change; we should not just add PVC metrics into a gauge vector. It's better to use a collector to collect metrics from stats. Sync from kubernetes#59170 with some modifications: use SummaryProvider instead of StatsProvider, remove test files, generate build files.
@cofyc thank you for checking. Which metrics did you check, and why does the PVC metric have this problem while the others don't? Thanks!
hi, @jingxu97 I've checked all other metrics:
PVC metrics have this problem because they are only ever added into a gauge vector and never removed. Theoretically there are two ways to fix it: remove stale metrics from the gauge vector when the volumes go away, or use a custom collector that generates metrics from the stats on each scrape.
The second is the better and easier way to go for PVC metrics. It also decouples statistics collection from metrics generation.
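For illustration, here is a minimal sketch of the second approach: a custom Prometheus collector that rebuilds metrics from a stats source on every scrape. The statsProvider interface, the type names, and the metric name below are assumptions made for this sketch, not the kubelet's actual code; the PR wires an equivalent collector to the kubelet's StatsProvider and its stats API types.

```go
package collectors

import (
	"github.com/prometheus/client_golang/prometheus"
)

// PVCReference and VolumeStats are illustrative stand-ins for the kubelet
// stats API types (stats.PVCReference / stats.VolumeStats).
type PVCReference struct {
	Namespace string
	Name      string
}

type VolumeStats struct {
	PVCRef    *PVCReference
	UsedBytes float64
}

// statsProvider is a hypothetical stand-in for the kubelet's StatsProvider:
// it returns the volume stats that are current at scrape time.
type statsProvider interface {
	ListVolumeStats() []VolumeStats
}

var volumeUsedBytesDesc = prometheus.NewDesc(
	"volume_stats_used_bytes",
	"Number of used bytes in the volume",
	[]string{"namespace", "persistentvolumeclaim"},
	nil,
)

// volumeStatsCollector regenerates metrics from the stats provider on every
// scrape, so metrics for volumes that are no longer on the node simply stop
// being reported instead of going stale in a gauge vector.
type volumeStatsCollector struct {
	provider statsProvider
}

func (c *volumeStatsCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- volumeUsedBytesDesc
}

func (c *volumeStatsCollector) Collect(ch chan<- prometheus.Metric) {
	for _, s := range c.provider.ListVolumeStats() {
		if s.PVCRef == nil {
			continue // only volumes backed by a PVC are reported
		}
		ch <- prometheus.MustNewConstMetric(volumeUsedBytesDesc,
			prometheus.GaugeValue, s.UsedBytes, s.PVCRef.Namespace, s.PVCRef.Name)
	}
}
```

The collector would be registered once, e.g. prometheus.MustRegister(&volumeStatsCollector{provider: p}); each scrape then reflects exactly what the provider reports at that moment.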
}
addGauge := func(desc *prometheus.Desc, pvcRef *stats.PVCReference, v float64, lv ...string) {
	lv = append([]string{pvcRef.Namespace, pvcRef.Name}, lv...)
	ch <- prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, v, lv...)
I would prefer that we did not use the version that causes panics.
Updated. Log a warning message if NewConstMetric emits an error.
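A sketch of that change, written here as a standalone helper rather than the PR's closure; the log.Printf call stands in for whatever logger the kubelet actually uses.

```go
package collectors

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
)

// addGaugeSafe mirrors the addGauge closure from the diff above, but builds
// the metric with prometheus.NewConstMetric and logs a warning on failure
// instead of letting MustNewConstMetric panic.
func addGaugeSafe(ch chan<- prometheus.Metric, desc *prometheus.Desc, namespace, name string, v float64, lv ...string) {
	lv = append([]string{namespace, name}, lv...)
	metric, err := prometheus.NewConstMetric(desc, prometheus.GaugeValue, v, lv...)
	if err != nil {
		log.Printf("failed to generate metric %v: %v", desc, err)
		return
	}
	ch <- metric
}
```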
podStats = []statsapi.PodStats{
	{
		PodRef:    statsapi.PodReference{Name: "test-pod", Namespace: "test-namespace", UID: "UID_test-pod"},
		StartTime: metav1.NewTime(time.Now()),
I think there is a metav1.Now()
Fixed.
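For reference, the two forms produce the same value; metav1.Now() is just the shorthand (import path assumed to be k8s.io/apimachinery/pkg/apis/meta/v1):

```go
package main

import (
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Both wrap the current wall-clock time in a metav1.Time.
	t1 := metav1.NewTime(time.Now())
	t2 := metav1.Now()
	fmt.Println(t1, t2)
}
```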
That's weird.
Could you check this output on each node instead of Prometheus? If all PVC volumes on the node were correctly unmounted and detached, the kubelet should no longer report their metrics.
@cofyc, weird, I can't reproduce the issue today. I recompiled and reinstalled everything and it's gone. Sorry for the noise. Tested and LGTM from my side :-)
/lgtm
/assign @dchen1107 @derekwaynecarr for approval
	continue
}
pvcUniqStr := pvcRef.Namespace + pvcRef.Name
if allPVCs.Has(pvcUniqStr) {
It looks as though pvcRef can be used as a map key. I would prefer to gather everything into a map[stats.PVCReference]stats.VolumeStats and then iterate over that.
@cbandy sets.String is implemented by a map. There is no performance issue here, and we can generate metrics in one iteration.
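For illustration, this is roughly what the map-keyed alternative described above could look like; PVCReference and VolumeStats are stand-ins for the stats API types, and the whole snippet is a sketch rather than the PR's actual code.

```go
package main

import "fmt"

// PVCReference is comparable, so values of this type can be used directly
// as map keys, which is what the reviewer suggests.
type PVCReference struct {
	Namespace string
	Name      string
}

type VolumeStats struct {
	PVCRef    *PVCReference
	UsedBytes float64
}

func main() {
	volumes := []VolumeStats{
		{PVCRef: &PVCReference{Namespace: "ns1", Name: "pvc-a"}, UsedBytes: 100},
		{PVCRef: &PVCReference{Namespace: "ns1", Name: "pvc-a"}, UsedBytes: 100}, // duplicate reference
		{PVCRef: &PVCReference{Namespace: "ns2", Name: "pvc-b"}, UsedBytes: 200},
	}

	// Gather everything into a map keyed by the PVC reference; duplicates
	// collapse automatically, and metrics can be generated in one pass.
	byPVC := map[PVCReference]VolumeStats{}
	for _, v := range volumes {
		if v.PVCRef == nil {
			continue // ignore volumes without a PVC reference
		}
		byPVC[*v.PVCRef] = v
	}
	for ref, v := range byPVC {
		fmt.Printf("%s/%s used=%v\n", ref.Namespace, ref.Name, v.UsedBytes)
	}
}
```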
	// ignore if no PVC reference
	continue
}
pvcUniqStr := pvcRef.Namespace + pvcRef.Name
This isn't really unique... a/bc and ab/c would collide.
@cbandy Thanks! I should use / to separate namespace and name. It's the way unique keys are generated for objects in the client-go cache store. Fixed now: fecff55#diff-ae1cd608f0ca9bcb432eeeca0cf3cde0R106
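A tiny illustration of the collision and the fix; the "/" separator matches the namespace/name key convention of the client-go cache store.

```go
package main

import "fmt"

func main() {
	// Without a separator, different namespace/name pairs can collide:
	fmt.Println("a"+"bc" == "ab"+"c") // true: both concatenate to "abc"

	// Joining with "/" keeps them distinct:
	fmt.Println("a/"+"bc" == "ab/"+"c") // false: "a/bc" vs "ab/c"
}
```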
@gnufied Sorry, please take a look again.
Volumes on each node change; we should not just add PVC metrics into a gauge vector. It's better to use a collector to collect metrics from stats.
The branch was force-pushed from 9502c28 to fecff55.
/lgtm
@kubernetes/sig-node-pr-reviews, please approve
/lgtm
/approve
Kubelet changes look good
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cofyc, derekwaynecarr, gnufied, jsafrane. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.
What this PR does / why we need it:
Volumes on each node change; we should not just add PVC metrics into a gauge vector. It's better to use a collector to collect metrics from internal stats.
Currently, if a PV (bound to a PVC testpvc) is attached and used by node A, and is later migrated to node B or simply deleted from node A, the testpvc metrics will not disappear from the kubelet on node A. After running for a long time, the kubelet process keeps a lot of stale volume metrics in memory.
For these dynamic metrics, it's better to use a collector to collect metrics from a data source (StatsProvider here), like kube-state-metrics scraping metrics from kube-apiserver.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #57686
Special notes for your reviewer:
Release note: