-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement StatsProvider interface using CRI stats #51557
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall.
fsInfo, err := p.cadvisor.GetFsInfoByFsUUID(storageID.Uuid) | ||
if err != nil { | ||
msg := fmt.Sprintf("Failed to get the info of the filesystem with id %q: %v", storageID.Uuid, err) | ||
if cadvisorfs.ErrNoSuchDevice == err { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: err == cadvisorfs.ErrNoSuchDevice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
glog.V(2).Infof("Failed to get filesystem info: storageID is nil") | ||
return nil | ||
} | ||
fsInfo, err := p.cadvisor.GetFsInfoByFsUUID(storageID.Uuid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is omitting the filesystem info on cadvisor error the current behavior or a new one introduced in this PR (due to the storage id mapping)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only here in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When will GetFsInfoByFsUUID
return an error (assuming the uuid exists)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetFsInfoByFsUUID
and GetDirFsInfo
(used for retrieving the rootFs information from cAdvisor) convert the UUID and the directory name to the device name and call getFsInfoByDeviceName
to get the filesystem stats. So GetFsInfoByFsUUID
can fail with the same reasons as GetDirFsInfo
could fail.
Do you suggest we return the error instead of partial summary when the UUID does exist but something wrong when fetching the stats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment to explain we return partial results. Also check (or file an issue against) heapster to make sure 1) it is capable of processing partial results, and 2) the monitoring pipeline prefers partial results.
return nil, err | ||
} | ||
|
||
for _, fs := range resp.ImageFilesystems { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to mention this in the PR description. CRI may return the stats of multiple image filesystems but here we only return the first one. Is this an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment here to make it clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
3e3d627
to
ecec2de
Compare
PR looks good to me except for the one nit: #51557 (comment) |
@yguo0905 could you add a comment that we return partial results and link to the heapster issue? |
Done. I'm waiting for @dashpole to update the cadvisor dependency, and then I will remove the changes in |
lgtm pending @dashpole updating the cadvisor deps |
@@ -258,3 +258,7 @@ func (cc *cadvisorClient) HasDedicatedImageFs() (bool, error) { | |||
} | |||
return imageFsInfo.Device != rootFsInfo.Device, nil | |||
} | |||
|
|||
func (cc *cadvisorClient) GetFsInfoByFsUUID(uuid string) (cadvisorapiv2.FsInfo, error) { | |||
return cc.GetFsInfoByFsUUID(uuid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the function is calling itself? Did I miss anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this function to expose the embedded one in manager.Manager
.
|
||
cadvisorapiv2 "github.com/google/cadvisor/info/v2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be in the group above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
"github.com/stretchr/testify/assert" | ||
|
||
cadvisorfs "github.com/google/cadvisor/fs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please follow convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
"github.com/golang/glog" | ||
|
||
cadvisorfs "github.com/google/cadvisor/fs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
// CRI may return the stats of multiple image filesystems but we only | ||
// return the first one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add TODO to support multiple ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// getFsInfo returns the information of the filesystem with the specified | ||
// storageID. If any error occurs, this function logs the error and returns | ||
// nil. | ||
func (p *criStatsProvider) getFsInfo(storageID *runtimeapi.StorageIdentifier) *cadvisorapiv2.FsInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we return error, and let the caller to decide whether to log error or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this way, we want to have the logging logic in a shared function.
}, | ||
// UserDefinedMetrics is not supported by CRI. | ||
} | ||
if stats.Cpu != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use stas.GetCpu().GetUsageCoreNanoSeconds().GetValue() to get rid of all the nil checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid setting default values in the results, I still need to check whether stas.GetCpu().GetUsageCoreNanoSeconds().GetValue()
is 0 or not.
Removed the changed in |
/lgtm |
/retest Review the full test history for this PR. |
/lgtm cancel Does not build |
This PR depends on the cadvisor update: #51751 The PR itself is good, but I added the do-not-merge label just so that the bot wouldn't kick off the tests repeatedly. |
/retest |
@kubernetes/kubernetes-release-managers #51751 has merged. Looks like we need to remove the do-not-merge |
@yguo0905 : verify is failing, can you PTAL? |
Thanks. Looks like an error caught by govet. I will fix and rebase. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: yguo0905, yujuhong Associated issue: 46984 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue (batch tested with PRs 49133, 51557, 51749, 50842, 52018) |
Automatic merge from submit-queue (batch tested with PRs 49133, 51557, 51749, 50842, 52018) Implement StatsProvider interface using CRI stats Ref: #46984 This is the follow up of #50932 - I include the cadvisor dependency changes in this PR for now to make it build. @dashpole will update the cadvisor dependency very soon, and I will remove the change once it's updated. - Please take a closer look at the implementation in `cri_stats_provider.go` since we currently don't have any runtime implementing the CRI stats interface and the changes here cannot be enabled in e2e tests. - Pod level network stats and container level logs stats are not provided. - In `cadvisor_stats_provider.go`, we are able to remove the call to `getCgroupStats` in `ImageFsStats` for getting the timestamp of the stats, given that we've changed cadvisor to include the timestamp in `FsInfo`. - Fixed the usage of `assert.Equal` in unit tests. **Release note**: ``` Support getting container stats from CRI. ``` /assign @yujuhong /assign @Random-Liu
|
||
// GetFsInfoByFsUUID returns the stats of the filesystem with the specified | ||
// uuid. | ||
GetFsInfoByFsUUID(uuid string) (cadvisorapiv2.FsInfo, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding this without implementing in _unsupported and _windows broke cross-build. fix in #52073
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Forgot about the cross-build. It would be nice if this can be caught in presubmit test.
Ref: #46984
This is the follow up of #50932
cri_stats_provider.go
since we currently don't have any runtime implementing the CRI stats interface and the changes here cannot be enabled in e2e tests.cadvisor_stats_provider.go
, we are able to remove the call togetCgroupStats
inImageFsStats
for getting the timestamp of the stats, given that we've changed cadvisor to include the timestamp inFsInfo
.assert.Equal
in unit tests.Release note:
/assign @yujuhong
/assign @Random-Liu