Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement StatsProvider interface using CRI stats #51557

Merged
merged 1 commit into from
Sep 6, 2017

Conversation

yguo0905
Copy link
Contributor

Ref: #46984

This is the follow up of #50932

  • I include the cadvisor dependency changes in this PR for now to make it build. @dashpole will update the cadvisor dependency very soon, and I will remove the change once it's updated.
  • Please take a closer look at the implementation in cri_stats_provider.go since we currently don't have any runtime implementing the CRI stats interface and the changes here cannot be enabled in e2e tests.
  • Pod level network stats and container level logs stats are not provided.
  • In cadvisor_stats_provider.go, we are able to remove the call to getCgroupStats in ImageFsStats for getting the timestamp of the stats, given that we've changed cadvisor to include the timestamp in FsInfo.
  • Fixed the usage of assert.Equal in unit tests.

Release note:

Support getting container stats from CRI.

/assign @yujuhong
/assign @Random-Liu

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 29, 2017
@k8s-github-robot k8s-github-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Aug 29, 2017
Copy link
Contributor

@yujuhong yujuhong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.

fsInfo, err := p.cadvisor.GetFsInfoByFsUUID(storageID.Uuid)
if err != nil {
msg := fmt.Sprintf("Failed to get the info of the filesystem with id %q: %v", storageID.Uuid, err)
if cadvisorfs.ErrNoSuchDevice == err {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: err == cadvisorfs.ErrNoSuchDevice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

glog.V(2).Infof("Failed to get filesystem info: storageID is nil")
return nil
}
fsInfo, err := p.cadvisor.GetFsInfoByFsUUID(storageID.Uuid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is omitting the filesystem info on cadvisor error the current behavior or a new one introduced in this PR (due to the storage id mapping)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only here in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will GetFsInfoByFsUUID return an error (assuming the uuid exists)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetFsInfoByFsUUID and GetDirFsInfo (used for retrieving the rootFs information from cAdvisor) convert the UUID and the directory name to the device name and call getFsInfoByDeviceName to get the filesystem stats. So GetFsInfoByFsUUID can fail with the same reasons as GetDirFsInfo could fail.

Do you suggest we return the error instead of partial summary when the UUID does exist but something wrong when fetching the stats?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to explain we return partial results. Also check (or file an issue against) heapster to make sure 1) it is capable of processing partial results, and 2) the monitoring pipeline prefers partial results.

return nil, err
}

for _, fs := range resp.ImageFilesystems {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention this in the PR description. CRI may return the stats of multiple image filesystems but here we only return the first one. Is this an issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment here to make it clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@yguo0905 yguo0905 force-pushed the stats-cri branch 2 times, most recently from 3e3d627 to ecec2de Compare August 30, 2017 16:36
@yujuhong
Copy link
Contributor

PR looks good to me except for the one nit: #51557 (comment)

@yujuhong
Copy link
Contributor

@yguo0905 could you add a comment that we return partial results and link to the heapster issue?
Will lgtm once that's done.

@yguo0905
Copy link
Contributor Author

Done. I'm waiting for @dashpole to update the cadvisor dependency, and then I will remove the changes in vendor from this PR.

@yujuhong
Copy link
Contributor

lgtm pending @dashpole updating the cadvisor deps

@@ -258,3 +258,7 @@ func (cc *cadvisorClient) HasDedicatedImageFs() (bool, error) {
}
return imageFsInfo.Device != rootFsInfo.Device, nil
}

func (cc *cadvisorClient) GetFsInfoByFsUUID(uuid string) (cadvisorapiv2.FsInfo, error) {
return cc.GetFsInfoByFsUUID(uuid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the function is calling itself? Did I miss anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this function to expose the embedded one in manager.Manager.


cadvisorapiv2 "github.com/google/cadvisor/info/v2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should be in the group above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


"github.com/stretchr/testify/assert"

cadvisorfs "github.com/google/cadvisor/fs"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please follow convention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


"github.com/golang/glog"

cadvisorfs "github.com/google/cadvisor/fs"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

// CRI may return the stats of multiple image filesystems but we only
// return the first one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add TODO to support multiple ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// getFsInfo returns the information of the filesystem with the specified
// storageID. If any error occurs, this function logs the error and returns
// nil.
func (p *criStatsProvider) getFsInfo(storageID *runtimeapi.StorageIdentifier) *cadvisorapiv2.FsInfo {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we return error, and let the caller to decide whether to log error or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this way, we want to have the logging logic in a shared function.

},
// UserDefinedMetrics is not supported by CRI.
}
if stats.Cpu != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use stas.GetCpu().GetUsageCoreNanoSeconds().GetValue() to get rid of all the nil checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid setting default values in the results, I still need to check whether stas.GetCpu().GetUsageCoreNanoSeconds().GetValue() is 0 or not.

@yujuhong yujuhong added this to the v1.8 milestone Aug 31, 2017
@yguo0905
Copy link
Contributor Author

Removed the changed in vendor. PTAL.

@yujuhong
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 31, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 31, 2017
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@fejta
Copy link
Contributor

fejta commented Sep 1, 2017

/lgtm cancel

Does not build

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 1, 2017
@yujuhong yujuhong added do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 1, 2017
@yujuhong
Copy link
Contributor

yujuhong commented Sep 1, 2017

This PR depends on the cadvisor update: #51751

The PR itself is good, but I added the do-not-merge label just so that the bot wouldn't kick off the tests repeatedly.

@feiskyer
Copy link
Member

feiskyer commented Sep 6, 2017

/retest

@dims
Copy link
Member

dims commented Sep 6, 2017

@kubernetes/kubernetes-release-managers #51751 has merged. Looks like we need to remove the do-not-merge

@dims
Copy link
Member

dims commented Sep 6, 2017

@yguo0905 : verify is failing, can you PTAL?

@yguo0905
Copy link
Contributor Author

yguo0905 commented Sep 6, 2017

@yguo0905 : verify is failing, can you PTAL?

Thanks. Looks like an error caught by govet. I will fix and rebase.

@yujuhong yujuhong removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Sep 6, 2017
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2017
@yujuhong
Copy link
Contributor

yujuhong commented Sep 6, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yguo0905, yujuhong

Associated issue: 46984

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 49133, 51557, 51749, 50842, 52018)

k8s-github-robot pushed a commit that referenced this pull request Sep 6, 2017
Automatic merge from submit-queue (batch tested with PRs 49133, 51557, 51749, 50842, 52018)

Implement StatsProvider interface using CRI stats

Ref: #46984

This is the follow up of #50932

- I include the cadvisor dependency changes in this PR for now to make it build. @dashpole will update the cadvisor dependency very soon, and I will remove the change once it's updated.
- Please take a closer look at the implementation in `cri_stats_provider.go` since we currently don't have any runtime implementing the CRI stats interface and the changes here cannot be enabled in e2e tests.
- Pod level network stats and container level logs stats are not provided.
- In `cadvisor_stats_provider.go`, we are able to remove the call to `getCgroupStats` in `ImageFsStats` for getting the timestamp of the stats, given that we've changed cadvisor to include the timestamp in `FsInfo`.
- Fixed the usage of `assert.Equal` in unit tests.

**Release note**:
```
Support getting container stats from CRI.
```

/assign @yujuhong 
/assign @Random-Liu
@k8s-github-robot k8s-github-robot merged commit 3b2e32e into kubernetes:master Sep 6, 2017
@yguo0905 yguo0905 deleted the stats-cri branch September 6, 2017 22:35
This was referenced Sep 7, 2017

// GetFsInfoByFsUUID returns the stats of the filesystem with the specified
// uuid.
GetFsInfoByFsUUID(uuid string) (cadvisorapiv2.FsInfo, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding this without implementing in _unsupported and _windows broke cross-build. fix in #52073

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Forgot about the cross-build. It would be nice if this can be caught in presubmit test.

k8s-github-robot pushed a commit that referenced this pull request Sep 7, 2017
Automatic merge from submit-queue

Fix cross-build

**What this PR does / why we need it**:
The cross-build was broken by the following PRs:

#51728
#51557

This PR fixes the cross-build rather than revert them.

Fixes #52074

**Release note**:
```release-note
NONE
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants