Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huge pages volume plugin. #47658

Closed

Conversation

PiotrProkop
Copy link
Contributor

@PiotrProkop PiotrProkop commented Jun 16, 2017

What this PR does / why we need it:

This PR adds a new volume plugin for huge pages volumes. Hugepages is a feature of the Linux Kernel that allows a subset of system memory to be managed in larger increments.
More documentation can be found here.

Which issue this PR fixes:

Support for huge pages was discussed on resource management work-group meetings. The doc can be found here.

Release note:

HugePages Volume Driver
[Huge pages](https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt) can now be used as a storage driver in Kubernetes.

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: PiotrProkop
We suggest the following additional approver: thockin

Assign the PR to them by writing /assign @thockin in a comment when ready.

No associated issue. Update pull-request body to add a reference to an issue, or get approval with /approve no-issue

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. release-note-label-needed labels Jun 16, 2017
@k8s-ci-robot
Copy link
Contributor

Hi @PiotrProkop. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 16, 2017
@rootfs
Copy link
Contributor

rootfs commented Jun 16, 2017

@k8s-bot ok to test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 16, 2017
if minSize.ScaledValue(resource.Mega) < 2 {
allErrs = append(allErrs, field.Invalid(fldPath.Child("minSize"), hugePages.MinSize, "MaxSize lower than pageSize"))
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

min <= max

@@ -2335,6 +2335,35 @@ func TestValidateVolumes(t *testing.T) {
errtype: field.ErrorTypeRequired,
errfield: "scaleIO.system",
},
// HugePages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a negative test case where min > max

func printHugePagesVolumeSource(hp *api.HugePagesVolumeSource, w PrefixWriter) {
w.Write(LEVEL_2, "Type:\tHugePages (pseudo filesystem of type hugetlbfs representing HugePages)\n"+
" PageSize:\t%v\n"+
" Size:\t%v\n"+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxSize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rename maxSize to Size so it will reflect mount options.

@@ -0,0 +1,19 @@
/*
Copyright 2015 The Kubernetes Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2017 (and for all new files)


var readFile = ioutil.ReadFile

func detectHugepages() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return the amount of huge pages too and validate max is not more than huagepages_free or hugepages_total.

pkg/api/types.go Outdated
// +optional
PageSize string
// Defaults to PageSize
// The size option sets the maximum value of memory (huge pages).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/size/MaxSize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rename maxSize to Size so it will reflect mount options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't rename it because there is a generated method Size(), so i will leave MaxSize as struct field but change json, and protobuf field name to size.

pkg/api/types.go Outdated
// The size option is specified as resource.Quantity
MaxSize string
// Defaults to PageSize
// The min_size option sets the minimum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/min_size/minSize

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/api/v1/types.go also

pkg/api/types.go Outdated
// Defaults to 2M
// +optional
PageSize string
// Defaults to PageSize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If user specifies minSize, then the default maxSize should be minSize, IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to stick as close to hugetlbfs mount options as possible so think it should default to nothing. Defaulting to PageSize is not a good idea 😄

if obj.PageSize == "" {
obj.PageSize = "2M"
}
if obj.MaxSize == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should guarantee the default values are valid.
e.g.suppose user specifies as below

    hugePages:
       pageSize: "2M"
       maxSize: ""
       minSize: "100M"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is valid not to specify size than there is no limit being set. But I do not handle this case in volume plugin itself I have to add it in prepareMountOptions().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was saying wasn't if not to specify size is valid. I was saying is that if users specify as below:

    hugePages:
       pageSize: "2M"
       maxSize: ""
       minSize: "100M"

This will default to:

    hugePages:
       pageSize: "2M"
       maxSize: "2M"
       minSize: "100M"

Then the maxSize < minSize, which is invalid.


// TODO(pprokop): add support for other PageSizes
if hugePages.PageSize != "2M" {
allErrs = append(allErrs, field.Invalid(fldPath.Child("pageSize"), hugePages.PageSize, "Not supported page size"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

give the supported size in the error info

allErrs = append(allErrs, field.Invalid(fldPath.Child("minSize"), hugePages.MinSize, "Not a valid quantity"))
} else {
if minSize.ScaledValue(resource.Mega) < 2 {
allErrs = append(allErrs, field.Invalid(fldPath.Child("minSize"), hugePages.MinSize, "MaxSize lower than pageSize"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/MaxSize/MinSize

if hugePages.PageSize != "2M" {
allErrs = append(allErrs, field.Invalid(fldPath.Child("pageSize"), hugePages.PageSize, "Not supported page size"))
}
maxSize, err := resource.ParseQuantity(hugePages.MaxSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 12.3 a valid value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an interesting case because doing mount with option size=5.3M will not fail but it will be transformed into 5 bytes because kernel will discard everything after dot. The code is here. So either K8s should not allowed using fractional sizes or convert every size option to bytes before mounting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even though, some integer value might be invalid for hugepages. For example, suppose the pagesize is 2M, how many pages should that be for minSize or maxSize being 5M? 2 pages or 3 pages or other number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kernel itself rounds down size to HPAGE_SIZE boundary, so if you specify 5M it will be 2 pages.

@@ -744,6 +744,8 @@ func describeVolumes(volumes []api.Volume, w PrefixWriter, space string) {
printCephFSVolumeSource(volume.VolumeSource.CephFS, w)
case volume.VolumeSource.StorageOS != nil:
printStorageOSVolumeSource(volume.VolumeSource.StorageOS, w)
case volume.VolumeSource.HugePages != nil:
printHugePagesVolumeSource(volume.VolumeSource.HugePages, w)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is also another func where should place this in. (I'm reviewing on my phone, it's not easy to find that func, sorry)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also need to add a case in func describePersistentVolume

}

// HugePages are https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
// This struct implements "k8s.io/pkg/volume.Mounter" and "k8s.io/pkg/volme.UnMounter" interface.Mounter" and "k8s.io/pkg/volme.UnMounter" interfaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix this comment

}

func getVolumeSource(spec *volume.Spec) (*v1.HugePagesVolumeSource, bool) {
var readOnly bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed? Seems like it's not used.

@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 19, 2017
@rootfs
Copy link
Contributor

rootfs commented Jun 19, 2017

@PiotrProkop run hack/verify-bazel.sh to fix the verify test error

@PiotrProkop
Copy link
Contributor Author

@rootfs should i push generated code also ?

@rootfs
Copy link
Contributor

rootfs commented Jun 20, 2017

@PiotrProkop yes, please make two commits: with for generated code and one for the rest

@krousey krousey assigned vishh and unassigned krousey Jun 20, 2017
@vishh
Copy link
Contributor

vishh commented Jun 20, 2017

Is there an overall proposal for adding huge pages support for kubernetes?

@jeremyeder
Copy link

@vishh yes, the proposal is in the community repo: kubernetes/community#181
And here is the doc prepared by @derekwaynecarr for our f2f: kubernetes/enhancements#275

@vishh
Copy link
Contributor

vishh commented Jun 21, 2017 via email

@rootfs
Copy link
Contributor

rootfs commented Jun 21, 2017

@PiotrProkop fix the test errors, most can be fixed by running make update

@jeremyeder
Copy link

@PiotrProkop was there perhaps some offline conversation with @derekwaynecarr or @sjenning about this PR versus what we have in kubernetes/community#181 ?

@PiotrProkop
Copy link
Contributor Author

I only showed the volume plugin demo on resource management workgroup and now probably I should sync with @derekwaynecarr and @sjenning to add this to the proposal.

@k8s-github-robot
Copy link

@PiotrProkop PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 22, 2017
@PiotrProkop
Copy link
Contributor Author

Hi @rootfs and @xiangpengzhao, I wanted to rebase a PR but this PR removed pkg/api/v1/types.go. Now I am a bit confused about how to make api changes should i create PR against this repo ? Are there any updated docs how to make api changes ?

@xiangpengzhao
Copy link
Contributor

I think @caesarxuchao can help to explain.

@jeremyeder
Copy link

Was there some discussion/reasoning as to why this is a volume plugin and not a device plugin #695? I thought we wanted hugepages to be abstracted via resourceclasses which currently are a layer on top of device plugins. @sjenning @derekwaynecarr

@PiotrProkop
Copy link
Contributor Author

@jeremyeder there are two ways of consuming huge pages. First is transparent usage like JVM based application. Second is using a hugetlbfs mount point. Volume plugin was chosen because some application like DPDK based application use hugetlbfs and requires having e.g. /mnt/huge hugetlbfs mount. I think it is more natural for an administrator to use volume plugin interface when we require a special mount inside container instead of doing it implicitly.

@jeremyeder
Copy link

That's what I thought but the end state is that we have potentially two ways of users consuming hugepages. We should aim for that not to be the case, agree?

@PiotrProkop
Copy link
Contributor Author

This proposal states that setting cgroup limits for transparent huge pages will be handled via CRI and not device plugins. In the future they will become a first class citizens ( capacityV2 objects) like CPU and memory and we could allocate them using Resource classes.
This means having two ways of consuming huge pages is actual design and we can discuss if consuming hugetlbfs via device plugin is better than using volume plugins. In my opinions it is more natural to consume it as volume plugins hence mounting filesystems in k8s is handled that way.

@caesarxuchao
Copy link
Member

pkg/api/v1/types.go is moved to staging/src/k8s.io/api/core/v1/types.go.

I'll add a doc explaining the situation. Sorry for the troubles.

@rootfs
Copy link
Contributor

rootfs commented Jul 18, 2017

@PiotrProkop late to response but @caesarxuchao has replied. Update staging/src/k8s.io/api/core/v1/types.go and run hack/update-all.sh.

@PiotrProkop
Copy link
Contributor Author

@caesarxuchao and @rootfs thanks 😄

@PiotrProkop PiotrProkop force-pushed the hugepages-volume-plugin branch from e42765f to 1e21eb9 Compare July 19, 2017 11:53
@k8s-ci-robot
Copy link
Contributor

@PiotrProkop: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-bazel 1e21eb958da679cda23af5a5bedb7b17698ecce7 link /test pull-kubernetes-bazel
pull-kubernetes-e2e-kops-aws 1e21eb958da679cda23af5a5bedb7b17698ecce7 link /test pull-kubernetes-e2e-kops-aws
pull-kubernetes-kubemark-e2e-gce 1e21eb958da679cda23af5a5bedb7b17698ecce7 link /test pull-kubernetes-kubemark-e2e-gce
pull-kubernetes-e2e-gce-etcd3 1e21eb958da679cda23af5a5bedb7b17698ecce7 link /test pull-kubernetes-e2e-gce-etcd3
pull-kubernetes-federation-e2e-gce 1e21eb958da679cda23af5a5bedb7b17698ecce7 link /test pull-kubernetes-federation-e2e-gce
pull-kubernetes-node-e2e 1e21eb958da679cda23af5a5bedb7b17698ecce7 link /test pull-kubernetes-node-e2e
pull-kubernetes-verify 1e21eb958da679cda23af5a5bedb7b17698ecce7 link /test pull-kubernetes-verify
pull-kubernetes-unit 1e21eb958da679cda23af5a5bedb7b17698ecce7 link /test pull-kubernetes-unit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@squall0gd squall0gd force-pushed the hugepages-volume-plugin branch from 1e21eb9 to 74514c9 Compare August 2, 2017 09:52
@PiotrProkop
Copy link
Contributor Author

I am closing this PR in favor of this #50072 PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants