hugepage proposal #181

sjenning · 2016-12-15T16:54:58Z

Proposal for supporting applications that desire pre-allocated huge pages in Kubernetes

@derekwaynecarr @kubernetes/rh-cluster-infra @dchen1107 @vishh @jeremyeder @kubernetes/sig-node

xref old main repo PR kubernetes/kubernetes#33601

philips · 2016-12-20T19:48:56Z

contributors/design-proposals/hugepages.md

+architecture and making a new resource field for each size doesn't scale.  Pods
+can do a nodeSelector on this label to land on a system with a particular huge
+page size.  This is similiar to how the `beta.kubernetes.io/arch` label
+operates.


It seems like you need the request to also specify the expected node huge page size, right? Otherwise it could request 10 pages and get 10GB on a machine that has a non-default configuration.

Is there any way to design this so the request is in bytes instead of pages?

I guess my thought was that a node would only be configured/labeled with one hugepage size. We would need to quantize a value in bytes to a multiple of the hugepage size. However, from a UX perspective I can see where specifying the hugepage quantity as a resource.Quantity would be nice. Thanks!
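
To make the quantization step concrete, here is a minimal sketch, assuming a node with a single configured huge page size. The function name `quantize_to_huge_pages` and the round-up policy are illustrative assumptions, not something the proposal specifies; an implementation could just as well reject requests that are not an exact multiple.

```python
from typing import Tuple


def quantize_to_huge_pages(request_bytes: int, page_size_bytes: int) -> Tuple[int, int]:
    """Round a byte request up to a whole number of huge pages.

    Returns (page_count, quantized_bytes). Rounding up is an assumption
    made for this sketch; rejecting non-multiples is an equally valid policy.
    """
    if request_bytes < 0:
        raise ValueError("request_bytes must be non-negative")
    if page_size_bytes <= 0:
        raise ValueError("page_size_bytes must be positive")
    # Ceiling division without floating point.
    pages = -(-request_bytes // page_size_bytes)
    return pages, pages * page_size_bytes


MIB = 1024 * 1024
# A 5 MiB request on a node with 2 MiB huge pages needs 3 pages (6 MiB).
print(quantize_to_huge_pages(5 * MIB, 2 * MIB))  # (3, 6291456)
```

The same byte request maps to a different page count on a node configured with 1GB pages, which is why the page size has to be known before quantizing.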

Does the memory covered by hugepages resource come out of the total memory request, or is the final memory footprint the sum of the two?

i think the request should include the huge page size.

when prototyping this in kubernetes/kubernetes#44817, i used a request syntax that included the size similar to how it appears in sysfs.

$ ls /sys/kernel/mm/hugepages
hugepages-1048576kB  hugepages-2048kB

so the pod spec has a request for the following:

alpha.kubernetes.io/hugepages-2048kB: 512
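
As a rough illustration of how a size-qualified request like the one above translates into memory, the sketch below parses the page size out of the resource name and multiplies it by the page count. The helper name `huge_page_request_bytes` is hypothetical, and the code assumes the suffix follows the sysfs directory naming (`hugepages-<N>kB`, where kB means 1024 bytes).

```python
import re

# Matches the sysfs-style suffix, e.g. "hugepages-2048kB".
_HUGEPAGES_SUFFIX = re.compile(r"hugepages-(\d+)kB$")


def huge_page_request_bytes(resource_name: str, page_count: int) -> int:
    """Return the total bytes implied by a size-qualified huge page request.

    Hypothetical helper: the resource name is assumed to end in the
    sysfs-style "hugepages-<N>kB" suffix.
    """
    match = _HUGEPAGES_SUFFIX.search(resource_name)
    if match is None:
        raise ValueError(f"no huge page size in resource name: {resource_name!r}")
    page_size_kib = int(match.group(1))
    return page_count * page_size_kib * 1024


# 512 pages of 2048kB each is exactly 1 GiB.
print(huge_page_request_bytes("alpha.kubernetes.io/hugepages-2048kB", 512))  # 1073741824
```

Carrying the size in the resource name means the request is unambiguous regardless of which page size the node treats as its default.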

jonboulle · 2016-12-23T10:04:27Z

contributors/design-proposals/hugepages.md

+
+On x86_64, there are two huge page sizes: 2MB and 1GB.  1GB huge pages are also
+called gigantic pages.  1GB must be enabled on kernel boot line with
+`hugepagesz=1g`. Huge pages, especially 1GB ones, should to be allocated


should to be -> should be

jonboulle · 2016-12-23T10:05:43Z

contributors/design-proposals/hugepages.md

+
+While a system may support multiple huge pages sizes, it is assumed that nodes
+configured with huge pages will only use one huge page size, namely the default
+page size in `cat /proc/meminfo | grep Hugepagesize`.  In Linux, this is 2MB


grep Hugepagesize /proc/meminfo :-)
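
For anyone who wants the same lookup programmatically, here is a small sketch that extracts the default huge page size from the text of `/proc/meminfo`. The function name and the sample text are illustrative assumptions; on a real node the input would come from reading the file.

```python
from typing import Optional


def default_huge_page_size_kib(meminfo_text: str) -> Optional[int]:
    """Return the Hugepagesize value in kB, or None if the field is absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            # The line has the form "Hugepagesize:       2048 kB".
            return int(line.split()[1])
    return None


# Illustrative sample; on a node, use open("/proc/meminfo").read() instead.
sample = "MemTotal:       16384000 kB\nHugePages_Total:       0\nHugepagesize:       2048 kB\n"
print(default_huge_page_size_kib(sample))  # 2048
```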

jonboulle · 2016-12-23T10:06:12Z

contributors/design-proposals/hugepages.md

+because there are a variety of huge page sizes across different hardware
+architecture and making a new resource field for each size doesn't scale.  Pods
+can do a nodeSelector on this label to land on a system with a particular huge
+page size.  This is similiar to how the `beta.kubernetes.io/arch` label


jonboulle · 2016-12-23T10:08:38Z

contributors/design-proposals/hugepages.md

+cAdvisor will need to be modified to return the number of available huge pages.
+This is already supported in [runc/libcontainer](../../vendor/github.com/opencontainers/runc/libcontainer/cgroups/utils.go)
+
+### Phase 2: Expose huge pages in CRI


Can you explain why this is desirable rather than just sticking with the pod-level implementation as above? In the abstract you talked about it as a pod feature and this jump is unclear.

jonboulle · 2016-12-23T10:13:58Z

contributors/design-proposals/hugepages.md

+supported: 2MB and 1GB.  The design, however, should accommodate additional huge
+page sizes available on other architectures.
+
+**NOTE: This design, as currently proposed, requires the use of pod-level


cross-reference would be good

jonboulle · 2016-12-23T10:16:06Z

contributors/design-proposals/hugepages.md

+- A sensitivity to memory access latency
+
+Example applications include:
+- Java applications can back the heap with huge pages using the `-XX:+UseLargePages` option.


s/can/which/

jonboulle · 2016-12-23T10:40:29Z

contributors/design-proposals/hugepages.md

+      limits:
+	    hugepages: "10"
+  nodeSelector:
+    kubernetes.io/huge-page-size: "2MB"


alpha.kubernetes.io ?

jonboulle · 2016-12-23T10:50:46Z

contributors/design-proposals/hugepages.md

+For the Java use case the JVM maps the huge pages as a shared memory segment and
+memlocks them to prevent the system from moving or swapping them out.
+
+There are several issues here:


How about adding something about what Kubernetes users need to do to mitigate these issues? (e.g. special node configuration?). I almost wonder if we'd want to distinguish more clearly in the API between the availability of anonymous vs shared memory, given these additional requirements for the latter case.

derekwaynecarr · 2017-04-23T15:24:10Z

contributors/design-proposals/hugepages.md

+Huge page support is needed for many large memory HPC workloads to achieve
+acceptable performance levels.
+
+This proposal is part of a larger effort to better support High Performance


HPC is too loaded of a term. It's really just performance sensitive workloads. JVMs with large heaps, stateful applications with large in-memory caches, even memcached, etc.

mpolednik · 2017-07-03T10:27:12Z

contributors/design-proposals/hugepages.md

+
+While a system may support multiple huge pages sizes, it is assumed that nodes
+configured with huge pages will only use one huge page size, namely the default
+page size in `cat /proc/meminfo | grep Hugepagesize`.  In Linux, this is 2MB


Why only a single pagesize per node? As far as I understand hugepages, the dTLB (on x86_64) is able to cache 2 MiB and 1 GiB pages separately on the L1. Given that is true, it is wasteful not to utilize both sizes per node. (It'd be interesting to study how the unified L2 dTLB is affected by mixed pages though.)

mpolednik · 2017-07-03T10:33:58Z

contributors/design-proposals/hugepages.md

+This proposal only includes pre-allocated huge pages configured on the node by
+the administrator at boot time or by manual dynamic allocation.  It does not
+discuss the kubelet attempting to allocate huge pages dynamically in an attempt
+to accommodate a scheduling pod or the use of Transparent Huge Pages (THP). THP


Would you expect the dynamic allocation to not happen at all or to be added as another proposal? Although not perfectly reliable due to memory fragmentation, it can still serve as a nice to have. The scheduler should prefer the nodes with preallocated pages available, but if there are none it could try to allocate pages on a node with low memory fragmentation.

Also, how are the hugepages going to be allocated? Is that outside of k8s' scope?

i provide a sample daemonset in #837 that can pre-allocate huge pages. If pods cannot schedule due to lack of available nodes with sufficient pre-allocated huge pages, something similar can run to allocate additional pages (or the daemonset configuration could be tweaked for a pool of nodes to increase the size). either way, that management piece is considered out of scope.
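
To illustrate the scheduling side of this, the sketch below filters nodes by whether they have enough free pre-allocated pages of the requested size. Everything here is hypothetical (the function name, the node names, and the shape of the input map); it only shows the fit check, not how the kubelet or scheduler would actually track huge pages.

```python
from typing import Dict, List


def nodes_that_fit(
    free_pages_by_node: Dict[str, Dict[str, int]],
    page_size: str,
    requested_pages: int,
) -> List[str]:
    """Return the names of nodes with enough free pre-allocated huge pages.

    free_pages_by_node maps node name -> {page size label -> free page count}.
    The structure is an assumption made for this sketch.
    """
    return sorted(
        name
        for name, free in free_pages_by_node.items()
        if free.get(page_size, 0) >= requested_pages
    )


# Hypothetical cluster state: free pre-allocated pages per node and size.
cluster = {
    "node-a": {"2048kB": 1024},
    "node-b": {"2048kB": 128, "1048576kB": 4},
    "node-c": {},
}
print(nodes_that_fit(cluster, "2048kB", 512))  # ['node-a']
```

If the result is empty, the pod stays pending until something outside the scheduler (for example the daemonset mentioned above) allocates more pages.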

mpolednik · 2017-07-03T10:36:38Z

contributors/design-proposals/hugepages.md

+pages.  For this reason, some applications may be designed to (or recommend) use
+pre-allocated huge pages instead of THP.
+
+The proposal is also limited to x86_64 support where two huge page sizes are


Limiting the proposal to a single arch is unnecessary as long as it's generic enough, which it is in this state.

Adding huge page volume plugin.

derekwaynecarr · 2017-07-24T02:15:22Z

closed in favor of #837

hugepage proposal

476bc3b

k8s-ci-robot added the cncf-cla: yes label Dec 15, 2016

sjenning mentioned this pull request Dec 15, 2016

proposal: huge page support kubernetes/kubernetes#33601

Closed

philips reviewed Dec 20, 2016

jonboulle reviewed Dec 23, 2016

bgrant0607 assigned dchen1107 and vishh Jan 18, 2017

derekwaynecarr mentioned this pull request Apr 23, 2017

WIP - Enable scheduler and node isolation support for pre-allocated HugePages kubernetes/kubernetes#44817

Closed

derekwaynecarr reviewed Apr 23, 2017

derekwaynecarr mentioned this pull request Apr 25, 2017

Add support for pre-allocated hugepages kubernetes/enhancements#275

Closed

jeremyeder mentioned this pull request Jun 21, 2017

Huge pages volume plugin. kubernetes/kubernetes#47658

Closed

Adding huge page volume plugin

8e926fc

mpolednik reviewed Jul 3, 2017

PiotrProkop and others added 3 commits July 4, 2017 15:36

PageSize defaults to Hugepagesize rather than 2M

4c045e3

Changes due to @cmluciano review

551db8f

Merge pull request #764 from PiotrProkop/hugepages

00d7112

derekwaynecarr closed this Jul 24, 2017

cblecker deleted the hugepage-proposal branch August 18, 2017 18:59
