
promote ephemeral-storage-quotas to beta in 1.25 #2697

Merged · 6 commits · Jun 22, 2022

Conversation

@pacoxu (Member) commented May 8, 2021

part of #1029

#1029 (comment)

I'd like to follow up on this feature. In the 1.22-1.24 cycle, I will benchmark it and then check with the SIG whether we should promote it to beta.

Action Items in 1.22~1.24

More details can be found in #1029 (comment)

Action Items in 1.25

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels May 8, 2021
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 8, 2021
@pacoxu pacoxu force-pushed the ephemeral-storage-quotas-beta branch 2 times, most recently from 4653a2f to 11900a7 on May 8, 2021 04:49
@pacoxu (Member Author) commented May 8, 2021

/assign @deads2k
for prod-readiness approval

@pacoxu (Member Author) commented May 10, 2021

@ehashman I opened the status update PR and will follow up on the actions in 1.22.

@deads2k (Contributor) commented May 11, 2021

You need to add and complete the Production Readiness Review Questionnaire in your KEP.

See the template here: https://github.com/kubernetes/enhancements/blame/master/keps/NNNN-kep-template/README.md#L359-L667

@pacoxu pacoxu force-pushed the ephemeral-storage-quotas-beta branch from 11900a7 to f23fcb4 on May 12, 2021 11:32
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 12, 2021
@pacoxu (Member Author) commented May 12, 2021

Updated, @deads2k.
As I don't have enough time today, I may go through and improve the answers later.


###### What happens if we reenable the feature if it was previously rolled back?

Performance changes.
Contributor:

Is the performance different compared to the first time it was enabled, or is this generally true of enabling the feature?

Member Author:

Only for this scenario: "use project quotas to monitor emptyDir volume storage consumption rather than filesystem walk for better performance and accuracy."

Contributor:

Only for this scenario: "use project quotas to monitor emptyDir volume storage consumption rather than filesystem walk for better performance and accuracy."

You'll want to add this to the KEP.


###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. Set the feature gate to false.
Contributor:

Will volumes created while the feature gate was on still enforce quota?

Member Author:

If a pod was created with quota enforcement, disabling the feature gate will not change the running pod.
Newly created pods will not use the quota.
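
A minimal sketch of that rollback on one node (the feature gate name is real; the drop-in file path and service name are assumptions that vary by how the node was provisioned):

```bash
# Sketch: disable quota-based monitoring on one node and restart the kubelet.
# /etc/default/kubelet is an assumed location for extra kubelet flags.
echo 'KUBELET_EXTRA_ARGS=--feature-gates=LocalStorageCapacityIsolationFSQuotaMonitoring=false' \
  | sudo tee /etc/default/kubelet

# Running pods keep their existing quotas; newly created pods fall back to
# filesystem walks for usage accounting.
sudo systemctl restart kubelet
```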


###### How can a rollout or rollback fail? Can it impact already running workloads?

None.
Contributor:

I'm surprised by this. What happens if the kube-apiserver starts using the feature gate and the kubelets have not yet restarted? Is there any impact?

Member Author (@pacoxu, May 13, 2021):

If I understand correctly, the kube-apiserver feature gate setting has no effect for this feature.

When you set the feature gate on the kubelet and restart it, newly created pods will use the quota strategy.

The rollout/rollback will not impact running workloads.


- [x] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: `LocalStorageCapacityIsolationFSQuotaMonitoring`
  - Components depending on the feature gate: kubelet, apiserver, controller-manager, scheduler
Contributor:

I'm a little surprised the scheduler and controller-manager are involved. What do they enforce?

Member Author:

It is in the scheduler and controller-manager feature-gate lists. However, it should only matter for the kubelet.
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/

`LocalStorageCapacityIsolationFSQuotaMonitoring` is only used in `pkg/volume/util/fsquota/quota.go`.

I updated it to be kubelet only.

Member Author:

`LocalStorageCapacityIsolation` will affect scheduling.


###### What specific metrics should inform a rollback?

None. Users can focus on volume metrics.
Contributor:

It seems like cluster-admins probably want to be able to tell when pods start failing because they run out of space.

Member Author:

To see failures, I think we can read the kubelet log for eviction-related entries, or use `xfs_quota` to check the quota settings (a log-reading sketch follows the example below).

Here's an example with volume1048581 (an emptyDir volume).

```console
[root@daocloud ~]# xfs_quota -x -c 'report -h' /dev/sdc
Project quota on /var/lib/kubelet (/dev/sdc)
                        Blocks
Project ID   Used   Soft   Hard Warn/Grace
---------- ---------------------------------
#0           156K      0      0  00 [------]
volume1048577    16K      0      0  00 [------]
volume1048578     8K      0      0  00 [------]
volume1048579     8K      0      0  00 [------]
volume1048581      0     8E     8E  00 [------]
volume1048582     8K      0      0  00 [------]
volume1048583     8K      0      0  00 [------]
volume1048584      0     8E     8E  00 [------]
[root@daocloud ~]# docker ps | grep test
2e848e2ae430   k8s.gcr.io/test-webserver   "/test-webserver"        4 seconds ago   Up 3 seconds             k8s_test-container_test-pd-1_default_b6fcd777-ffef-4148-ba9a-fbdf4e7cfc8d_0
19281193d915   k8s.gcr.io/test-webserver   "/test-webserver"        6 seconds ago   Up 5 seconds             k8s_test-container_test-pd-2_default_ce4c955b-a04a-4b98-8454-d904ba6341eb_0
57f96f8680eb   k8s.gcr.io/pause:3.2        "/pause"                 9 seconds ago   Up 8 seconds             k8s_POD_test-pd-2_default_ce4c955b-a04a-4b98-8454-d904ba6341eb_0
b14274ba66e8   k8s.gcr.io/pause:3.2        "/pause"                 9 seconds ago   Up 8 seconds             k8s_POD_test-pd-1_default_b6fcd777-ffef-4148-ba9a-fbdf4e7cfc8d_0
[root@daocloud ~]# docker cp data 2e848e2ae430:/cache/data/
[root@daocloud ~]# xfs_quota -x -c 'report -h' /dev/sdc
Project quota on /var/lib/kubelet (/dev/sdc)
                        Blocks
Project ID   Used   Soft   Hard Warn/Grace
---------- ---------------------------------
#0           160K      0      0  00 [------]
volume1048577    16K      0      0  00 [------]
volume1048578     8K      0      0  00 [------]
volume1048579     8K      0      0  00 [------]
volume1048580      0     8E     8E  00 [------]
volume1048581   2.6G     8E     8E  00 [------]
volume1048582     8K      0      0  00 [------]
volume1048583     8K      0      0  00 [------]
volume1048584      0     8E     8E  00 [------]
```
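
And a sketch of the log-reading side (assuming the kubelet runs as a systemd unit; the unit name and the exact message wording are assumptions):

```bash
# Scan the kubelet journal for eviction-related entries (sketch).
journalctl -u kubelet --since "1 hour ago" | grep -i evict
```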

### Monitoring Requirements

* **How can an operator determine if the feature is in use by workloads?**
- A cluster-admin can check the kubelet feature gates on each node. If the feature gate is disabled, workloads on that node will not use it.
Contributor:

This lets you know whether a kubelet supports it, but it's not obvious to me how a cluster-admin knows how close their pods are to the limit.

Member Author:

- `xfs_quota -x -c 'report -h' /dev/sdc` to check the current usage.
- Check `spec.containers[].resources.limits.ephemeral-storage` of each container (see the sketch after this list).

Other ways to collect some more data:

- `docker ps -s` to see container storage usage.
- `df -h` on the node.
- `du` on the container or pod dir.
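
A sketch of the per-container limits check mentioned above (the `kubectl` flags are real; the jsonpath expression is just one way to write it):

```bash
# Print each pod's containers with their ephemeral-storage limits, cluster-wide.
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: {range .spec.containers[*]}{.name}={.resources.limits.ephemeral-storage} {end}{"\n"}{end}'
```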

Member Author (@pacoxu, May 13, 2021):

The kubelet `/stats/summary` endpoint can show ephemeral storage usage: fetch `https://127.0.0.1:10250/stats/summary` and pipe it through `grep ephemeral-storage -A 10`.
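
A sketch of the same check through the API server, which avoids authenticating to the kubelet directly (the node-proxy path is real; `<node-name>` is a placeholder):

```bash
# Fetch one node's kubelet stats summary via the API server proxy and
# show ephemeral-storage usage entries.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" \
  | grep -A 10 "ephemeral-storage"
```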

Member:

Actually, to me, this question is more about how an operator can tell across their fleet whether the feature is in use or not. I don't think it's critical for this feature; I think of this more as useful for allowing the operator to know what workloads would be impacted by a rollout, or for figuring out if this feature could be the cause of higher error rates, for example.

That said, is there, say, a per-node metric saying which mechanism is in use? That could be useful; if you don't have it, perhaps add it to the "possible useful metrics" section below.


* **What are the SLIs (Service Level Indicators) an operator can use to determine
the health of the service?**
- Set a quota for the specified volume and try to write to the volume to check if there is a limitation.
Contributor:

I don't see how this scales for even a single cluster. If a cluster-admin is enforcing a limit, how do they know whether pods are being impacted by this limit?

Member Author:

I don't know a simple way (like `kubectl top pod` or `docker stats` for checking CPU/memory usage and limits) to check that. The comment above lists some methods that can help identify which pods will be impacted.

- `xfs_quota` is the recommended way to see the usage.
- `du` is the traditional way (see the sketch after this list).
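
A sketch of the `du` approach (the `/var/lib/kubelet/pods/...` layout is the kubelet's standard one; the UID and volume name are placeholders):

```bash
# Measure an emptyDir volume's usage the traditional way, straight from
# the kubelet's pod directory. <pod-uid> and <volume-name> are placeholders.
sudo du -sh /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/<volume-name>
```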

- N/A.

* **Are there any missing metrics that would be useful to have to improve observability of this feature?**
- No.
Contributor:

I'm not seeing any cluster-admin visibility into the health of this feature.

Member Author:

kubelet metrics can show `kubelet_evictions{eviction_signal="ephemeralpodfs.limit"} 1`.

I'm not sure if it helps.
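
A sketch of reading that counter without shelling into the node (`kubelet_evictions` is a real kubelet metric; `<node-name>` is a placeholder):

```bash
# Count evictions reported by one node's kubelet, broken down by eviction signal.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" \
  | grep "^kubelet_evictions"
```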


* **Will enabling / using this feature result in non-negligible increase of
resource usage (CPU, RAM, disk, IO, ...) in any components?**
- No.
Contributor:

A previous answer was "Performance changes." I'm not clear on how you can have no changes here if that's the case.

Member Author:

I was careless here.

kubelet now allows use of XFS quotas (on XFS and suitably configured ext4fs filesystems) to monitor storage consumption for ephemeral storage (currently for emptyDir volumes only). This method of monitoring consumption is faster and more accurate than the old method of walking the filesystem tree. It does not enforce limits, only monitors consumption. To utilize this functionality, you must set the feature gate `LocalStorageCapacityIsolationFSQuotaMonitoring=true`. For ext4fs filesystems, you must create the filesystem with `mkfs.ext4 -O project <block_device>` and run `tune2fs -Q prjquota <block_device>`; XFS filesystems need no additional preparation. The filesystem must be mounted with option `project` in `/etc/fstab`. If your primary partition is the root filesystem, you must also add `rootflags=pquota` to your GRUB config file.
kubernetes/kubernetes#66928

Updated.
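
A condensed sketch of that ext4 preparation (`/dev/sdc` and the `/var/lib/kubelet` mount point are placeholders; `prjquota` is the usual spelling of the project-quota mount option):

```bash
# One-time preparation of an ext4 filesystem for project quotas (sketch).
mkfs.ext4 -O project /dev/sdc   # create the filesystem with the 'project' feature
tune2fs -Q prjquota /dev/sdc    # turn on project quota tracking in the superblock

# Mount with project quotas enabled; persist the option in /etc/fstab.
mount -o prjquota /dev/sdc /var/lib/kubelet
```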

@deads2k (Contributor) commented May 12, 2021

Updated, @deads2k.
As I don't have enough time today, I may go through and improve the answers later.

I'm happy to see improvement later, but I think greater granularity is required to meet merge requirements. I've left some detailed comments.

@pacoxu pacoxu force-pushed the ephemeral-storage-quotas-beta branch from f23fcb4 to 2c431ce on May 13, 2021 08:02
@pacoxu (Member Author) commented May 13, 2021

I replied to most of the questions and updated the KEP.

Updated, @deads2k.
As I don't have enough time today, I may go through and improve the answers later.

I'm happy to see improvement later, but I think greater granularity is required to meet merge requirements. I've left some detailed comments.

Sorry, I mistakenly thought yesterday was the deadline. 😂

@pacoxu (Member Author) commented May 30, 2022

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 30, 2022
@pacoxu (Member Author) commented Jun 10, 2022

/cc @dchen1107

Can we add this to the v1.25 release cycle?

@k8s-ci-robot k8s-ci-robot requested a review from chendave June 10, 2022 01:30
@pacoxu (Member Author) commented Jun 10, 2022

/cc @dchen1107
/uncc chendave

@k8s-ci-robot k8s-ci-robot removed the request for review from chendave June 10, 2022 01:58
@dchen1107 dchen1107 added this to the v1.25 milestone Jun 14, 2022
keps/sig-node/1029-ephemeral-storage-quotas/kep.yaml (outdated, resolved)
keps/sig-node/1029-ephemeral-storage-quotas/kep.yaml (outdated, resolved)
keps/prod-readiness/sig-node/1029.yaml (outdated, resolved)

###### What steps should be taken if SLOs are not being met to determine the problem?

- Restart the kubelet and wait for 1 minute to make the SLOs clear. (The volume stats checking interval is determined by the kubelet flag `volumeStatsAggPeriod`, default 1m.)
Member:

I don't see how this helps figure out why the SLO isn't being met. Is the recommendation to disable the feature in this case?

Member Author:

I updated it here to:

If the metrics show problems, we can check the log and the quota dir with the commands below.
- There will be warning logs ([after kubernetes/kubernetes#107490 is merged](https://github.com/kubernetes/kubernetes/pull/107490)) if a volume calculation takes longer than 1 second.
- If quota is enabled, you can find the volume information and the processing time with `time repquota -P /var/lib/kubelet -s -v`.

No, we recommend enabling this feature gate for performance.

I wanted to explain how to make the metric clear in the old message. Probably, this is not needed.

- The metric `volume_metric_collection_duration_seconds` is a histogram that contains a lot of old data. A restart may help you to see clearly what the current status of volume monitoring is.
- The `volumeStatsAggPeriod` flag is the time we need to wait before checking the metrics (see the sketch after this list).
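
A sketch of tuning that interval while debugging (`--volume-stats-agg-period` is a real kubelet flag mirroring the `volumeStatsAggPeriod` config field; the drop-in file path is an assumption, and 30s is just an illustrative value):

```bash
# Recalculate volume stats more often while investigating (sketch).
echo 'KUBELET_EXTRA_ARGS=--volume-stats-agg-period=30s' | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet
```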

@pacoxu pacoxu requested a review from johnbelamaric June 20, 2022 02:56
@pacoxu (Member Author) commented Jun 20, 2022

@johnbelamaric I updated the KEP according to your comments. Thanks for your detailed review.

@johnbelamaric (Member):

Ok. Minor comment on PRR, but I think it's OK. But it needs SIG approval.

@pacoxu pacoxu force-pushed the ephemeral-storage-quotas-beta branch from ead5a8f to 67e3eaa on June 21, 2022 01:37
@pacoxu (Member Author) commented Jun 21, 2022

/assign @derekwaynecarr @dchen1107

@johnbelamaric (Member):

Thanks, that looks good. Once there is SIG approval I will approve (I am root in this repo so I have to wait or it will approve the whole thing).

@dchen1107 (Member):

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 22, 2022
@johnbelamaric (Member):

/approve

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, johnbelamaric, pacoxu

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.