promote ephemeral-storage-quotas to beta in 1.25 #2697
Conversation
Force-pushed from 4653a2f to 11900a7.
/assign @deads2k
@ehashman I opened the status update PR and will follow up on the actions in 1.22.
You need to add and complete the Production Readiness Review Questionnaire in your KEP. See the template here: https://github.com/kubernetes/enhancements/blame/master/keps/NNNN-kep-template/README.md#L359-L667
Force-pushed from 11900a7 to f23fcb4.
Updated. @deads2k
###### What happens if we reenable the feature if it was previously rolled back?

Performance changes.
Is the performance different compared to the first time it was enabled, or is this generally true of enabling the feature?
Only for this scenario: "use project quotas to monitor emptyDir volume storage consumption rather than filesystem walk for better performance and accuracy."
> Only for this scenario: "use project quotas to monitor emptyDir volume storage consumption rather than filesystem walk for better performance and accuracy."
You'll want to add this to the KEP.
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. Set the feature gate to false.
Will the volumes created while the feature gate was on still be enforcing quota?
If a pod was created with quota enforcement, disabling the feature gate will not change the running pod.
Newly created pods will not use the quota.
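For reference, a minimal sketch of toggling the gate on a node, assuming feature gates are passed to the kubelet as command-line flags (many clusters set them in the kubelet configuration file instead):

# Sketch: disable the gate on a node's kubelet and restart it; how the flags are managed is cluster-specific.
kubelet --feature-gates=LocalStorageCapacityIsolationFSQuotaMonitoring=false ...
# After the restart, newly created pods on this node fall back to filesystem-walk monitoring.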
###### How can a rollout or rollback fail? Can it impact already running workloads?

None.
I'm surprised by this. What happens if the kube-apiserver starts using the feature gate and the kubelets have not yet restarted? Is there any impact?
If I understand correctly, the kube-apiserver feature gate setting has no effect here.
When you set the feature gate on the kubelet and restart it, newly created pods will use the quota strategy.
The rollout/rollback will not impact running workloads.
- [x] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: LocalStorageCapacityIsolationFSQuotaMonitoring
  - Components depending on the feature gate: kubelet, apiserver, controller-manager, scheduler
I'm a little surprised the scheduler and controller-manager are involved. What do they enforce?
It is in the scheduler and controller-manager feature-gate lists (see https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/). However, it should only matter for the kubelet: LocalStorageCapacityIsolationFSQuotaMonitoring is only used in pkg/volume/util/fsquota/quota.go.
I updated it to be kubelet only.
LocalStorageCapacityIsolation will affect scheduling.
###### What specific metrics should inform a rollback?

None. Users can focus on volume metrics.
It seems like cluster-admins probably want to be able to tell when pods start failing because they run out of space.
To see failures, I think we can read the kubelet log for eviction-related entries or use xfs_quota to check the quota settings.
Here's an example with volume1048581 (an emptyDir):
[root@daocloud ~]# xfs_quota -x -c 'report -h' /dev/sdc
Project quota on /var/lib/kubelet (/dev/sdc)
Blocks
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
#0 156K 0 0 00 [------]
volume1048577 16K 0 0 00 [------]
volume1048578 8K 0 0 00 [------]
volume1048579 8K 0 0 00 [------]
volume1048581 0 8E 8E 00 [------]
volume1048582 8K 0 0 00 [------]
volume1048583 8K 0 0 00 [------]
volume1048584 0 8E 8E 00 [------]
[root@daocloud ~]# docker ps | grep test
2e848e2ae430 k8s.gcr.io/test-webserver "/test-webserver" 4 seconds ago Up 3 seconds k8s_test-container_test-pd-1_default_b6fcd777-ffef-4148-ba9a-fbdf4e7cfc8d_0
19281193d915 k8s.gcr.io/test-webserver "/test-webserver" 6 seconds ago Up 5 seconds k8s_test-container_test-pd-2_default_ce4c955b-a04a-4b98-8454-d904ba6341eb_0
57f96f8680eb k8s.gcr.io/pause:3.2 "/pause" 9 seconds ago Up 8 seconds k8s_POD_test-pd-2_default_ce4c955b-a04a-4b98-8454-d904ba6341eb_0
b14274ba66e8 k8s.gcr.io/pause:3.2 "/pause" 9 seconds ago Up 8 seconds k8s_POD_test-pd-1_default_b6fcd777-ffef-4148-ba9a-fbdf4e7cfc8d_0
[root@daocloud ~]# docker cp data 2e848e2ae430:/cache/data/
[root@daocloud ~]# xfs_quota -x -c 'report -h' /dev/sdc
Project quota on /var/lib/kubelet (/dev/sdc)
Blocks
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
#0 160K 0 0 00 [------]
volume1048577 16K 0 0 00 [------]
volume1048578 8K 0 0 00 [------]
volume1048579 8K 0 0 00 [------]
volume1048580 0 8E 8E 00 [------]
volume1048581 2.6G 8E 8E 00 [------]
volume1048582 8K 0 0 00 [------]
volume1048583 8K 0 0 00 [------]
volume1048584 0 8E 8E 00 [------]
### Monitoring Requirements

* **How can an operator determine if the feature is in use by workloads?**
  - A cluster-admin can set the feature gate on the kubelet of each node. If the feature gate is disabled, workloads on that node will not use it.
This lets you know whether a kubelet supports it, but it's not obvious to me how a cluster-admin knows how close their pods are to the limit.
- `xfs_quota -x -c 'report -h' /dev/sdc` to check the current usage.
- Check `spec.containers[].resources.limits.ephemeral-storage` of each container.

Other ways to collect some more data:
- `docker ps -s` to see container storage usage.
- `df -h` on the node.
- `du` on the container or pod dir.
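To inspect the declared limits mentioned above, a hypothetical kubectl query (the pod name is a placeholder and the jsonpath expression is illustrative):

# Sketch: print each container's declared ephemeral-storage limit for one pod.
kubectl get pod <pod-name> -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources.limits.ephemeral-storage}{"\n"}{end}'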
The kubelet `/stats/summary` endpoint can show ephemeral storage usage: `https://127.0.0.1:10250/stats/summary | grep ephemeral-storage -A 10`
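A sketch of fetching that endpoint from the node (authentication varies per cluster; a bearer token authorized to read node stats is assumed):

# Sketch: query the kubelet summary API on the node and filter for ephemeral-storage entries.
curl -sk -H "Authorization: Bearer ${TOKEN}" https://127.0.0.1:10250/stats/summary | grep -A 10 ephemeral-storage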
Actually, to me, this question is more about how an operator can tell across their fleet whether the feature is in use or not. I don't think it's critical for this feature; I think of this more as useful for allowing the operator to know what workloads would be impacted by a rollout, or for figuring out if this feature could be the cause of higher error rates, for example.
That said, is there, say, a per-node metric saying which mechanism is in use? That could be useful; if you don't have it, perhaps add it to the "possible useful metrics" section below.
* **What are the SLIs (Service Level Indicators) an operator can use to determine
  the health of the service?**
  - Set a quota for the specified volume and try to write to the volume to check if there is a limitation.
I don't see how this scales for even a single cluster. If a cluster-admin is enforcing a limit, how do they know whether pods are being impacted by this limit?
I don't know a simple way (like `kubectl top pod` or `docker stats` for checking CPU/memory usage and limits) to check that. The comment above lists some methods that can help to know which pod will be impacted:
- `xfs_quota` is the recommended way to see the usage.
- `du` is the traditional way (see the sketch below).
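A minimal sketch of the `du` approach (the pod UID and volume name are placeholders; the path layout assumes the default kubelet root directory):

# Sketch: check the size of one emptyDir volume on the node with du.
du -sh /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/<volume-name>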
- N/A.

* **Are there any missing metrics that would be useful to have to improve observability of this feature?**
  - No.
I'm not seeing any cluster-admin visibility into the health of this feature.
The kubelet metrics can show `kubelet_evictions{eviction_signal="ephemeralpodfs.limit"} 1`.
I'm not sure if it helps.
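A sketch of checking that metric directly on a node (standard kubelet metrics port; authentication varies per cluster):

# Sketch: scrape the kubelet metrics endpoint and look for eviction counters.
curl -sk -H "Authorization: Bearer ${TOKEN}" https://127.0.0.1:10250/metrics | grep '^kubelet_evictions'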
* **Will enabling / using this feature result in non-negligible increase of
  resource usage (CPU, RAM, disk, IO, ...) in any components?**
  - No.
A previous answer was "Performance changes." I'm not clear on how you can have no changes here if that's the case.
I was careless here.
> kubelet now allows use of XFS quotas (on XFS and suitably configured ext4fs filesystems) to monitor storage consumption for ephemeral storage (currently for emptydir volumes only). This method of monitoring consumption is faster and more accurate than the old method of walking the filesystem tree. It does not enforce limits, only monitors consumption. To utilize this functionality, you must set the feature gate `LocalStorageCapacityIsolationFSQuotaMonitoring=true`. For ext4fs filesystems, you must create the filesystem with `mkfs.ext4 -O project <block_device>` and run `tune2fs -Q prjquota <block_device>`; XFS filesystems need no additional preparation. The filesystem must be mounted with option `project` in `/etc/fstab`. If your primary partition is the root filesystem, you must also add `rootflags=pquota` to your GRUB config file.
kubernetes/kubernetes#66928
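For reference, a minimal sketch of the ext4 preparation described in that note (the device path and mount point are placeholders; mkfs is destructive):

# Sketch of the ext4 project-quota preparation; /dev/sdX and the mount point are placeholders.
# WARNING: mkfs.ext4 reformats the device.
mkfs.ext4 -O project /dev/sdX
tune2fs -Q prjquota /dev/sdX
# Mount with project quota enabled (the prjquota mount option is an assumption here; the note above says `project`):
mount -o prjquota /dev/sdX /var/lib/kubelet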
Updated.
I'm happy to see improvement later, but I think greater granularity is required to meet merge requirements. I've left some detailed comments.
Force-pushed from f23fcb4 to 2c431ce.
I replied to most of the questions and updated the KEP.
Sorry, I mistakenly thought yesterday was the deadline. 😂
/unhold
/cc @dchen1107 Can we add this to the v1.25 release cycle?
/cc @dchen1107
###### What steps should be taken if SLOs are not being met to determine the problem?

- Restart the kubelet and wait for 1 minute to make the SLOs clear. (The volume stats checking interval is determined by the kubelet flag `volumeStatsAggPeriod`, default 1m.)
I don't see how this helps figure out why the SLO isn't being met. Is the recommendation to disable the feature in this case?
I updated this to:

If the metrics show problems, we can check the log and quota dir with the commands below.
- There will be warning logs ([after the PR is merged](https://github.com/kubernetes/kubernetes/pull/107490)) if volume calculation took longer than 1 second.
- If quota is enabled, you can find the volume information and the processing time with `time repquota -P /var/lib/kubelet -s -v`.
No, we recommend enabling this feature gate for performance.

I wanted to explain how to make the metric clear in the old message. Probably, this is not needed.
- The metric `volume_metric_collection_duration_seconds` is a histogram that accumulates a lot of old data. A restart may help you to clearly see the current status of volume monitoring.
- The `volumeStatsAggPeriod` flag is the time we need to wait before checking the metrics.
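A sketch of inspecting that histogram directly on a node (same kubelet metrics endpoint as above; authentication varies per cluster):

# Sketch: look at the volume stats collection duration histogram series.
curl -sk -H "Authorization: Bearer ${TOKEN}" https://127.0.0.1:10250/metrics | grep volume_metric_collection_duration_seconds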
@johnbelamaric I updated the KEP according to your comments. Thanks for your detailed review.
Ok. Minor comment on PRR, but I think it's OK. But it needs SIG approval.
Force-pushed from ead5a8f to 67e3eaa.
/assign @derekwaynecarr @dchen1107
Thanks, that looks good. Once there is SIG approval I will approve (I am root in this repo so I have to wait or it will approve the whole thing).
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, johnbelamaric, pacoxu. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
part of #1029
#1029 (comment)
I'd like to follow up on this feature. In the 1.22-1.24 cycle, I will benchmark it and then check with the SIG whether we should promote it to beta.
Action Items in 1.22~1.24
More details can be found in #1029 (comment)
Action Items in 1.25