Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for container fs and image fs to be on the same drive but in a different partition #128344

Merged

Conversation

kannon92
Copy link
Contributor

@kannon92 kannon92 commented Oct 25, 2024

Reintroduce #126562 (comment) and fix a bug where when split filesystem is not detected we were ending the eviction manager early and not setting rank functions.

Fixed an issue in the kubelet that showed when writeable layers and read-only layers were at different paths within the same mount.
Kubernetes was previously detecting that the image filesystem was split, even when that was not really the case

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 25, 2024
@kannon92 kannon92 force-pushed the revert-128333-eviction-manager branch from 943caa9 to 742ca47 Compare October 25, 2024 22:52
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 25, 2024
@kannon92
Copy link
Contributor Author

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Oct 25, 2024
@kannon92
Copy link
Contributor Author

/test

@k8s-ci-robot
Copy link
Contributor

@kannon92: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test pull-cos-containerd-e2e-ubuntu-gce
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-coverage-unit
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-cos
  • /test pull-kubernetes-e2e-gce-cos-canary
  • /test pull-kubernetes-e2e-gce-cos-no-stage
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-pull-through-cache
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-update
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary
  • /test pull-kubernetes-verify-lint

The following commands are available to trigger optional jobs:

  • /test check-dependency-stats
  • /test pull-crio-cgroupv1-node-e2e-eviction
  • /test pull-crio-cgroupv1-node-e2e-eviction-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-features
  • /test pull-crio-cgroupv1-node-e2e-features-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-hugepages
  • /test pull-crio-cgroupv1-node-e2e-hugepages-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-resource-managers
  • /test pull-crio-cgroupv1-node-e2e-resource-managers-kubetest2
  • /test pull-crio-cgroupv2-imagefs-separatedisktest
  • /test pull-crio-cgroupv2-imagefs-separatedisktest-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-eviction
  • /test pull-crio-cgroupv2-node-e2e-eviction-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-hugepages
  • /test pull-crio-cgroupv2-node-e2e-hugepages-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-resource-managers
  • /test pull-crio-cgroupv2-node-e2e-resource-managers-kubetest2
  • /test pull-crio-cgroupv2-splitfs-separate-disk
  • /test pull-crio-cgroupv2-splitfs-separate-disk-kubetest2
  • /test pull-e2e-gce-cloud-provider-disabled
  • /test pull-e2e-gci-gce-alpha-enabled-default
  • /test pull-kubernetes-apidiff
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
  • /test pull-kubernetes-crio-node-memoryqos-cgrpv2
  • /test pull-kubernetes-crio-node-memoryqos-cgrpv2-kubetest2
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-autoscaling-hpa-cm
  • /test pull-kubernetes-e2e-autoscaling-hpa-cpu
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-master-windows-nodelogquery
  • /test pull-kubernetes-e2e-capz-windows-alpha-feature-vpa
  • /test pull-kubernetes-e2e-capz-windows-alpha-features
  • /test pull-kubernetes-e2e-capz-windows-master
  • /test pull-kubernetes-e2e-capz-windows-serial-slow
  • /test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-e2e-ec2
  • /test pull-kubernetes-e2e-ec2-arm64
  • /test pull-kubernetes-e2e-ec2-conformance
  • /test pull-kubernetes-e2e-ec2-conformance-arm64
  • /test pull-kubernetes-e2e-ec2-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-cos-alpha-features
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-disruptive-canary
  • /test pull-kubernetes-e2e-gce-kubelet-credential-provider
  • /test pull-kubernetes-e2e-gce-network-policies
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gce-serial
  • /test pull-kubernetes-e2e-gce-serial-canary
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-gce-storage-selinux
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-gci-gce-ingress
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-e2e-gci-gce-nftables
  • /test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
  • /test pull-kubernetes-e2e-kind-alpha-beta-features
  • /test pull-kubernetes-e2e-kind-alpha-features
  • /test pull-kubernetes-e2e-kind-beta-features
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-cloud-provider-loadbalancer
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-evented-pleg
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-e2e-kind-ipvs
  • /test pull-kubernetes-e2e-kind-kms
  • /test pull-kubernetes-e2e-kind-multizone
  • /test pull-kubernetes-e2e-kind-nftables
  • /test pull-kubernetes-e2e-relaxed-environment-variable-validation
  • /test pull-kubernetes-e2e-storage-kind-alpha-beta-features
  • /test pull-kubernetes-e2e-storage-kind-disruptive
  • /test pull-kubernetes-e2e-storage-kind-volume-group-snapshots
  • /test pull-kubernetes-kind-dra
  • /test pull-kubernetes-kind-dra-all
  • /test pull-kubernetes-kind-json-logging
  • /test pull-kubernetes-kind-text-logging
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-linter-hints
  • /test pull-kubernetes-local-e2e
  • /test pull-kubernetes-node-arm64-e2e-containerd-ec2
  • /test pull-kubernetes-node-arm64-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-arm64-ubuntu-serial-gce
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-imagefs-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-imagefs-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-splitfs-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-splitfs-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-userns-e2e-serial
  • /test pull-kubernetes-node-crio-cgrpv2-userns-e2e-serial-kubetest2
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-crio-e2e-kubetest2
  • /test pull-kubernetes-node-e2e-alpha-ec2
  • /test pull-kubernetes-node-e2e-containerd-1-7-dra
  • /test pull-kubernetes-node-e2e-containerd-alpha-features
  • /test pull-kubernetes-node-e2e-containerd-ec2
  • /test pull-kubernetes-node-e2e-containerd-features
  • /test pull-kubernetes-node-e2e-containerd-features-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2-eks
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
  • /test pull-kubernetes-node-e2e-cri-proxy-serial
  • /test pull-kubernetes-node-e2e-crio-cgrpv1-dra
  • /test pull-kubernetes-node-e2e-crio-cgrpv1-dra-kubetest2
  • /test pull-kubernetes-node-e2e-crio-cgrpv2-dra
  • /test pull-kubernetes-node-e2e-crio-cgrpv2-dra-kubetest2
  • /test pull-kubernetes-node-e2e-resource-health-status
  • /test pull-kubernetes-node-kubelet-containerd-flaky
  • /test pull-kubernetes-node-kubelet-credential-provider
  • /test pull-kubernetes-node-kubelet-serial-containerd
  • /test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
  • /test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-node-kubelet-serial-podresize
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
  • /test pull-kubernetes-node-swap-conformance-fedora-serial
  • /test pull-kubernetes-node-swap-conformance-ubuntu-serial
  • /test pull-kubernetes-node-swap-fedora
  • /test pull-kubernetes-node-swap-fedora-serial
  • /test pull-kubernetes-node-swap-ubuntu-serial
  • /test pull-kubernetes-scheduler-perf
  • /test pull-kubernetes-unit-experimental
  • /test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-ec2
  • pull-kubernetes-e2e-ec2-conformance
  • pull-kubernetes-e2e-gce
  • pull-kubernetes-e2e-gce-canary
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-integration
  • pull-kubernetes-linter-hints
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-typecheck
  • pull-kubernetes-unit
  • pull-kubernetes-verify
  • pull-kubernetes-verify-lint

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kannon92
Copy link
Contributor Author

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
/test pull-crio-cgroupv2-node-e2e-eviction
/test pull-crio-cgroupv2-splitfs-separate-disk

@kannon92
Copy link
Contributor Author

/cc @AnishShah

@AnishShah
Copy link
Contributor

AnishShah commented Oct 26, 2024

/lgtm

ref: #126559

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 1c23910a66b96dc412d6d8e028b61b6b16d89d0c

@kannon92
Copy link
Contributor Author

Eviction looks are much better with this.

/assign @SergeyKanzhelev

@AnishShah
Copy link
Contributor

/assign @yujuhong

@kannon92
Copy link
Contributor Author

/priority important-longterm
/triage accepted

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 30, 2024
splitContainerImageFs := m.containerGC.IsContainerFsSeparateFromImageFs(ctx)
splitContainerImageFs, splitErr := diskInfoProvider.HasDedicatedContainerFs(ctx)
if splitErr != nil {
klog.ErrorS(splitErr, "eviction manager: failed to check if we have separate container filesystem. Ignoring.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the impact of ignoring the error and assuming container and image filesystem are shared?
(in line 258 we error out when "HasDedicatedImageFs" returns an error)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first try of this PR returned an error if we couldn't check split fs. This lead to us not setting any eviction signals so we were hitting disk pressure limits and we would never trigger any eviction signals.

We set the signals only once and we re use those. Since we return once, we skip setting signals.

In case of container and image fs are shared, we don't really even need the containerfs signal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if there's an error, we assume shared container and image filesystem. This is not great but at least we'll continue the evaluation and try to evict the pods if necessary. Is that right?

With the same reasoning, why shouldn't we ignore the imageFsErr? Just trying to understand if I missed something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it but that has been that way forever. So I thought against changing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, your thought process is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm not sure in what situations we'd have permanent errors vs transient ones. If errors are usually transient, returning it and retry next time seems reasonable. If they could be permanent, continue and do best-effort eviction seems more desirable. In that case, maybe kubelet should send an event to surface the issue.

Have we seen any errors being reported in real life?

We could at least document why we ignore the error and add a TODO to revisit the error handling of both later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For imageFS I am not really sure of the error path.

SplitFS we add a label to cadvisor to detect the split filesystem only if the system is split. In this case we had an error and then we dropped eviction signals entirely.

If errors are usually transient, returning it and retry next time seems reasonable. If they could be permanent, continue and do best-effort eviction seems more desirable.

We only set eviction signals once on the sync loop of the eviction manager. There is a nill check for some flags around imagefs and if that is nil then we check for dedicatedImageFs. Once that value gets set that code is skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment. This PR wasn't really about fixing the imagefs path so I left it alone for now.

@sftim
Copy link
Contributor

sftim commented Oct 31, 2024

I don't understand what the changelog is saying about drives and partitions. "Drive" is a very dated term when it comes to storage, except on Windows.

@kannon92
Copy link
Contributor Author

I don't understand what the changelog is saying about drives and partitions. "Drive" is a very dated term when it comes to storage, except on Windows.

I went ahead and used the release note from the first try of this PR.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 1, 2024
@yujuhong
Copy link
Contributor

yujuhong commented Nov 1, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 1, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: cf760325e714513f4e5f1b786b6b4a2c406a2162

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kannon92, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 1, 2024
@kannon92
Copy link
Contributor Author

kannon92 commented Nov 1, 2024

/retest

@k8s-ci-robot
Copy link
Contributor

@kannon92: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-crio-cgroupv2-node-e2e-eviction 742ca47 link false /test pull-crio-cgroupv2-node-e2e-eviction
pull-kubernetes-e2e-gce 48dc7d3 link unknown /test pull-kubernetes-e2e-gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@kannon92
Copy link
Contributor Author

kannon92 commented Nov 1, 2024

/retest

@k8s-ci-robot k8s-ci-robot merged commit b845968 into kubernetes:master Nov 1, 2024
11 of 14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

6 participants