Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-127: user namespace support for stateless pods #116377

Merged
merged 5 commits into from
Mar 14, 2023

Conversation

giuseppe
Copy link
Member

@giuseppe giuseppe commented Mar 8, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

it adds support for user namespace for stateless pods

Which issue(s) this PR fixes:

Special notes for your reviewer:

rebased on top of: #116249

Does this PR introduce a user-facing change?

By enabling the `UserNamespacesStatelessPodsSupport` feature gate in kubelet, you can now run a stateless pod in a separate user namespace

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@giuseppe
Copy link
Member Author

giuseppe commented Mar 8, 2023

/sig node

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/node Categorizes an issue or PR as relevant to SIG Node. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. area/kubelet labels Mar 8, 2023
@k8s-ci-robot k8s-ci-robot requested review from humblec and mrunalp March 8, 2023 15:17
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/storage Categorizes an issue or PR as relevant to SIG Storage. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 8, 2023
Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sftim
Copy link
Contributor

sftim commented Mar 8, 2023

❓ do we publish the UID range that a Pod is using? eg: add that information into the .status of the Pod?

@giuseppe
Copy link
Member Author

giuseppe commented Mar 8, 2023

question do we publish the UID range that a Pod is using? eg: add that information into the .status of the Pod?

no, that information is not exposed at the moment

@pacoxu
Copy link
Member

pacoxu commented Mar 13, 2023

/priority important-soon
/triage accepted

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 13, 2023
giuseppe and others added 5 commits March 13, 2023 22:21
add the definitions for the ID mappings to use at runtime for the
volume mount.  This is supported only on Linux where idmapped mounts
are used to perform the runtime mapping.

The new fields are mapped directly to the field in the OCI runtime
specs:

https://github.com/opencontainers/runtime-spec/blob/main/config.md#posix-platform-mounts

The CRI runtime will pass the mappings to the OCI runtime as-is.

Related to KEP-127.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Now KEP-127 relies on idmap mounts to do the ID translation and we won't
do any chowns in the kubelet.

This patch just removes the usage of GetHostIDsForPod() in
operationexecutor to do the chown, and also removes the
GetHostIDsForPod() method from the kubelet volume interface.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Latest changes to KEP-127 removed that phase, so let's stop reserving
those IDs for that.

While we are there, we replace 0 for 0*65536 as before we had a bug that
we were not multiplying the index, to avoid bugs in the future.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
To that end, we need to add one kubelet getter listPodsFromDisk(). Other
than that, it is a pretty trivial move.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Copy link
Contributor

@mrunalp mrunalp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: f5d61df8aeccd78ecd8e9b66f0d063b71702228f

@mrunalp
Copy link
Contributor

mrunalp commented Mar 14, 2023

@thockin @gnufied ptal

@rata
Copy link
Member

rata commented Mar 14, 2023

/retest

@gnufied
Copy link
Member

gnufied commented Mar 14, 2023

cc @dobsonj

return volumetypes.NewOperationContext(eventErr, detailedErr, migrated)
}
}

// Execute mount
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea here is to, instead of kubelet giving permissions to mapped uid/gid, we are going to ask CRI to idmap these mount points when creating bind mounts? So no change in volumemanager code is necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes exactly, the idea is to use idmapped mounts (through the CRI) to cancel the effect on volumes of running inside a user namespace

@gnufied
Copy link
Member

gnufied commented Mar 14, 2023

I think change is fine for alpha implementation with understanding that current limitation of idmapped mounts will prevent this from working in case of brown field volumes with non-uniform uid/gid permissions. But we are currently not targeting those anyways.

Lets merge this, but I also want to hear from @dobsonj who has been tinkering in this area.

/approve
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 14, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, gnufied, mrunalp, rata

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 14, 2023
@dobsonj
Copy link
Member

dobsonj commented Mar 14, 2023

I think this looks fine, I'd just ask that we document which container runtimes support this so it's clear for users of this feature.
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 14, 2023
@k8s-ci-robot k8s-ci-robot merged commit 6a111be into kubernetes:master Mar 14, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Mar 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

8 participants