Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do hostNet Pod-ports -> hostPorts in Pod defaults #117696

Conversation

thockin
Copy link
Member

@thockin thockin commented Apr 30, 2023

Rather than doing it in PodSpec defaulting, which triggers in Deployments and DaemonSets, do it only when a Pod is actually in play.

/kind bug
/kind cleanup
/kind api-change

partial for #117689

Pods which set `hostNetwork: true` and declare ports get the `hostPort` field set automatically.  Previously this would happen in the PodTemplate of a Deployment, DaemonSet or other workload API.  Now `hostPort` will only be set when an actual Pod is being created.  If this presents a problem, setting the feature gate "DefaultHostNetworkHostPortsInPodTemplates" to true will revert this behavior.  Please file a kubernetes bug if you need to do this.

EDIT by @dims: Fixed typo - feature name is DefaultHostNetworkHostPortsInPodTemplates (not DefaultHostNetworkHostPortsInWorkloads)

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 30, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Apr 30, 2023
@thockin thockin changed the title Do hostNet Pod-ports -> hosPorts in Pod defaults Do hostNet Pod-ports -> hostPorts in Pod defaults Apr 30, 2023
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 30, 2023
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2023
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@thockin
Copy link
Member Author

thockin commented May 1, 2023

/retest

@thockin thockin force-pushed the hostnet_hostport_default_pod_not_podspec branch from 3d61e01 to 665b731 Compare May 1, 2023 05:02
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 1, 2023
@thockin
Copy link
Member Author

thockin commented May 1, 2023

I thought for sure this would exploide somewhere subtle. It didn't. I added tests and a relnote. This is the part where @liggitt sees some subtle reason why I am a bonehead and can't do this.

Comment on lines 214 to 217
if obj.HostNetwork {
defaultHostNetworkPorts(&obj.Containers)
defaultHostNetworkPorts(&obj.InitContainers)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was an instance of the problems with "derived defaults" described in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#static-defaults:

In some cases, these values may be conditional on or deterministically derived from other fields (e.g. "if otherField is X then this field defaults to 0" or "this field defaults to the value of otherField"). Note that such derived defaults present a hazard in the face of updates - if the "other" field changes, the derived field may have to change, too. The static defaulting logic is unaware of updates and has no concept of "previous value", which means this inter-field relationship becomes the user's problem - they must update both the field they care about and the "other" field.

so changing it (by moving the defaulting to the pod, which is immutable on update) seems useful if we can swing it

Copy link
Member

@liggitt liggitt May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The immediately mechanical issue I see is that it looks like validation requires hostPort == containerPort when hostNetwork == true, even at the deployment, etc, level:

  • ValidatePodTemplateSpec → ValidatePodSpec → ValidatePodSecurityContext → validateHostNetwork:
    func validateHostNetwork(hostNetwork bool, containers []core.Container, fldPath *field.Path) field.ErrorList {
    allErrors := field.ErrorList{}
    if hostNetwork {
    for i, container := range containers {
    portsPath := fldPath.Index(i).Child("ports")
    for i, port := range container.Ports {
    idxPath := portsPath.Index(i)
    if port.HostPort != port.ContainerPort {
    allErrors = append(allErrors, field.Invalid(idxPath.Child("containerPort"), port.ContainerPort, "must match `hostPort` when `hostNetwork` is true"))
  • How is this validation passing on deployments that set hostNetwork: true but rely on defaulting to populate hostPort in all the container ports? Do we not have tests for that?

More esoteric implications of this:

  1. a client that sends Update requests for deployments, etc, will no longer get hostPort fields defaulted inside containers. That means an update to an object persisted in a previous release you would expect to be a no-op will now trigger a generation change and rollout, even though the resulting pods would be identical.
  2. pod-creating controllers (replicaset, job, cronjob, statefulset, daemonset) have to tolerate defaulting differences between the pod-spawning object and the pods without triggering rollouts to try to make the pods exactly match the workload spec. I think that already has to be generally true since we already have pod-specific defaulting, but I've been surprised before, which led to Add tests to detect default changes to podspec and podspectemplate defaults #78914, which I don't think we've ever shored up this aspect to verify controllers are well-behaved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like validation requires hostPort == containerPort when hostNetwork == true

And it seems there are no tests for this case. I can fix that.

a client that sends Update requests for deployments, etc, will no longer get hostPort fields defaulted inside containers. That means an update to an object persisted in a previous release you would expect to be a no-op will now trigger a generation change and rollout, even though the resulting pods would be identical.

Not sure I follow this one. A Deployment which already exists has these fields set. We won't unset them.

have to tolerate defaulting differences between the pod-spawning object and the pods without triggering rollouts to try to make the pods exactly match the workload spec

I'll look into this, too. I kicked this PR off to see if it was easy enough to squeeze into free time. It may not be :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a client that sends Update requests for deployments, etc, will no longer get hostPort fields defaulted inside containers. That means an update to an object persisted in a previous release you would expect to be a no-op will now trigger a generation change and rollout, even though the resulting pods would be identical.

Not sure I follow this one. A Deployment which already exists has these fields set. We won't unset them.

When the client sends an update of {kind:Deployment, spec: {template: {hostNetwork: true, ...}}}, if we no longer default hostPort inside all the container ports, the update would not have those fields populated in the new object on an update. DeepEqual(newDeployment.spec.template, oldDeployment.spec.template) would show a diff, generation would get bumped, a new replicaset/etc would roll out

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the update would not have those fields populated in the new object on an update

If they did a full replace (PUT) then yes, you are right. But I am not sure on PATCH - we're into the area of patch/apply where I get fuzzy.

If user sent an update which changed (e.g.) Deployment labels (which should not trigger a redeploy), wouldn't the patch/apply logic carry the hostPort values forward?

Copy link
Member

@liggitt liggitt May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, a patch not touching .spec.template... should leave the defaulted fields alone

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed with validation tests. Will need to look at the rest later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that already has to be generally true since we already have pod-specific defaulting, but I've been surprised before, which led to #78914, which I don't think we've ever shored up this aspect to verify controllers are well-behaved.

Yeah That test doesn't handle conditional defaults. I'll have to comprehend the tangle behind that thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof. I think I get it, but it's tricky - it detects a slice with a value as "already defaulted". I'm pushing an update now - PTAL. Still need to verify manually that controllers don't hotloop on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified by inspection DaemonSet, ReplicaSet, StatefulSet, and Job. I also ran a Deployment under this config and verified that Deployment's hostPort is not set, the pod's hostPort is set, and pods were not being recreated.

@liggitt
Copy link
Member

liggitt commented May 1, 2023

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 1, 2023
@aojea
Copy link
Member

aojea commented May 1, 2023

sees some subtle reason why I am a bonehead and can't do this.

I was 🍿 👀 ... and he never let you down 😅

@thockin thockin force-pushed the hostnet_hostport_default_pod_not_podspec branch from 665b731 to 29e47ee Compare May 1, 2023 20:02
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 1, 2023
@thockin thockin force-pushed the hostnet_hostport_default_pod_not_podspec branch from 29e47ee to c3a8968 Compare May 1, 2023 21:07
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2023
Copy link
Member

@liggitt liggitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

/hold
logic looks correct, two small comments

pkg/features/kube_features.go Outdated Show resolved Hide resolved
Copy link
Member

@liggitt liggitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

/hold
logic looks correct, two small comments

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 10, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 3f76fa814f7556da031e9f877ea15edc8906d1e0

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

thockin added 2 commits May 9, 2023 18:10
Rather than doing it in PodSpec defaulting, which triggers in
Deployments and DaemonSets, do it only when a Pod is actually in play.
This will ensure that HostPort == ContainerPort for pods and that
HostPort == 0 || HostPort == ContainerPort for embedded PodSpecs.
@thockin
Copy link
Member Author

thockin commented May 10, 2023

re-pushed

@thockin thockin force-pushed the hostnet_hostport_default_pod_not_podspec branch from e67ca51 to 4bbf611 Compare May 10, 2023 01:10
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 10, 2023
@k8s-ci-robot k8s-ci-robot requested a review from liggitt May 10, 2023 01:10
@liggitt
Copy link
Member

liggitt commented May 10, 2023

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels May 10, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 486110e7625dc93803c3b257ccef1b9966039749

@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@@ -212,6 +212,15 @@ const (
// Enables support for time zones in CronJobs.
CronJobTimeZone featuregate.Feature = "CronJobTimeZone"

// owner: @thockin
// deprecated: v1.28
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, this is much better than current method we were doing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't realize this was an option either :)

@akhilerm
Copy link
Member

akhilerm commented Nov 7, 2024

Is the issue #126879 caused by this change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants