Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kcm should use the controller name constants #127550

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

carlory
Copy link
Member

@carlory carlory commented Sep 23, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

A follow-up of #120371

  1. logging should use a canonical controller name when referencing a controller (Eg. Starting X, Shutting down X)
  2. calling WaitForNamedCacheSync

Which issue(s) this PR fixes:

Related-to #120371

Special notes for your reviewer:

Does this PR introduce a user-facing change?

All controllers in kcm used the controller name constants when
1. logging should use a canonical controller name when referencing a controller (Eg. Starting X, Shutting down X)
2. calling WaitForNamedCacheSync

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 23, 2024
@carlory
Copy link
Member Author

carlory commented Sep 23, 2024

/cc @atiratree

@@ -310,7 +312,7 @@ func (c *Controller) sync(ctx context.Context, key string) error {
WithMessage("There are still IPAddresses referencing the ServiceCIDR, please remove them or create a new ServiceCIDR").
WithLastTransitionTime(metav1.Now()))
svcApply := networkingapiv1beta1apply.ServiceCIDR(cidr.Name).WithStatus(svcApplyStatus)
_, err = c.client.NetworkingV1beta1().ServiceCIDRs().ApplyStatus(ctx, svcApply, metav1.ApplyOptions{FieldManager: controllerName, Force: true})
_, err = c.client.NetworkingV1beta1().ServiceCIDRs().ApplyStatus(ctx, svcApply, metav1.ApplyOptions{FieldManager: c.name, Force: true})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlory
Copy link
Member Author

carlory commented Sep 23, 2024

/retest

@carlory
Copy link
Member Author

carlory commented Sep 23, 2024

/cc @mborsz

For endpointslice part

@k8s-ci-robot k8s-ci-robot requested a review from mborsz September 23, 2024 06:33
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 29, 2024
@k8s-ci-robot k8s-ci-robot added wg/device-management Categorizes an issue or PR as relevant to WG Device Management. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 30, 2024
1. logging should use a canonical controller name when referencing a controller (Eg. Starting X, Shutting down X)
2. calling WaitForNamedCacheSync
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 31, 2024
@carlory
Copy link
Member Author

carlory commented Oct 31, 2024

/test pull-kubernetes-e2e-kind-ipv6

@atiratree
Copy link
Member

/lgtm
/approve

cc @liggitt @thockin for the final review/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 31, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: atiratree, carlory
Once this PR has been reviewed and has the lgtm label, please assign mikedanese for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -43,6 +43,7 @@ func newDaemonSetControllerDescriptor() *ControllerDescriptor {
func startDaemonSetController(ctx context.Context, controllerContext ControllerContext, controllerName string) (controller.Interface, bool, error) {
dsc, err := daemon.NewDaemonSetsController(
ctx,
controllerName,
Copy link
Contributor

@pohly pohly Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary because each controller gets instantiated with a context where the controller name is already included via WithName.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm cancel

Copy link
Member Author

@carlory carlory Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// nameKey is used to log the `WithName` values as an additional attribute.
nameKey = "logger"

The key is different when printing the controller name.

	logger := klog.FromContext(ctx)
	logger.Info("Starting daemon sets controller")
	defer logger.Info("Shutting down daemon sets controller")

When I look at the code, I see that there're some comments.

// In StartController, an explicit "controller" key is used instead, for two reasons:
// - while contextual logging is alpha, klog.LoggerWithName is still a no-op,
//   so we cannot rely on it yet to add the name
// - it allows distinguishing between log entries emitted by the controller
//   and those emitted for it - this is a bit debatable and could be revised.

@pohly What's current state of the contextual logging feature? And What's the desired key name for controller name, logger or controller?

Where should other todos get the controller name?

//  2. used anywhere inside the controller itself:
//     2.1. [TODO] logging should use a canonical controller name when referencing a controller (Eg. Starting X, Shutting down X)
//     2.2. [TODO] emitted events should have an EventSource.Component set to the controller name (usually when initializing an EventRecorder)
//     2.3. [TODO] registering ControllerManagerMetrics with ControllerStarted and ControllerStopped
//     2.4. [TODO] calling WaitForNamedCacheSync
  • 2.2 may need to move initializing an EventRecorder into Run func. it's tracked in The problem with the event recorder is that it runs (sic!) goroutines. #128282
  • 2.3 may need to define a new func for each controller to run, i.e. (c *Controller) RunWithNamedMetircs(ctx, controllerName, metrics)
  • 2.4 maybe able to get logger from ctx. It requires a new function. i.e. WaitForCacheSyncWithContext?

I'm unsure if my idea is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's current state of the contextual logging feature?

It's enabled by default, but further work is needed on helper packages, in particular client-go.

And What's the desired key name for controller name, logger or controller?

I don't think it matters, and it is not configurable, so it's "logger".

Where should other todos get the controller name?

The best approach would be to complete as much of the client-go changes as possible, then proceed as described in #126379. I don't think it makes sense to create PRs for individual APIs.

maybe able to get logger from ctx. It requires a new function. i.e. WaitForCacheSyncWithContext?

I'm working on it - see #126387

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! This PR can be closed. Thanks for your review!

For the first to-do, I think the remaining thing is to remove the TODO and (Eg. Starting X, Shutting down X) words.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced by #128537

/close

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pohly @carlory

I think it is still useful to have the name passed separately:

  • There is no way how to get the controllerName from the context. And it is also more readable to pass it directly.
  • The logger name can be prefixed multiple times, and it is good to have a stable identifier for a controller:

The stable identifier is mainly needed for:

  • clienBuilder (service account)
  • eventSource.component
  • queue names
  • metrics / controllerManagerMetrics
  • as a fieldManager

We do not have to duplicate the name in Starting/Shutting down, but IMO it is still useful to have when debugging the controllers. Because there may be a longer name path or an incorrect/incomplete path - depending under what context the controller is run.

For example in https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-controller-manager/app/certificates.go we run multiple controllers under the same context.

I think it would be good to reopen this PR and at least unify the messages and change the Apply calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stable identifier is mainly needed for: [...]

This seems reasonable to me. Some of these cases might go away when making more of those helpers context-aware, but that's further down the road. How was that handled before? Each controller hard-coding its own name and potentially disagreeing with its caller on what that name is? I vaguely remember something along those lines.

Would it make sense to pass the controller name in the ControllerContext? It's less explicit, but also less intrusive.

Whatever way the name is passed, let's clearly document that logging doesn't need to replicate the name because it is already in the contextual logger.

Copy link
Member

@atiratree atiratree Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each controller hard-coding its own name and potentially disagreeing with its caller on what that name is?

Yeah, most of the controllers have inconsistent hardcoded names for everything and it is difficult for us to change most of the stuff. But the new controllers should name everything consistently.

Would it make sense to pass the controller name in the ControllerContext?

There is only one instance of ControllerContext per KCM and is used to pass SharedInformerFactory, etc. to the init func. And not passed to each controller. So it is not well suited for that use case.

Whatever way the name is passed, let's clearly document that logging doesn't need to replicate the name because it is already in the contextual logger.

I am okay with that. The users will have to adapt to this change though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one instance of ControllerContext per KCM

It could get copied and modified specifically for each controller. But it's not important.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 4, 2024
@k8s-ci-robot
Copy link
Contributor

@carlory: Closed this PR.

In response to this:

replaced by #128537

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@carlory
Copy link
Member Author

carlory commented Nov 5, 2024

/reopen

because the discussion #127550 (comment) is still active.

@k8s-ci-robot k8s-ci-robot reopened this Nov 5, 2024
@k8s-ci-robot
Copy link
Contributor

@carlory: Reopened this PR.

In response to this:

/reopen

because the #127550 (comment) is still open.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@carlory
Copy link
Member Author

carlory commented Nov 5, 2024

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 5, 2024
@k8s-ci-robot
Copy link
Contributor

@carlory: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-conformance-kind-ipv6-parallel 003e862 link false /test pull-kubernetes-conformance-kind-ipv6-parallel

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 7, 2024
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dims dims added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.
Projects
Status: In Progress
Status: !SIG Auth
Status: Archive-it
Status: not-only-sig-node
Development

Successfully merging this pull request may close these issues.

7 participants