Enable endpoint slices when upgrading through CLI #4864

Merged: 4 commits, Aug 24, 2020

Conversation

mateiidavid (Member)

What/How

@adleong pointed out in #4780 that when enabling slices during an upgrade, the new value does not persist in the linkerd-config ConfigMap. I took a closer look, and it seems we were never overwriting the ConfigMap values when they differed from the flags.

  • To fix this, I added an if block when validating and building the upgrade options: if the current flag value differs from what we have in the ConfigMap, the ConfigMap value is updated (a rough sketch follows this list).
  • When doing so, I made sure we error out if the flag is set to true but the cluster does not yet support EndpointSlices. This mirrors (copy & paste, essentially) the check in the install path.
  • Additionally, I noticed that the Helm ConfigMap template stored the flag value under the field name enableEndpointSlices. I assume this was not updated in the initial PR to reflect the changes made to the protocol buffer; the API (and thus the CLI) uses the field name endpointSliceEnabled instead. I have changed the config template so that Helm installations use the same field, which can then be read by the destination service or other components that may implement slice support in the future.
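
For illustration, here is a rough, self-contained sketch of the kind of check described above (the flag name comes from this PR; the helper name, its signature, and the cluster-capability check are assumptions, not linkerd's actual code):

package upgrade // hypothetical package, for illustration only

import "errors"

// syncEndpointSliceConfig sketches the check described above: if the upgrade
// flag differs from the value stored in linkerd-config, validate cluster
// support (when enabling) and return the value to write back to the ConfigMap.
func syncEndpointSliceConfig(flagValue, storedValue, clusterSupportsSlices bool) (bool, error) {
	if flagValue == storedValue {
		return storedValue, nil // nothing to overwrite
	}
	if flagValue && !clusterSupportsSlices {
		return storedValue, errors.New("--enable-endpoint-slices=true, but the cluster does not support EndpointSlice resources")
	}
	return flagValue, nil // overwrite the stored ConfigMap value with the new flag value
}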

Tests

Install w/ slices off and upgrade w/ slices on


# install control plane through cli
$ target/cli/darwin/linkerd install --registry=docker.io/matei207 | k apply -f -
$ k get cm linkerd-config -n linkerd -o yaml | grep "endpointSliceEnabled"
    {"linkerdNamespace":"linkerd", ..., "endpointSliceEnabled":false}

# upgrade with slices enabled
$ target/cli/darwin/linkerd upgrade --enable-endpoint-slices=true | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
$ k get cm linkerd-config -n linkerd -o yaml | grep "endpointSliceEnabled"
    {"linkerdNamespace":"linkerd", ..., "endpointSliceEnabled":true}

# test that destination svc watches slices and not endpoints
$ k logs linkerd-destination-85bf58865d-fjb67 -n linkerd destination | grep "Watching EndpointSlice"
time="2020-08-12T11:09:50Z" level=debug msg="Watching EndpointSlice resources" addr=":8086" component=endpoints-watcher

Install w/ slices on and upgrade w/ slices off


# install control plane through cli
$ target/cli/darwin/linkerd install --registry=docker.io/matei207 --enable-endpoint-slices | k apply -f -
$ k get cm -n linkerd linkerd-config -o yaml | grep endpointSliceEnabled
    {"linkerdNamespace":"linkerd", ..., "endpointSliceEnabled":true

# upgrade with slices disabled
$ target/cli/darwin/linkerd install --registry=docker.io/matei207 --enable-endpoint-slices=false | k apply -f -
$ k get cm -n linkerd linkerd-config -o yaml | grep endpointSliceEnabled
    {"linkerdNamespace":"linkerd", ..., "endpointSliceEnabled":false}

# test that destination svc watches endpoints
$ k logs linkerd-destination-7c47f8f696-ddnpc -n linkerd destination | grep "Watching Endpoints"
time="2020-08-12T12:42:22Z" level=debug msg="Watching Endpoints resources" addr=":8086" component=endpoints-watcher

Helm install


$ bin/helm install linkerd2 \
  --set-file global.identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer.crt \
  --set-file identity.issuer.tls.keyPEM=issuer.key \
  --set identity.issuer.crtExpiry=$exp \
  charts/linkerd2

# previously we would have had "enableEndpointSlices": false
$ k get cm -n linkerd linkerd-config -o yaml | grep endpointSliceEnabled
      "endpointSliceEnabled": "false"

Signed-off-by: Matei David matei.david.35@gmail.com

@mateiidavid requested a review from a team as a code owner on August 12, 2020 13:11
@adleong (Member) left a comment

I'm struggling to wrap my head around the interaction of flags and configs but I thiiiiiiiiiiink what we need to do here is to add the endpoint slice flag to the list of flags in installOptions.recordFlags. I believe that this will cause linkerd upgrade to save the value of the endpoint slice flag into the upgrade options. @alpeb does this sound correct to you?

@mateiidavid (Member, Author)

I'm struggling to wrap my head around the interaction of flags and configs but I thiiiiiiiiiiink what we need to do here is to add the endpoint slice flag to the list of flags in installOptions.recordFlags. I believe that this will cause linkerd upgrade to save the value of the endpoint slice flag into the upgrade options. @alpeb does this sound correct to you?

Hmmm, I am struggling a bit myself with flags and configs 😅. I thought that was already happening. I remember I specifically looked at which flag set carries on to linkerd upgrade when I first implemented this. At the moment, enable-endpoint-slices is initialised as a flag in recordableFlagSet().

My assumption was that at install time, when using the recordable flags, we render out the values (including the linkerd-config cm) and also persist the flag values. When we upgrade, we re-use the same flags as the install (through the recordable flags) so that there is no mismatch in what is passed. Because simply rendering the values is not enough when upgrading, we also have to overwrite the values in linkerd-config; I assumed that was already being done, though.

Again, not sure if I misunderstood something along the way.

@adleong (Member) left a comment

Ok after digging a little bit deeper, this looks correct after all. Sorry for the confusion 😅

I'm starting to regret the advice that I gave you in #4740 (comment) about making this a value read from the config rather than a flag to the destination service. If we had decided to go with a destination service flag, then I don't think we would need to store this directly in the global config at all and it could just be a recorded flag. Additionally, doing an upgrade where we enable or disable endpoint slices currently doesn't trigger the destination controller to restart because the manifest doesn't change. If we had made this a destination flag, it would trigger a restart, which would be nice. Anyway, not sure if we want to revisit that right now or not.

@@ -292,6 +292,16 @@ func (options *upgradeOptions) validateAndBuild(stage string, k *k8s.KubernetesA
}
}

if options.enableEndpointSlices != configs.GetGlobal().GetEndpointSliceEnabled() {
Review comment from a Member on the diff above:

I don't think we specifically need this check because setFlagsFromInstall will make the default value of the flag be the value that was stored in the configmap. Additionally, we could move this up to around line 250 to be next to the OmitWebhookSideEffects setting, since they both work the same way (a global configs setting from a recorded flag).
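
As an aside, the general pattern being referred to here (a minimal sketch with illustrative names; setFlagsFromInstall's real signature isn't shown in this thread) is to seed each flag's default from the stored config before parsing, so an unspecified flag keeps the value recorded at install time:

package upgrade // illustrative only

import "flag"

// newUpgradeFlags seeds the flag default from the value recorded at install
// time, so running `linkerd upgrade` without --enable-endpoint-slices keeps
// whatever is stored in the ConfigMap.
func newUpgradeFlags(storedEndpointSliceEnabled bool) (*flag.FlagSet, *bool) {
	fs := flag.NewFlagSet("upgrade", flag.ContinueOnError)
	enableEndpointSlices := fs.Bool("enable-endpoint-slices", storedEndpointSliceEnabled,
		"use EndpointSlice resources for destination lookups")
	return fs, enableEndpointSlices
}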

@alpeb (Member) commented on Aug 13, 2020

Yep, flag handling is currently some convoluted business :-(
Alex's comment should do the trick. If you do wanna keep the value in configs.Global and change the name, make sure you also update the corresponding entry in ./pkg/charts/linkerd2/values.go

When upgrading the linkerd control plane through the CLI, if upgrading
with endpoint slices turned on (when they were previously off), the new flag
value does not persist in the configmap. To fix this, the configmap value is now
updated when upgrading with slices turned on. Additionally, the helm configmap
template used a different field name from the protobuf file; this has been changed
to avoid future issues.

Signed-off-by: Matei David <matei.david.35@gmail.com>
@mateiidavid (Member, Author)

Hiya @alpeb and @adleong, thanks for the comments!

@adleong
I have moved up the logic to be closer to OmitWebhooks and removed the unnecessary check. It makes sense based on what Alex said, so thanks for pointing that out!

I see your point about the upgrade story... I'd say optimistically adopters would still roll their pods even though the manifest has not changed? Either way, I'd be more than willing to revisit this and do the required work. However, would deprecating the endpointSliceEnabled field in the protocol buffer be straightforward? We would have to do that if we wanted to switch to a destination service-specific flag, right? I suppose it depends on whether we will certainly want to do that in the future; if you think there is a strong chance we will, we might as well do it sooner rather than later? Especially since the slice feature barely has any traction, so it's easier to change now?

@alpeb

If you do wanna keep the value in configs.Global and change the name, make sure you also update the corresponding entry in ./pkg/charts/linkerd2/values.go

makes sense! I don't think it needs to be changed at the moment, because the flag name is still the same; only the name of the feature in the configmap is different. The reason for this difference in naming is that at the time I thought endpointSliceEnabled (i.e. the cm value) sounds better as a boolean variable name, but enableEndpointSlices sounds better as a flag name o.o. I think I accidentally made a mess of it because of this and my weak memory: in the _config.tpl I put the cm field name as enableEndpointSlices when it was supposed to be endpointSliceEnabled (the same as the protobuf).

Would you like me to rename the flag to be the same as the cm field -- so both would be enableEndpointSlices or endpointSliceEnabled?

Signed-off-by: Matei David <matei.david.35@gmail.com>
Signed-off-by: Matei David <matei.david.35@gmail.com>
@alpeb (Member) commented on Aug 14, 2020

The update.go changes look good to me 👍

Would you like me to rename the flag to be the same as the cm field -- so both would be enableEndpointSlices or endpointSliceEnabled?

Looking at current flags, we have enable-external-profiles (action connotation) and linkerd-cni-enabled (boolean connotation). That might sound inconsistent, but it's fine as long as you interpret it as the former being something you set during install, whereas the latter being something you're informing that was already done (the CNI should have been installed already). In that vein, enable-endpoint-slices would be fine.

As for the CM and values.yaml names, I think it's less confusing to name them like the flag. That being said, I agree it's better to have it as a destination flag instead. Just removing EndpointSliceEnabled from the config proto (and its reference in install.go) doesn't break upgrade, because we ignore unknown fields when we unmarshal.
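
To illustrate that last point with a simplified sketch (using encoding/json rather than linkerd's actual config decoding): when the stored JSON still carries the removed field, decoding into a type that no longer declares it simply drops the stale entry. This is why old linkerd-config payloads remain readable after the proto field is removed.

package main

import (
	"encoding/json"
	"fmt"
)

// GlobalConfig stands in for the global config after endpointSliceEnabled
// has been removed from the schema.
type GlobalConfig struct {
	LinkerdNamespace string `json:"linkerdNamespace"`
}

func main() {
	// An old config payload that still carries the removed field.
	old := []byte(`{"linkerdNamespace":"linkerd","endpointSliceEnabled":true}`)

	var cfg GlobalConfig
	// Unknown fields are ignored, so the stale entry is silently dropped.
	if err := json.Unmarshal(old, &cfg); err != nil {
		panic(err)
	}
	fmt.Println(cfg.LinkerdNamespace) // prints "linkerd"
}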

Remove cm value from install.go and remove cm value from _config.tpl
Add flag to destination service chart tpl
Remove value overwrite in upgrade & value from proto file
Use destination service flag in main function

Signed-off-by: Matei David <matei.david.35@gmail.com>
@mateiidavid (Member, Author) commented on Aug 17, 2020

@alpeb thank you for the comment! Based on your feedback and Alex's, I have updated this PR to use the flag in the destination service instead of storing it in the configmap. It would be great to have this double checked to ensure there is no compatibility issue moving forward.

Changes


  • Removed endpointSliceEnabled field from the configmap

    • removed the field from install.go when writing configmap configuration
    • removed the field from _config.tpl
    • removed value from the protocol buffer file
    • removed the value overwrite in upgrade.go, but kept the check so the upgrade still fails if slices are enabled and the cluster does not support the EndpointSlice resource type.
  • Modified the EndpointSlices flag to be on the destination service side:

    • the destination service can now be configured to run with an enable-endpoint-slices flag (a rough sketch follows this list)
    • in the destination service chart template, we initialise this flag with whatever it was set to at install/upgrade time
  • Basically, we still use the same enable-endpoint-slices flag when installing through Helm or the CLI, but rather than persisting the value in a dedicated configmap field, it is simply passed to the destination service flag.

  • When upgrading we now trigger a restart which is kind of cool.
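
As a rough sketch of the wiring described in this list (the flag name comes from this PR; the package layout, logging, and watcher setup are simplified assumptions, not the real destination service code):

package main // illustrative sketch of the destination service entrypoint

import (
	"flag"
	"log"
)

func main() {
	// Flag name matches the one discussed in this PR; the rest of the real
	// destination service flag set and startup logic is omitted.
	enableEndpointSlices := flag.Bool("enable-endpoint-slices", false,
		"watch EndpointSlice resources instead of Endpoints")
	flag.Parse()

	if *enableEndpointSlices {
		log.Println("Watching EndpointSlice resources")
		// ...start the EndpointSlice-based watcher...
	} else {
		log.Println("Watching Endpoints resources")
		// ...start the Endpoints-based watcher...
	}
}

This matches the chart-template wiring described above: the value chosen at install/upgrade time is rendered into the destination service's flag, so changing it changes the manifest and rolls the pod.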

Tests

  • Tested two scenarios: from enabled at install time to disabled at upgrade time and the other way around.

from enabled to disabled

$ target/cli/darwin/linkerd install --registry=docker.io/matei207 --enable-endpoint-slices --controller-log-level=debug | k apply -f -

$ k get cm linkerd-config -n linkerd -o yaml | grep endpointSliceEnabled # removed CM value

$ k logs linkerd-destination-78879d5c94-fxf2v -n linkerd destination | grep Watching
time="2020-08-17T14:34:58Z" level=debug msg="Watching EndpointSlice resources" addr=":8086" component=endpoints-watcher

$ target/cli/darwin/linkerd upgrade --enable-endpoint-slices=false | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -

$ k get cm linkerd-config -n linkerd -o yaml | grep endpointSliceEnabled

$ k get pods -n linkerd -o wide
NAME                                      READY   STATUS        RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
linkerd-controller-64bff856cd-cjvxp       2/2     Running       0          62m   10.244.2.2   kind-worker2   <none>           <none>
linkerd-destination-6fb886b4df-cggjt      2/2     Running       0          26s   10.244.2.7   kind-worker2   <none>           <none>
linkerd-destination-78879d5c94-fxf2v      0/2     Terminating   0          62m   10.244.3.3   kind-worker    <none>           <none>
(…)

$ k logs linkerd-destination-6fb886b4df-cggjt -n linkerd destination | grep Watching
time="2020-08-17T15:36:50Z" level=debug msg="Watching Endpoints resources" addr=":8086" component=endpoints-watcher

from disabled to enabled

$ target/cli/darwin/linkerd install --registry=docker.io/matei207 --controller-log-level=debug | k apply -f - 

$ k get cm linkerd-config -n linkerd -o yaml | grep endpointSliceEnabled # removed CM value

$ k logs linkerd-destination-7ff85f8984-8q9j2 -n linkerd destination | grep Watching
time="2020-08-17T15:44:24Z" level=debug msg="Watching Endpoints resources" addr=":8086" component=endpoints-watcher

$ target/cli/darwin/linkerd upgrade --enable-endpoint-slices | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -

$ kgp -n linkerd -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP            NODE           NOMINATED NODE   READINESS GATES
linkerd-controller-6d5746f8f5-28w4p       2/2     Running   0          15m   10.244.2.8    kind-worker2   <none>           <none>
linkerd-destination-6dd97457f-hjz8t       2/2     Running   0          76s   10.244.2.13   kind-worker2   <none>           <none>

$ k get cm linkerd-config -n linkerd -o yaml | grep endpointSliceEnabled

$ k logs linkerd-destination-6dd97457f-hjz8t -n linkerd destination | grep Watching
time="2020-08-17T15:58:27Z" level=debug msg="Watching EndpointSlice resources" addr=":8086" component=endpoints-watcher

And last but not least:

$ target/cli/darwin/linkerd version
Client version: git-b9e96fd0
Server version: git-b9e96fd0

@adleong WDYT?

@adleong (Member) left a comment

Nice!

@alpeb (Member) left a comment

Looks good to me 👍 :shipit: 🚢 🎖️

@adleong merged commit 7ed904f into linkerd:main on Aug 24, 2020