-
Notifications
You must be signed in to change notification settings - Fork 330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a per-hcp router for all hcp ingress traffic #1614
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alvaroaleman The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -51,6 +51,10 @@ type AWSEndpointServiceStatus struct { | |||
// +optional | |||
DNSName string `json:"dnsName,omitempty"` | |||
|
|||
// DNSName are the names for the records created in the hypershift private zone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this different from the above field? does the above needs to be dropped/deprecated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is different in that it can hold more than one value ([]string
vs string
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the semantic is the same, I don't think we need two different fields to represent 1 vs multiple, do we? Can we drop DNSName and keep only []DNSName where you can have 1..N?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the semantic is the same. We can deprecate the field, but we have to keep it around for or as long as we support an OCP version where the HCP controller does not have this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are deprecating the single field you should add a // Deprecated comment to it.
Why is external dns the discriminator, why don't always go router? Can we please capture this in https://github.com/openshift/enhancements/tree/master/enhancements/hypershift/networking |
Because it is not going to work without external DNS, as we need it to be able to do name based virtual hosting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did an initial pass, testing to better understand new behavior
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
Outdated
Show resolved
Hide resolved
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
Outdated
Show resolved
Hide resolved
control-plane-operator/controllers/hostedcontrolplane/ingress/router.template
Show resolved
Hide resolved
@@ -51,6 +51,10 @@ type AWSEndpointServiceStatus struct { | |||
// +optional | |||
DNSName string `json:"dnsName,omitempty"` | |||
|
|||
// DNSName are the names for the records created in the hypershift private zone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are deprecating the single field you should add a // Deprecated comment to it.
control-plane-operator/controllers/hostedcontrolplane/oauth/route.go
Outdated
Show resolved
Hide resolved
@@ -302,15 +302,15 @@ func (r *AWSEndpointServiceReconciler) reconcileAWSEndpointServiceStatus(ctx con | |||
hasPrivateRouterEPService := false | |||
hasPrivateIngressControllerEPService := false | |||
for _, eps := range endpointServices.Items { | |||
// If a private-router AWSEndpointService exists, it means that | |||
// If a router AWSEndpointService exists, it means that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it means that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea, I had just deleted the reference to private, whoever originally created the comment forgot to finish the sentence 🙃
@alvaroaleman so in the public use case I would still expect the ignition, konnectivity, oauth and ovnsdb routes to go through the one router as long as a hostname is specified for them. That doesn't look to be the case with the current code. |
Updated that and added an e2e verification.The only exception for this is currently the ovn route, which is managed by the CNO. I will fix it there once this has merged. |
✅ Deploy Preview for hypershift-docs ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
/retest-required |
1 similar comment
/retest-required |
@csrwng @sjenning so in order to save on LB ports to in turn save on security group rules I added defaulting for the KAS to listen on port 443 when the Route strategy is used. This required to:
Those two changes could be extracted from this PR, should I do that? |
/retest-required |
Only if you think it will help to make progress quicker. I wouldn't spend too much time extracting it from here. |
Tests are finally green 🎉 |
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
Outdated
Show resolved
Hide resolved
@@ -818,10 +822,36 @@ func (r *HostedControlPlaneReconciler) reconcileAPIServerService(ctx context.Con | |||
return fmt.Errorf("failed to reconcile API server service: %w", err) | |||
} | |||
|
|||
if util.IsPrivateHCP(hcp) { | |||
if serviceStrategy.Type == hyperv1.Route { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd find easier to read and understand all the possible casuistic with a swtich case for serviceStrategy.Type
here, then in default with util.IsPrivateHCP(hcp) {...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that we have three logical branches:
- Uses hyperv1. Router: Always create internal route.
- subbranch is public: Create public route
- Uses hyperv1.LB and is private: Create private LB
The hyperv1.LB
for the last condition is implicit, because private AWS clusters can only have route or LB apiserver expose strategy. I've added it explicitly to the condition to make that more obvious, the check doesn't really cost anything.
kasServiceStrategy := servicePublishingStrategyByType(hostedControlPlane, hyperv1.APIServer) | ||
if util.IsPrivateHCP(hostedControlPlane) || kasServiceStrategy.Type == hyperv1.Route { | ||
r.Log.Info("Reconciling router") | ||
if err = r.reconcileRouter(ctx, hostedControlPlane, releaseImage, createOrUpdate); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this only happen when is private || kasServiceStrategy.Type == hyperv1.Route? I though part of the goal was "The per-hostedcluster router is always created, not only in the private case"
I.e shouldn't the other components also be accounted for, e.g konnectivity serviceStrategy...?
Also does it makes sense to let the HC backend validate and reject if the combination of serviceStrategy is not supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll respond below
Would it work if management cluster DNS is used when no-external DNS is set? To keep solidifying and simplifying this, I'd expect:
PR lgtm overall, dropped some nits and questions but no objections to proceed and follow up later. |
This entails having to listen on a different port than what is used externally, as the KAS doesn't run as route and thus can not bind to a port < 1025.
…posed through route"" This reverts commit 9f52874.
No objections to that, but so far we didn't make any statements about which combination of publishing strategies we do or do not support
We can not do that, because there are existing clusters that use the LB strategy, which is why we have to keep supporting it. This entail that we need a feature gating mechanism that tells us if a cluster is a LB cluster or a route cluster. I went for using the api server expose strategy itself to detect that. On top of needing to support existing clusters, the other advantage of keeping on supporting it is that we have fewer dependencies, as the route strategy can not work without external DNS. This doesn't matter for the SD use case, but might be nice for the ACM use case.
This will result in a broken cluster if there is no external DNS. We need at minimum DNS entries for ignition, oauth and the KAS. Without distinct SNI/Host headers, the router can not do name based virtual hosting. |
/lgtm |
@alvaroaleman: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@csrwng could you have a look, please? |
/hold cancel |
In openshift#1614 the KAS was changed to always listen on port 6443 in oder to support privileged external ports by doing the mapping external port -> internal port in the service object. This resulted in an invalid KAS networkpolicy if the external port is not 6443, as it would allowlist the external port rather than 6443 where the KAS is actually listening.
In openshift#1614 the KAS was changed to always listen on port 6443 in oder to support privileged external ports by doing the mapping external port -> internal port in the service object. This resulted in an invalid KAS networkpolicy if the external port is not 6443, as it would allowlist the external port rather than 6443 where the KAS is actually listening.
In openshift#1614 the KAS was changed to always listen on port 6443 in oder to support privileged external ports by doing the mapping external port -> internal port in the service object. This resulted in an invalid KAS networkpolicy if the external port is not 6443, as it would allowlist the external port rather than 6443 where the KAS is actually listening.
* Expose through router * KAS service reporting: Report on router service if exposed through route * Revert "KAS service reporting: Report on router service if exposed through route" This reverts commit 219b149. * Default KAS port to 443 when exposing through router This entails having to listen on a different port than what is used externally, as the KAS doesn't run as route and thus can not bind to a port < 1025. * Create router earlier so lb services get provisioned asap * Revert "Revert "KAS service reporting: Report on router service if exposed through route"" This reverts commit 9f52874. * Rebase * Be a bit more explicit
* Expose through router * KAS service reporting: Report on router service if exposed through route * Revert "KAS service reporting: Report on router service if exposed through route" This reverts commit 219b149. * Default KAS port to 443 when exposing through router This entails having to listen on a different port than what is used externally, as the KAS doesn't run as route and thus can not bind to a port < 1025. * Create router earlier so lb services get provisioned asap * Revert "Revert "KAS service reporting: Report on router service if exposed through route"" This reverts commit 9f52874. * Rebase * Be a bit more explicit
* Expose through router * KAS service reporting: Report on router service if exposed through route * Revert "KAS service reporting: Report on router service if exposed through route" This reverts commit 219b149. * Default KAS port to 443 when exposing through router This entails having to listen on a different port than what is used externally, as the KAS doesn't run as route and thus can not bind to a port < 1025. * Create router earlier so lb services get provisioned asap * Revert "Revert "KAS service reporting: Report on router service if exposed through route"" This reverts commit 9f52874. * Rebase * Be a bit more explicit
This change adds a new mode to the cpo where it will set up a router and use it for all controlplane traffic, ref: https://issues.redhat.com/browse/HOSTEDCP-485
This is done by:
--external-dns-domain
if it configures Route or the existing LoadBalancer as publishingStrategy for the KAS/label tide/merge-method-squash