Better ExternalName support #46332
Conversation
What do you mean by "referenced"? If I reference:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: b-ext
  namespace: echo
spec:
  ports:
  - name: tcp
    port: 80
    targetPort: 18080
  type: ExternalName
  externalName: b.com
```
ingress #30547: We have an explicit test that this works; see routing.go.

The DNS floods are caused by the old code creating STRICT_DNS clusters for the services. If you have a 30s TTL, 1000 ExternalName services with 2 ports each, and 2000 proxies, that is 1000 × 2 × 2000 = 4M DNS queries every 30s, i.e. ~133k QPS. We have seen this level of DNS traffic caused by ExternalName in real-world production clusters (obviously not for long, as the DNS server cost was astronomical, so they dropped Istio...). This all happens even under zero traffic. This PR drops this to zero additional overhead from Istio. While users will still trigger DNS queries in their applications, that is not a source of this level of DNS traffic, and is something already present in existing users' clusters.
ExternalName service does not work well for TLS/HTTPS because of host; this is documented in the k8s docs. There is no need for Istio to define the correct behavior; actually it is difficult to define which one is right.

For DNS floods, if we intercept the DNS resolution from Envoy to the local Istio DNS proxy, then it is simpler. #47567

And I would also suggest adding a knob for ExternalName service support, so users can completely disable it.
Agree, we should not invent new behavior here, just keep existing k8s behavior (as this PR does)
This does not fully resolve the issue, only where they are using trivial aliasing to other services (pointing to a real external service like google.com , as is most common, will not work this way) and using the experimental DNS proxy feature. Additionally, it requires dramatic changes to the DNS architecture which I don't think are really feasible given it introduces a circular dependency. Again, DNS flooding is only 1 of 5+ reasons to do this PR. So while it's nice we could mitigate the issue in some cases without this PR, I don't believe it's a compelling reason to not move forward with this PR.
I don't mind that, but this PR is still critical. ExternalName is a core v1 Kubernetes feature, and Istiod must support it reasonably out of the box. The current poor support has repeatedly come up as one of the biggest blockers.
@hzxuzhonghu can you take another look so we can keep moving forward? I am worried this will move the release back if we keep discussing at 1 message per day.
@howardjohn For users who cannot accept the behavior change and also want to resolve the issues, what should we do?

And there are no other matters except resolveServiceAliases. Can you add a benchmark for it, so we can evaluate how long it takes for 1000, 5000, and 10000 services?
I am working on a benchmark now.
@hzxuzhonghu added benchmarks. 100 and 5k ExternalName services tested:

So in init PushContext (which I think is where your concerns were) there is a slight increase for a small number of ExternalName services; if we start to have a lot, it's actually better. Here is zero ExternalNames (no diff):

The big difference is in XDS generation, which is far more efficient in this PR (since we generate way, way less XDS). So overall I think this will be a big improvement in performance for most cases.
Great test. I have one doubt why
There is another codepath we hit for the service otherwise that was slow. I forget what function exactly.
Not sure I agree 'we can't define the correct behavior' - if I specify 'externalName example.com' and an https://example.com request is made, the correct behavior is pretty clear:
- host set to example.com
- SNI set to example.com
- verify that the server has an example.com SAN set properly, signed by system certs

There is no ambiguity - almost all https sites work the same.

Note that I'm not suggesting using Service.externalName as a 'frontend' (i.e. capture traffic) - but as a destination in HttpRoute or VirtualService.

Making this work will not break anything - https://example.com will clearly not work if the SNI and host are set to example.mynamespace.svc.cluster.local - and worse would be to send a plain-text request.

This behavior is not defined or owned by K8S - all that K8S defines is that a DNS CNAME will be created for the .cluster.local name - but how example.com behaves for https is outside of K8S semantics.
And the reason I think it's very useful to treat it correctly: this is one of the most common external calls, since so many services are exposed over https. Doing the same with ServiceEntry, DestinationRule, etc. is of course possible - but it takes many steps.
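For context, a minimal sketch of the multi-step ServiceEntry + DestinationRule approach mentioned above. All hostnames and resource names here are illustrative, not taken from this PR:

```yaml
# Illustrative: make example.com routable in the mesh
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: example-com
spec:
  hosts:
  - example.com
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
  - number: 443
    name: tls
    protocol: TLS
---
# Illustrative: originate TLS with the expected SNI when calling example.com
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: example-com-tls
spec:
  host: example.com
  trafficPolicy:
    tls:
      mode: SIMPLE
      sni: example.com
```

This achieves roughly the host/SNI/SAN behavior described above, but requires two extra resources per external host, which is the "many steps" complaint.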
@costinm Per your comment, in a k8s env it seems there is no need to create a k8s Service at all. We can have a discussion via Slack though.
@howardjohn Thanks. IMO, we can do some more code cleanup later.
In response to a cherrypick label: new pull request created: #47613
GitHub won't let me reuse my old PR (#37365), so making a new one...
Fixes #37331
Example config:
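(The original inline example config did not survive the export; below is a plausible reconstruction matching the description that follows, with names and namespace purely illustrative.)

```yaml
# Hypothetical reconstruction: an ExternalName Service "b-ext"
# aliasing an in-cluster service "b" in the same namespace
apiVersion: v1
kind: Service
metadata:
  name: b-ext
  namespace: echo
spec:
  type: ExternalName
  externalName: b.echo.svc.cluster.local
  ports:
  - name: http
    port: 80
```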
Kubernetes behavior: `b-ext` creates a CNAME record for `b`. Port is not relevant.

Istio (today): creates a DNS cluster for each port, which ends up resolving to the Cluster IP. It matches on Host header (HTTP), SNI (TLS), or just port (TCP), meaning it captures all traffic on that port.
Issues:
This PR changes the implementation to more closely align with Kubernetes.
ExternalName becomes a new "Alias" service type internally. An alias simply adds extra matching logic to existing Services. For example, above we would add a match for `b-ext` everywhere we match for `b`. This works for HTTP and TLS; for TCP it is not needed, as it already works today (Envoy will see traffic to `b` and `b-ext` as the same). If there is no Service/ServiceEntry defined at all for the target `externalName`, then we end up just passing through.

Note that the port from ExternalName is not used at all, just like in Kubernetes.
If we have a `destination` in a VirtualService referring to an ExternalName service explicitly, this will now no longer work unless the referenced `externalName` is also defined as a concrete service. For example: if I forward to this in a VS, it will work IFF I have a ServiceEntry defined for google.com. As such, it is generally not recommended to do this (you could just point to google.com directly), but it can be useful when using third-party Ingress configs, etc. Other Ingress implementations have extremely inconsistent support for ExternalName -- I couldn't find two implementations that behaved the same way -- so there isn't much of a standard to follow here.
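A minimal sketch of that scenario (the original inline example was not preserved; resource names are illustrative):

```yaml
# Illustrative: ExternalName Service pointing at an external host
apiVersion: v1
kind: Service
metadata:
  name: google-ext
spec:
  type: ExternalName
  externalName: google.com
---
# Without a concrete service for google.com, routing a VirtualService
# destination at google-ext would just pass through; this ServiceEntry
# defines google.com as a concrete service so the alias resolves
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: google
spec:
  hosts:
  - google.com
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
  - number: 80
    name: http
    protocol: HTTP
```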
One major difference in functionality with this change is that the ExternalName service is now just an alias, not its own "thing" (Envoy cluster). This means we cannot apply different rules for it. However, this can be done with standard DR subsets.
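One possible sketch of the DR-subset approach, assuming the alias points at an in-mesh service `b` in namespace `echo` (all names and the example policy are illustrative):

```yaml
# Illustrative: define a subset of the real service "b" with its own policy
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: b
  namespace: echo
spec:
  host: b.echo.svc.cluster.local
  subsets:
  - name: v1
    labels:
      version: v1
    trafficPolicy:
      connectionPool:
        tcp:
          maxConnections: 10
---
# Illustrative: apply the subset only to traffic addressed to the alias
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: b-ext-rules
  namespace: echo
spec:
  hosts:
  - b-ext.echo.svc.cluster.local
  http:
  - route:
    - destination:
        host: b.echo.svc.cluster.local
        subset: v1
```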
TODO:
- Add tests. Critical ones are Sidecar scoping
- Add feature flag to retain old behavior
- Add release note