
kube-dns should be the only Nameserver for Pod with DnsPolicy=ClusterFirst #15592

Closed
ArtfulCoder opened this issue Oct 14, 2015 · 18 comments
Labels: priority/backlog, sig/network

Comments
@ArtfulCoder (Contributor)

#15303 (comment)

With more than one nameserver on ClusterFirst pods, the problems are:

  1. Faulty NXDOMAIN Problem:
    If skydns were to fail, clients on a pod (with the ClusterFirst DnsPolicy) will start getting NXDOMAIN for *.cluster.local lookups from the second NS, instead of SERVFAIL (which could have signaled clients to retry).
  2. Faulty Resolution Problem:
    If skydns were to fail, clients on a pod (with the ClusterFirst DnsPolicy) might accidentally get the wrong IP; see the resolv.conf sketch below.
    Consider that a k8s user has created a service called "mail".
    Assume that the search paths are "default.svc.cluster.local google.com".
    Ideally, the query should be resolved by skydns as mail.default.svc.cluster.local.
    But since skydns is down, the query gets resolved by the fallback nameserver as mail.google.com.

The Faulty Resolution Problem exists even if skydns is used in no-rec mode.
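
To make the Faulty Resolution Problem concrete, here is a hypothetical pod resolv.conf (the cluster DNS IP 10.0.0.10 and the fallback 8.8.8.8 are assumptions for illustration):

# /etc/resolv.conf inside a ClusterFirst pod that still has a fallback NS
nameserver 10.0.0.10   # kube-dns/skydns
nameserver 8.8.8.8     # fallback inherited from the node
search default.svc.cluster.local google.com

# With skydns down, a lookup of "mail" times out against 10.0.0.10 and
# falls over to 8.8.8.8, which returns NXDOMAIN for
# mail.default.svc.cluster.local but an answer for mail.google.com:
# the wrong IP, served silently.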

  • We need to exercise caution while doing this, because skydns needs to be robust enough.
    Otherwise ClusterFirst containers will be badly impacted, with skydns being their only NS.
    (skydns will be in recurse mode)
  • We should have at least 2 instances of kube-dns running for High Availability.
    (And also because kubelet has a backoff policy if any pod restarts too many times in a certain time window.)
  • SkyDNS will use the other nameservers for queries that don't end in ".cluster.local" (the domain param of skydns); see the sketch below.
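
A minimal sketch of how that split is configured (flag names from skydns of this era; the etcd address and upstream IPs are assumptions for illustration):

# skydns serves -domain authoritatively and forwards everything else upstream
/skydns -machines=http://127.0.0.1:4001 \
        -addr=0.0.0.0:53 \
        -domain=cluster.local. \
        -nameservers=10.240.0.1:53,8.8.8.8:53

# Passing -no-rec instead disables forwarding ("no-rec mode"), so queries
# for names outside cluster.local fail rather than being recursed.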
@ArtfulCoder ArtfulCoder added priority/backlog Higher priority than priority/awaiting-more-evidence. sig/network Categorizes an issue or PR as relevant to SIG Network. labels Oct 14, 2015
@ArtfulCoder ArtfulCoder self-assigned this Oct 14, 2015
@ArtfulCoder ArtfulCoder added this to the v1.1 milestone Oct 14, 2015
@ArtfulCoder (Contributor, Author)

@thockin

@ArtfulCoder (Contributor, Author)

@smarterclayton @sosiouxme Would this impact the RH install, since RH uses skydns in no-rec mode?
Do read my first comment in this issue.
I believe RH should move to using skydns in rec mode and have ClusterFirst pods use skydns only (without a fallback NS on the pod).

@thockin (Member)

thockin commented Oct 15, 2015

I think using SkyDNS in recursive mode is the better answer. I think it has the best overall behavior. If the cluster DNS is down, the cluster is in trouble; falling back to a DNS server that only has partial answers (no in-cluster names) is wrong.

Additionally, asking for a name like google.com should be fast. The general flow is:

client queries skydns for google.com
skydns knows that is not the local domain
skydns delegates to next nameserver
google.com is resolved

I proved it with tcpdump:

# Client (10.244.0.10) queries `foo.bar.quux` on skydns (10.244.0.11)
04:05:03.269959 IP 10.244.0.10.47162 > 10.244.0.11.53: 6+ A? foo.bar.quux. (30)

# Skydns delegates to (intentionally wrong) upstream #1
04:05:03.270139 IP 10.244.0.11.43460 > 168.254.169.254.53: 6+ A? foo.bar.quux. (30)

# Upstream times out, skydns delegates to upstream #2
04:05:05.270373 IP 10.244.0.11.37639 > 10.240.0.1.53: 6+ A? foo.bar.quux. (30)

# Upstream #2 says no
04:05:05.271234 IP 10.240.0.1.53 > 10.244.0.11.37639: 6 NXDomain 0/1/0 (105)

# Skydns says no
04:05:05.271423 IP 10.244.0.11.53 > 10.244.0.10.47162: 6 NXDomain 0/1/0 (105)

# Client uses search path
04:05:05.271501 IP 10.244.0.10.53556 > 10.244.0.11.53: 7+ A? foo.bar.quux.default.svc.cluster.local. (56)

# Skydns recognizes the suffix, says no.
04:05:05.272201 IP 10.244.0.11.53 > 10.244.0.10.53556: 7 NXDomain* 0/1/0 (117)

# Repeat for each search
04:05:05.272249 IP 10.244.0.10.56489 > 10.244.0.11.53: 8+ A? foo.bar.quux.svc.cluster.local. (48)
04:05:05.272689 IP 10.244.0.11.53 > 10.244.0.10.56489: 8 NXDomain* 0/1/0 (109)
04:05:05.272780 IP 10.244.0.10.48635 > 10.244.0.11.53: 9+ A? foo.bar.quux.cluster.local. (44)
04:05:05.273259 IP 10.244.0.11.53 > 10.244.0.10.48635: 9 NXDomain* 0/1/0 (105)

@karlkfi

@ArtfulCoder (Contributor, Author)

The behavior you saw tried the FQDN first, followed by search-path-suffixed tries.
This happens with busybox libc.
On Ubuntu, the FQDN is tried last.
That implies that on some pods, a google.com query could take a while to resolve, after it has gone through the search paths (a high ndots value aggravates this issue); see the sketch below.
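
A sketch of the glibc ordering (the resolv.conf contents are assumptions for illustration):

# resolv.conf: options ndots:5
# search default.svc.cluster.local svc.cluster.local cluster.local
#
# "google.com" has fewer dots than ndots, so glibc tries the search list first:
#   google.com.default.svc.cluster.local.  -> NXDOMAIN
#   google.com.svc.cluster.local.          -> NXDOMAIN
#   google.com.cluster.local.              -> NXDOMAIN
#   google.com.                            -> resolved, on the fourth query
# busybox libc tries the absolute name first, so it resolves on the first query.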

@ae6rt (Contributor)

ae6rt commented Oct 15, 2015

If SkyDNS is the only nameserver in /etc/resolv.conf for ClusterFirst pods, and I want SkyDNS to recurse, where will the reference to the other nameserver come from? Sorry if I missed this somewhere.

@ArtfulCoder (Contributor, Author)

SkyDNS gets the references to the other nameservers from the host's /etc/resolv.conf.
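
Concretely (a hypothetical node file; the IP is an assumption for illustration):

# /etc/resolv.conf on the node
nameserver 10.240.0.1

# skydns reads this file and uses 10.240.0.1:53 as its upstream for any
# query outside its -domain (cluster.local).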

@karlkfi (Contributor)

karlkfi commented Oct 15, 2015

I agree recursion is the best option, but falling back is not "wrong". Ideally, for the Mesos cases, kube-DNS is always available and handles recursion to Mesos-DNS and the host cluster DNS, but don't forget the bootstrapping and failure conditions. You want to be able to deploy all addons simultaneously, not wait for kube-DNS to be ready before deploying the rest. Other pods also still need to be able to fail over to access the rest of the Internet when kube-DNS is either down or not up yet. So the other DNS servers have to be in /etc/resolv.conf anyway.

@bgrant0607 (Member)

If this is for 1.1, it needs to be P0 at this point. Should it be?

@ArtfulCoder (Contributor, Author)

I am still trying to convince myself that this is a safe change...

@ArtfulCoder ArtfulCoder modified the milestones: v1.2, v1.1 Oct 15, 2015
@ArtfulCoder ArtfulCoder added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Oct 15, 2015
@ArtfulCoder (Contributor, Author)

We are going to punt this to v1.2.
It will be safer to make this change when we move to the more performant iptables-based proxy.
Till then, let us have clients keep the fallback nameservers, the way it has been today.
#15752

@smarterclayton (Contributor)

Some of these changes are also moot if we serve DNS from the node-proxy as well.

@ArtfulCoder (Contributor, Author)

Closed via #18089.

@liggitt (Member)

liggitt commented Jan 25, 2016

@smarterclayton we're picking this up in a rebase now... I'm not super-happy about letting skydns be an open resolver... were you ok with this change?

@ae6rt (Contributor)

ae6rt commented Jan 25, 2016

I've lost track of what the conclusion has been here, but I rely on skydns calling on other nameservers in resolv.conf to resolve a name skydns cannot natively resolve. In other words, skydns acts like a recursive nameserver, providing answers for which it is authoritative when it can, and otherwise querying the other nameservers in resolv.conf when it cannot. Is that changing?

@liggitt (Member)

liggitt commented Jan 25, 2016

I rely on skydns calling on other nameservers in resolv.conf

Which resolv.conf? If I understand #18089 correctly, the resolv.conf on the nodes is now ignored, and it assumes skydns is running in open resolver mode.

@ae6rt (Contributor)

ae6rt commented Jan 25, 2016

The "which resolv.conf" is the resolv.conf in the container, which up to now-ish been including not only the skydns IP but also nameservers in resolv.conf from the node.

I have apps I'm deploying into Kubernetes that conventionally do lookups on dependencies with names like a.b.example.com. skydns does not know how to solve those names, but the node nameservers do. My experience is that I cannot create a Kubernetes Service to intercept these with names that contain dots.
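
For reference, this is why a Service can't intercept dotted names: a Service name must be a valid DNS label, so a manifest like the following sketch is rejected by validation (the name is hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: a.b.example.com   # rejected: dots are not allowed in a DNS label
spec:
  ports:
  - port: 80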

Also, by "open resolver" you mean if skydns fails to resolve a name internally, it will start traversing the root nameservers for an answer?

@liggitt (Member)

liggitt commented Jan 25, 2016

The "which resolv.conf" is the resolv.conf in the container, which up to now-ish been including not only the skydns IP but also nameservers in resolv.conf from the node.

I have apps I'm deploying into Kubernetes that conventionally do lookups on dependencies with names like a.b.example.com. skydns does not know how to solve those names, but the node nameservers do.

Right, I think #18089 removed the node's nameservers from the container's resolv.conf. Opened #20090 to fix the issue.

Also, by "open resolver" you mean if skydns fails to resolve a name internally, it will start traversing the root nameservers for an answer?

Yes, it attempts to answer for all domains by contacting other nameservers for domains it doesn't know about.

@sosiouxme
An open resolver is any DNS server on the internet that returns answers for domains for which it isn't authoritative. If SkyDNS is recursively asking upstream nameservers for answers to queries about domains outside .cluster.local and passing them on, and doing so within reach of the internet, it's an open resolver.
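
A quick check (a sketch; 10.0.0.10 stands in for the cluster DNS IP, and the query must originate outside the cluster for "open" to apply):

# An answer here, from a host that shouldn't be served, means open resolver
dig @10.0.0.10 example.com A +short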

