-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
federation: Updating KubeDNS to try finding a local service first for federation query #27708
federation: Updating KubeDNS to try finding a local service first for federation query #27708
Conversation
path := strings.Split(trimmed, ".") | ||
isFederationQuery := false | ||
federationPath := []string{} | ||
if !exact && kd.isFederationQuery(path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this exact parameter.
Looking at the existing code, it seems that we should be checking !exact here.
Can remove it if its not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikhiljindal See #25727 (comment) for context.
bb0b8f6
to
8abccd3
Compare
Thanks @nikhiljindal - I'm going to review this now. |
@@ -420,8 +420,20 @@ func (kd *KubeDNS) newHeadlessService(service *kapi.Service) error { | |||
func (kd *KubeDNS) Records(name string, exact bool) (retval []skymsg.Service, err error) { | |||
glog.Infof("Received DNS Request:%s, exact:%v", name, exact) | |||
trimmed := strings.TrimRight(name, ".") | |||
segments := strings.Split(trimmed, ".") | |||
path := reverseArray(segments) | |||
path := strings.Split(trimmed, ".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this segments for terminology consistency. path is the reverse segments.
LGTM other than a minor nit. Feel free to apply the label after fixing it. |
federationPath = append(federationPath, path...) | ||
// To try local service, remove federation name from path. | ||
// Federation name is 3rd in the path (after service name and namespace). | ||
path = append(path[:2], path[3:]...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikhiljindal @madhusudancs I'm probably overlooking something, but its not clear to me where we ensure that we avoid returning the cluster IP of a local service that has no healthy endpoints backing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works when there is no endpoint for the local service (I have a test for it).
But I just tried deleting endpoints from an existing service and it still returned me the endpoint (which I deleted). Am debugging that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that was just an error in the way I was testing.
What I said above was only true for headless services. Updated the code to handle normal services as well (which is what federation will create)
d98b9f6
to
5ed5ccc
Compare
Updated the code to check for endpoints of a service before returning its cluster IP for a federation query. Added tests as well. PTAL |
This is a pretty substantial change that will need to be carefully reviewed and tested. If we break KubeDNS, the cluster is broken, so that's really bad. Do we understand why it appeared to work, at least in a limited fashion, before these changes? To be clear, I've not reviewed to code extensively yet. But I'm very nervous about making changes to such a key component of every Kubernetes cluster so late before a major release, unless we are super-confident about the proposed changes. Are we super-confident @nikhiljindal @madhusudancs ? @ArtfulCoder could you also please take a look? More eyes are better. |
PS: I would ideally like us to bang hard on this for a few days in a soak test cluster before declaring it good. I can assist in making that happen. |
Totally agree @quinton-hoole I have restricted the changes to be only when the query is for federation service. And have added extensive tests. The existing code has good tests as well. Will wait until someone familiar with this code has had a chance to take a look. |
cc @girishkalele who I am told is the new KubeDNS owner :) |
Added a check to validate that federation names dont have dots '.' as per discussion with @ArtfulCoder and @madhusudancs |
TODOs left in the PR:
(I spoke with Nikhil and he knows what the above mean code-wise).. |
Automatic merge from submit-queue federation e2e: Adding retries to fetching secret in controller manager Ref: #27708 Trying to fix the following failure in fed controller manager: ``` error in fetching secret: Get https://10.0.0.1:443/api/v1/namespaces/federation/secrets/federation-apiserver-secret: dial tcp 10.0.0.1:443: i/o timeout ``` I am not sure why that error is happening in the first place. kube-proxy should have been configured before fed controller manager pod comes up. I didnt find anything wrong in kube-proxy logs. The request never reaches fed apiserver. Lets see if adding retries helps. cc @kubernetes/sig-cluster-federation @mml @colhom
@nikhiljindal Still needs rebase, and the verification error is below. Hopefully the rebase will sort that out too.
|
@@ -558,6 +637,9 @@ func getSkyMsg(ip string, port int) (*skymsg.Service, string) { | |||
// 5. Fourth segment is exactly "svc" | |||
// 6. The remaining segments match kd.domainPath. | |||
// 7. And federation must be one of the listed federations in the config. | |||
// Note: Because of the above conditions, this method will treat wildcard queries such as | |||
// *.mysvc.myns.myfederation.svc.domain.path as non-federation queries. | |||
// We can add support for wildcard queries later, if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would like us to support wildcard queries for federated services, as this will be an easy way for clients to discover all shards of a service globally, with a single DNS query. Pretty cool. Can wait until post-1.3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@quinton-hoole How are the clients expected to format the wildcard-names? Specifically, where do you expect them to put the *
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what do you expect the KubeDNS response to be?
Tests on this PR ran with the updated KubeDNS image (with my code changes). Great to see that they all passed.
I discussed this with @quinton-hoole and we discussed a few potential solutions, but decided not to fix it in 1.3. Have filed #27969 to keep track. Also, @quinton-hoole suggested returning a CNAME record for the local service (if it exists) instead of the A records directly in response to a federation DNS query. Making that change now. |
assert.Equal(t, a, records[0].Host) | ||
} | ||
|
||
// Verifies that quering KubeDNS for a headless federation service returns the DNS hostname when a local service does not exist and returns the endpoint IP when a local service exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, for both headless and non-headless, we should return a CNAME (to the local service's DNS name if the service exists and has endoints, or to the zone-local DNS name if the local service has no endpoints).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@nikhiljindal Could you update the manifest under |
75d3a5c
to
778c49c
Compare
Sure @yifan-gu done. Update code as per comments. Added a commit to return the DNS hostname of local service rather than returning its endpoints directly. PTAL. |
isFederationQuery := false | ||
federationSegments := []string{} | ||
if !exact && kd.isFederationQuery(segments) { | ||
glog.Infof("federation service query: Received federation query. Going to try to find local service first") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to a lower log level ? Seems chatty with no specific information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the information this gives is that this query was interpreted as a federation query.
I would like to keep it for debugging.
Note that we wont see this in the logs unless someone has federation enabled and is sending federation queries.
I see a new GKE e2e here which is failing with some auth problems unrelated to this PR. I dont see any issue filed. |
Both gke e2e seem to be known flakes.
or kubernetes-e2e-gke/9788 ?
|
778c49c
to
1e87b9b
Compare
As far as I can tell, there is no non-federation functionality affected.
|
1e87b9b
to
7be4293
Compare
Squashed into 2 commits (dns changes and YAML file changes). |
LGTM - lets get it in. |
thx @girishkalele |
GCE e2e build/test passed for commit 7be4293. |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
@nikhiljindal I suggest that you follow @girishkalele 's suggestions re logging verbosity. Rather log as glog.V(4) and set --v=4 on the binary while you're debugging? Will that work for you? |
@nikhiljindal @fejta PS that new GKE smoke test is failing on #27972 also. |
GCE e2e build/test passed for commit 7be4293. |
Automatic merge from submit-queue |
Pushed and verified that 1.5 image now exists on all platforms: |
Ref #26762
Updating KubeDNS to try to find a local service first for federation query.
Without this change, KubeDNS always returns the DNS hostname, even if a local service exists.
Have updated the code to first remove federation name from path if it exists, so that the default search for local service happens. If we dont find a local service, then we try to find the DNS hostname.
Will appreciate a strong review since this is my first change to KubeDNS.
#25727 was the original PR that added federation support to KubeDNS.
cc @kubernetes/sig-cluster-federation @quinton-hoole @madhusudancs @bprashanth @mml