plugin/kubernetes: fix tombstone unwrapping #3924

chrisohaver · 2020-06-03T17:45:55Z

1. Why is this pull request needed and what does it do?

This corrects the handling of tombstones. Specifically, the objects embedded in tombstone deltas are actually coredns/object types, not k8s/api types (as they are in normal delete events). The delete case in the processor now makes no type assertion on the object in the tombstone, and simply passes the tombstone itself to index.Delete().
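As a rough illustration of the delete path described above, here is a self-contained sketch that uses stand-in types rather than the real client-go and CoreDNS packages. `tombstone` imitates the shape of client-go's `cache.DeletedFinalStateUnknown`, and `keyOf` imitates a deletion-aware key function. The names `store`, `pod`, and `processDelete` are hypothetical and are not the code in this PR.

```go
package main

import "fmt"

// tombstone is a stand-in for client-go's cache.DeletedFinalStateUnknown:
// it carries the key of an object whose delete event was missed, plus the
// last known copy of that object.
type tombstone struct {
	Key string
	Obj interface{}
}

// pod is a hypothetical, already-converted object (analogous to a
// coredns/object type), not a Kubernetes API type.
type pod struct {
	Namespace, Name string
}

// keyOf derives the index key. For a tombstone it uses the recorded key and
// never inspects the embedded object, so the embedded type does not matter.
func keyOf(obj interface{}) (string, error) {
	switch o := obj.(type) {
	case tombstone:
		return o.Key, nil
	case *pod:
		return o.Namespace + "/" + o.Name, nil
	}
	return "", fmt.Errorf("unsupported type %T", obj)
}

// store is a minimal keyed index.
type store map[string]interface{}

func (s store) Add(obj interface{}) error {
	k, err := keyOf(obj)
	if err != nil {
		return err
	}
	s[k] = obj
	return nil
}

func (s store) Delete(obj interface{}) error {
	k, err := keyOf(obj)
	if err != nil {
		return err
	}
	delete(s, k)
	return nil
}

// processDelete handles both delete shapes. A tombstone is passed through
// untouched, with no type assertion on its contents; a normal delete arrives
// as a raw object and is converted first.
func processDelete(s store, obj interface{}, convert func(interface{}) interface{}) error {
	if ts, ok := obj.(tombstone); ok {
		return s.Delete(ts)
	}
	return s.Delete(convert(obj))
}

func main() {
	s := store{}
	s.Add(&pod{Namespace: "default", Name: "a"})
	s.Add(&pod{Namespace: "default", Name: "b"})

	identity := func(o interface{}) interface{} { return o }
	// A normal delete event.
	processDelete(s, &pod{Namespace: "default", Name: "a"}, identity)
	// A tombstone delete event.
	processDelete(s, tombstone{Key: "default/b", Obj: &pod{Namespace: "default", Name: "b"}}, identity)

	fmt.Println(len(s)) // prints 0
}
```

Because the key comes from the tombstone itself, the delete succeeds regardless of what type the tombstone wraps.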

Unit test: coverage added for the DefaultProcessor for adds, updates, "normal" deletes, and "tombstone" deletes.

Refactoring: Consolidated the in-line Endpoints Process function and the DefaultProcessor so all watches now use the DefaultProcessor.

2. Which issues (if any) are related?

#3879
#3860
PRs #3887 and #3890

3. Which documentation changes (if any) need to be made?

none

4. Does this introduce a backward incompatible change or deprecation?

no

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

codecov-commenter · 2020-06-03T17:55:35Z

Codecov Report

Merging #3924 into master will increase coverage by 0.03%.
The diff coverage is 61.53%.

@@            Coverage Diff             @@
##           master    #3924      +/-   ##
==========================================
+ Coverage   56.67%   56.71%   +0.03%     
==========================================
  Files         224      224              
  Lines       11374    11338      -36     
==========================================
- Hits         6446     6430      -16     
+ Misses       4432     4418      -14     
+ Partials      496      490       -6

| Impacted Files | Coverage Δ | |
|---|---|---|
| plugin/kubernetes/controller.go | `45.86% <61.53%> (+0.68%)` | ⬆️ |
| plugin/forward/proxy.go | `86.66% <0.00%> (-3.34%)` | ⬇️ |
| plugin/azure/setup.go | `62.35% <0.00%> (-0.44%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2e3ef77...8aabd9c.

miekg · 2020-06-04T13:37:56Z

plugin/kubernetes/controller.go

@@ -125,7 +125,7 @@ func newdnsController(ctx context.Context, kubeClient kubernetes.Interface, opts
 			&api.Pod{},
 			cache.ResourceEventHandlerFuncs{AddFunc: dns.Add, UpdateFunc: dns.Update, DeleteFunc: dns.Delete},
 			cache.Indexers{podIPIndex: podIPIndexFunc},
-			object.DefaultProcessor(object.ToPod(opts.skipAPIObjectsCleanup)),
+			object.DefaultProcessor(object.ToPod(opts.skipAPIObjectsCleanup), nil),


This boolean arg is only used for a single test? That seems excessive for adding it.

The boolean arg was already there. IIUC, it was added when the record latency metrics were added, to allow the test to work.

The new parameter is a function that calculates the record latency metrics for that object type. Currently it's only implemented for the Endpoints object. So, for Services and Pods, we pass nil.
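The optional-callback pattern described in this reply could look roughly like the sketch below. It is only an illustration under assumptions: `convertFunc`, `newProcessor`, and the choice to hand the unconverted object to the recorder are hypothetical, and only the idea of passing nil for object types without latency metrics comes from the discussion.

```go
package main

import "fmt"

// convertFunc turns a raw API object into a slimmer internal one.
type convertFunc func(interface{}) interface{}

// recordLatencyFunc records a latency metric for one object.
// A nil value means the object type has no latency metric.
type recordLatencyFunc func(interface{})

// newProcessor returns a function that converts an object and, if a latency
// recorder was supplied, invokes it with the original object (an assumption
// made for this sketch).
func newProcessor(convert convertFunc, recordLatency recordLatencyFunc) func(interface{}) interface{} {
	return func(obj interface{}) interface{} {
		out := convert(obj)
		if recordLatency != nil {
			recordLatency(obj)
		}
		return out
	}
}

func main() {
	convert := func(o interface{}) interface{} { return fmt.Sprintf("converted(%v)", o) }

	recorded := 0
	endpointsProc := newProcessor(convert, func(interface{}) { recorded++ })
	podProc := newProcessor(convert, nil) // no latency metric for this type

	fmt.Println(endpointsProc("ep"))
	fmt.Println(podProc("pod"))
	fmt.Println("latency callbacks:", recorded) // prints "latency callbacks: 1"
}
```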

miekg · 2020-06-04T13:38:42Z

plugin/kubernetes/controller.go

 	default:
 		log.Warningf("Updates for %T not supported.", ob)
 	}
 }

-func (dns *dnsControl) getServices(endpoints *object.Endpoints) []*object.Service {
+func (dns *dnsControl) getServices(endpoints *api.Endpoints) []*object.Service {


Why is this now an api.Endpoints instead of an object.Endpoints?

getServices can construct the index key from either type. Changing it to api.Endpoints allows us to pass a single argument to recordDNSProgrammingLatency() instead of two (both the api.Endpoints and the object.Endpoints for the same record).
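To illustrate why either type works for the lookup, here is a small sketch assuming both types expose namespace and name accessors. The types `apiEndpoints` and `slimEndpoints`, the `named` interface, and `serviceKey` are hypothetical stand-ins, not the CoreDNS or Kubernetes definitions.

```go
package main

import "fmt"

// named is the minimal accessor set needed to build an index key.
type named interface {
	GetNamespace() string
	GetName() string
}

// apiEndpoints stands in for the full Kubernetes API object.
type apiEndpoints struct {
	Namespace, Name string
	Annotations     map[string]string
}

func (e *apiEndpoints) GetNamespace() string { return e.Namespace }
func (e *apiEndpoints) GetName() string      { return e.Name }

// slimEndpoints stands in for the trimmed internal copy.
type slimEndpoints struct {
	Namespace, Name string
}

func (e *slimEndpoints) GetNamespace() string { return e.Namespace }
func (e *slimEndpoints) GetName() string      { return e.Name }

// serviceKey builds a "namespace/name" key from anything that is named.
func serviceKey(o named) string {
	return o.GetNamespace() + "/" + o.GetName()
}

func main() {
	a := &apiEndpoints{Namespace: "default", Name: "web"}
	s := &slimEndpoints{Namespace: "default", Name: "web"}
	fmt.Println(serviceKey(a) == serviceKey(s)) // prints true
}
```

Since the key depends only on namespace and name, accepting the API type costs nothing for the lookup and leaves the full object available for the latency calculation.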

miekg · 2020-06-04T13:39:18Z

plugin/kubernetes/object/endpoint.go

-// ToEndpoints converts an api.Endpoints to a *Endpoints.
-func ToEndpoints(end *api.Endpoints) *Endpoints {
+// ToEndpoints returns a function that converts an *api.Endpoints to a *Endpoints.
+func ToEndpoints(skipCleanup bool) ToFunc {


Where does the ToFunc requirement come from? Just testing?

This is required to allow the Endpoints watch to use the default processor instead of the current in-line processor. It mirrors the structure of the Pod and Service objects, which also use the default processor.
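A minimal sketch of the "function that returns a conversion function" shape follows. The `toFunc` signature, the stand-in types, and the cleanup behaviour (clearing bulky fields on the source object unless `skipCleanup` is set) are assumptions for illustration and may differ from the actual CoreDNS code.

```go
package main

import "fmt"

// rawEndpoints stands in for the API object; Extra represents bulky fields
// that are not needed once the slim copy exists.
type rawEndpoints struct {
	Name  string
	Extra []string
}

// slimEndpoints stands in for the trimmed internal copy.
type slimEndpoints struct {
	Name string
}

// toFunc is the shape of a conversion function handed to a processor
// (hypothetical signature).
type toFunc func(interface{}) (interface{}, error)

// toEndpoints returns a converter. Unless skipCleanup is set, the converter
// clears the source object's bulky fields after copying what it needs.
func toEndpoints(skipCleanup bool) toFunc {
	return func(obj interface{}) (interface{}, error) {
		raw, ok := obj.(*rawEndpoints)
		if !ok {
			return nil, fmt.Errorf("unexpected object %T", obj)
		}
		out := &slimEndpoints{Name: raw.Name}
		if !skipCleanup {
			raw.Extra = nil
		}
		return out, nil
	}
}

func main() {
	raw := &rawEndpoints{Name: "web", Extra: []string{"large", "payload"}}
	out, err := toEndpoints(false)(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.(*slimEndpoints).Name, len(raw.Extra)) // prints "web 0"
}
```

Capturing the flag in a closure lets every watch hand the same kind of value to a shared processor, while tests can still opt out of cleanup.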

miekg · 2020-06-04T13:39:55Z

plugin/kubernetes/object/informer.go

@@ -20,8 +20,10 @@ func NewIndexerInformer(lw cache.ListerWatcher, objType runtime.Object, h cache.
 	return clientState, cache.New(cfg)
 }

-// DefaultProcessor is a copy of Process function from cache.NewIndexerInformer except it does a conversion.
-func DefaultProcessor(convert ToFunc) ProcessorBuilder {
+type recordLatencyFunc func(interface{})


empty interface? why?

Ah, yes, I think I could tighten it down to meta.Object, which is implemented by all k8s api objects (e.g. Service, Pod, Endpoints)...

> I could tighten it down to meta.Object

Done in 8aabd9c

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

chrisohaver · 2020-06-12T19:19:33Z

@johnbelamaric PTAL

johnbelamaric · 2020-06-12T20:24:25Z

/lgtm

corbot

Approved by johnbelamaric

luks · 2020-06-15T18:53:40Z

Hi guys, I am wondering if this fix will be in release 1.7.0. We are having the same problem; we also have CoreDNS running separately on master nodes, outside of k8s.

chrisohaver · 2020-06-15T18:55:18Z

> Hi guys, I am wondering if this fix will be in release 1.7.0. We are having the same problem; we also have CoreDNS running separately on master nodes, outside of k8s.

Yes. It will.

luks · 2020-06-15T20:46:35Z

Super :), thank you

coredns < 1.7.0 has a bug that causes service resolution to become out-of-sync with the latest state from Kubernetes if coredns is disconnected from kube-apiserver [1]. This bug is fixed in all versions equal to or above 1.7.0 [2].

In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that produce logs similar to the following:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>

[ upstream commit f6f2406 ]

coredns < 1.7.0 has a bug that causes service resolution to become out-of-sync with the latest state from Kubernetes if coredns is disconnected from kube-apiserver [1]. This bug is fixed in all versions equal to or above 1.7.0 [2].

In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that produce logs similar to the following:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>

chrisohaver added 5 commits June 2, 2020 11:20

fix tombstone unwrapping

a935a11

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

clean up

1ea63ea

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

clean upper

ae82d4f

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

use defaultprocessor for endpoints; add test for defaultprocessor

39886b8

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

shuffling

a8f8aa3

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

chrisohaver requested review from bradbeam, johnbelamaric, miekg, rajansandeep and yongtang as code owners June 3, 2020 17:45

miekg reviewed Jun 4, 2020

use meta.Object instead of interface{}

8aabd9c

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

chrisohaver mentioned this pull request Jun 12, 2020

Should we do a 1.6.10 release? #3925

Closed

corbot bot approved these changes Jun 12, 2020

chrisohaver merged commit d902e85 into coredns:master Jun 15, 2020

chrisohaver mentioned this pull request Jul 24, 2020

invalid memory address or nil pointer dereference panic followed by timeouts in k8s API access #4022

Closed

nyodas pushed a commit to DataDog/coredns that referenced this pull request Oct 26, 2020

plugin/kubernetes: fix tombstone unwrapping (coredns#3924)

d51df51

* fix tombstone unwrapping Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

chrisohaver deleted the fix-tombstones branch January 9, 2021 14:42

aanm mentioned this pull request Sep 28, 2021

test: bump coredns version to 1.7.0 cilium/cilium#17489

Merged
