-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
plugin/kubernetes: fix tombstone unwrapping #3924
Conversation
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Codecov Report
@@ Coverage Diff @@
## master #3924 +/- ##
==========================================
+ Coverage 56.67% 56.71% +0.03%
==========================================
Files 224 224
Lines 11374 11338 -36
==========================================
- Hits 6446 6430 -16
+ Misses 4432 4418 -14
+ Partials 496 490 -6
Continue to review full report at Codecov.
|
@@ -125,7 +125,7 @@ func newdnsController(ctx context.Context, kubeClient kubernetes.Interface, opts | |||
&api.Pod{}, | |||
cache.ResourceEventHandlerFuncs{AddFunc: dns.Add, UpdateFunc: dns.Update, DeleteFunc: dns.Delete}, | |||
cache.Indexers{podIPIndex: podIPIndexFunc}, | |||
object.DefaultProcessor(object.ToPod(opts.skipAPIObjectsCleanup)), | |||
object.DefaultProcessor(object.ToPod(opts.skipAPIObjectsCleanup), nil), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this boolean arg is is only used for a single test? that seems excessive for adding it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The boolean arg was already there. IIUC, it was added when the record latency metrics were added, to allow the test to work.
The new parameter is a function that calculates the record latency metrics for that object type. Currently it's only implemented for the Endpoints object. So, for Services and Pods, we pass nil.
default: | ||
log.Warningf("Updates for %T not supported.", ob) | ||
} | ||
} | ||
|
||
func (dns *dnsControl) getServices(endpoints *object.Endpoints) []*object.Service { | ||
func (dns *dnsControl) getServices(endpoints *api.Endpoints) []*object.Service { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this now an api.Endpoint instead of object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetServices
can construct the index key from either type. Changing it to api.Endpoint
allows us to pass a single argument to recordDNSProgrammingLatency()
instead of two (both api.Endpoint
and object.Endpoint
for the same record).
// ToEndpoints converts an api.Endpoints to a *Endpoints. | ||
func ToEndpoints(end *api.Endpoints) *Endpoints { | ||
// ToEndpoints returns a function that converts an *api.Endpoints to a *Endpoints. | ||
func ToEndpoints(skipCleanup bool) ToFunc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the ToFunc requirement come from? Just testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required to allow the Endpoints watch to use the default processor, instead of the current in-line processor. (it is similar in structure to the Pod and Service objects, which also use the default processor)
plugin/kubernetes/object/informer.go
Outdated
@@ -20,8 +20,10 @@ func NewIndexerInformer(lw cache.ListerWatcher, objType runtime.Object, h cache. | |||
return clientState, cache.New(cfg) | |||
} | |||
|
|||
// DefaultProcessor is a copy of Process function from cache.NewIndexerInformer except it does a conversion. | |||
func DefaultProcessor(convert ToFunc) ProcessorBuilder { | |||
type recordLatencyFunc func(interface{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty interface? why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, I think I could tighten it down to meta.Object
, which is implemented by all k8s api objects (e.g. Service, Pod, Endpoints, )...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could tighten it down to meta.Object
Done in 8aabd9c
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
@johnbelamaric PTAL |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved by johnbelamaric
Hi guys, i am wonder if this fix will be in release 1.7.0, we are having same problem, we have also coredns separated on masters node outside of k8s. |
Yes. It will. |
Super :), thank you
…On Mon, 15 Jun 2020 at 20:55, chrisohaver ***@***.***> wrote:
Hi guys, i am wonder if this fix will be in release 1.7.0, we are having
same problem, we have also coredns separated on masters node outside of k8s.
Yes. It will.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#3924 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAWBDGE6IXPVZONZGSLCTLRWZVCLANCNFSM4NR4HTXQ>
.
|
* fix tombstone unwrapping Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
coredns < 1.7.0 has a bug that makes the services resolution to become out-of-sync with the last state from Kubernetes in case coredns suffers from a disconnection with kube-apiserver [1]. This bug is fixed on all versions equal and above 1.7.0. [2] In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that have the result in the following similar logs: ``` service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225) ``` [1] coredns/coredns#3587 [2] coredns/coredns#3924 Signed-off-by: André Martins <andre@cilium.io>
coredns < 1.7.0 has a bug that makes the services resolution to become out-of-sync with the last state from Kubernetes in case coredns suffers from a disconnection with kube-apiserver [1]. This bug is fixed on all versions equal and above 1.7.0. [2] In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that have the result in the following similar logs: ``` service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225) ``` [1] coredns/coredns#3587 [2] coredns/coredns#3924 Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit f6f2406 ] coredns < 1.7.0 has a bug that makes the services resolution to become out-of-sync with the last state from Kubernetes in case coredns suffers from a disconnection with kube-apiserver [1]. This bug is fixed on all versions equal and above 1.7.0. [2] In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that have the result in the following similar logs: ``` service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225) ``` [1] coredns/coredns#3587 [2] coredns/coredns#3924 Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f6f2406 ] coredns < 1.7.0 has a bug that makes the services resolution to become out-of-sync with the last state from Kubernetes in case coredns suffers from a disconnection with kube-apiserver [1]. This bug is fixed on all versions equal and above 1.7.0. [2] In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that have the result in the following similar logs: ``` service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225) ``` [1] coredns/coredns#3587 [2] coredns/coredns#3924 Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f6f2406 ] coredns < 1.7.0 has a bug that makes the services resolution to become out-of-sync with the last state from Kubernetes in case coredns suffers from a disconnection with kube-apiserver [1]. This bug is fixed on all versions equal and above 1.7.0. [2] In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that have the result in the following similar logs: ``` service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225) ``` [1] coredns/coredns#3587 [2] coredns/coredns#3924 Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f6f2406 ] coredns < 1.7.0 has a bug that makes the services resolution to become out-of-sync with the last state from Kubernetes in case coredns suffers from a disconnection with kube-apiserver [1]. This bug is fixed on all versions equal and above 1.7.0. [2] In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that have the result in the following similar logs: ``` service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225) ``` [1] coredns/coredns#3587 [2] coredns/coredns#3924 Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f6f2406 ] coredns < 1.7.0 has a bug that makes the services resolution to become out-of-sync with the last state from Kubernetes in case coredns suffers from a disconnection with kube-apiserver [1]. This bug is fixed on all versions equal and above 1.7.0. [2] In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that have the result in the following similar logs: ``` service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225) ``` [1] coredns/coredns#3587 [2] coredns/coredns#3924 Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f6f2406 ] coredns < 1.7.0 has a bug that makes the services resolution to become out-of-sync with the last state from Kubernetes in case coredns suffers from a disconnection with kube-apiserver [1]. This bug is fixed on all versions equal and above 1.7.0. [2] In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that have the result in the following similar logs: ``` service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225) ``` [1] coredns/coredns#3587 [2] coredns/coredns#3924 Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f6f2406 ] coredns < 1.7.0 has a bug that makes the services resolution to become out-of-sync with the last state from Kubernetes in case coredns suffers from a disconnection with kube-apiserver [1]. This bug is fixed on all versions equal and above 1.7.0. [2] In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that have the result in the following similar logs: ``` service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225) ``` [1] coredns/coredns#3587 [2] coredns/coredns#3924 Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
1. Why is this pull request needed and what does it do?
This corrects the handling of tombstones - specifically that the objects imbedded in the tombstone deltas are actually
coredns/object
types notk8s/api
(as in normal delete events). The delete case in the processor now makes no type assertion on the object in the tombstone, and simply passes the tombstone itself toindex.Delete()
.Unit test: coverage added for the DefaultProcessor for adds, updates, "normal" deletes, and "tombstone" deletes.
Refactoring: Consolidated the in-line Endpoints Process function and the DefaultProcessor so all watches now use the DefaultProcessor.
2. Which issues (if any) are related?
#3879
#3860
PRs #3887 and #3890
3. Which documentation changes (if any) need to be made?
none
4. Does this introduce a backward incompatible change or deprecation?
no