Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin/kubernetes: fix tombstone unwrapping #3924

Merged
merged 6 commits into from
Jun 15, 2020

Conversation

chrisohaver
Copy link
Member

1. Why is this pull request needed and what does it do?

This corrects the handling of tombstones - specifically that the objects imbedded in the tombstone deltas are actually coredns/object types not k8s/api (as in normal delete events). The delete case in the processor now makes no type assertion on the object in the tombstone, and simply passes the tombstone itself to index.Delete().

Unit test: coverage added for the DefaultProcessor for adds, updates, "normal" deletes, and "tombstone" deletes.

Refactoring: Consolidated the in-line Endpoints Process function and the DefaultProcessor so all watches now use the DefaultProcessor.

2. Which issues (if any) are related?

#3879
#3860
PRs #3887 and #3890

3. Which documentation changes (if any) need to be made?

none

4. Does this introduce a backward incompatible change or deprecation?

no

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
@codecov-commenter
Copy link

codecov-commenter commented Jun 3, 2020

Codecov Report

Merging #3924 into master will increase coverage by 0.03%.
The diff coverage is 61.53%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3924      +/-   ##
==========================================
+ Coverage   56.67%   56.71%   +0.03%     
==========================================
  Files         224      224              
  Lines       11374    11338      -36     
==========================================
- Hits         6446     6430      -16     
+ Misses       4432     4418      -14     
+ Partials      496      490       -6     
Impacted Files Coverage Δ
plugin/kubernetes/controller.go 45.86% <61.53%> (+0.68%) ⬆️
plugin/forward/proxy.go 86.66% <0.00%> (-3.34%) ⬇️
plugin/azure/setup.go 62.35% <0.00%> (-0.44%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e3ef77...8aabd9c. Read the comment docs.

@@ -125,7 +125,7 @@ func newdnsController(ctx context.Context, kubeClient kubernetes.Interface, opts
&api.Pod{},
cache.ResourceEventHandlerFuncs{AddFunc: dns.Add, UpdateFunc: dns.Update, DeleteFunc: dns.Delete},
cache.Indexers{podIPIndex: podIPIndexFunc},
object.DefaultProcessor(object.ToPod(opts.skipAPIObjectsCleanup)),
object.DefaultProcessor(object.ToPod(opts.skipAPIObjectsCleanup), nil),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this boolean arg is is only used for a single test? that seems excessive for adding it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The boolean arg was already there. IIUC, it was added when the record latency metrics were added, to allow the test to work.

The new parameter is a function that calculates the record latency metrics for that object type. Currently it's only implemented for the Endpoints object. So, for Services and Pods, we pass nil.

default:
log.Warningf("Updates for %T not supported.", ob)
}
}

func (dns *dnsControl) getServices(endpoints *object.Endpoints) []*object.Service {
func (dns *dnsControl) getServices(endpoints *api.Endpoints) []*object.Service {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this now an api.Endpoint instead of object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetServices can construct the index key from either type. Changing it to api.Endpoint allows us to pass a single argument to recordDNSProgrammingLatency() instead of two (both api.Endpoint and object.Endpoint for the same record).

// ToEndpoints converts an api.Endpoints to a *Endpoints.
func ToEndpoints(end *api.Endpoints) *Endpoints {
// ToEndpoints returns a function that converts an *api.Endpoints to a *Endpoints.
func ToEndpoints(skipCleanup bool) ToFunc {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the ToFunc requirement come from? Just testing?

Copy link
Member Author

@chrisohaver chrisohaver Jun 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required to allow the Endpoints watch to use the default processor, instead of the current in-line processor. (it is similar in structure to the Pod and Service objects, which also use the default processor)

@@ -20,8 +20,10 @@ func NewIndexerInformer(lw cache.ListerWatcher, objType runtime.Object, h cache.
return clientState, cache.New(cfg)
}

// DefaultProcessor is a copy of Process function from cache.NewIndexerInformer except it does a conversion.
func DefaultProcessor(convert ToFunc) ProcessorBuilder {
type recordLatencyFunc func(interface{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty interface? why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, I think I could tighten it down to meta.Object, which is implemented by all k8s api objects (e.g. Service, Pod, Endpoints, )...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could tighten it down to meta.Object

Done in 8aabd9c

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
@chrisohaver
Copy link
Member Author

@johnbelamaric PTAL

@johnbelamaric
Copy link
Member

/lgtm

Copy link

@corbot corbot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved by johnbelamaric

@chrisohaver chrisohaver merged commit d902e85 into coredns:master Jun 15, 2020
@luks
Copy link

luks commented Jun 15, 2020

Hi guys, i am wonder if this fix will be in release 1.7.0, we are having same problem, we have also coredns separated on masters node outside of k8s.

@chrisohaver
Copy link
Member Author

Hi guys, i am wonder if this fix will be in release 1.7.0, we are having same problem, we have also coredns separated on masters node outside of k8s.

Yes. It will.

@luks
Copy link

luks commented Jun 15, 2020 via email

nyodas pushed a commit to DataDog/coredns that referenced this pull request Oct 26, 2020
* fix tombstone unwrapping

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
@chrisohaver chrisohaver deleted the fix-tombstones branch January 9, 2021 14:42
aanm added a commit to aanm/cilium that referenced this pull request Sep 28, 2021
coredns < 1.7.0 has a bug that makes the services resolution to become
out-of-sync with the last state from Kubernetes in case coredns
suffers from a disconnection with kube-apiserver [1]. This bug is fixed on
all versions equal and above 1.7.0. [2]

In our CI this affects all Kubernetes jobs 1.18 and below and can result
in flaky tests that have the result in the following similar logs:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>
jibi pushed a commit to cilium/cilium that referenced this pull request Sep 29, 2021
coredns < 1.7.0 has a bug that makes the services resolution to become
out-of-sync with the last state from Kubernetes in case coredns
suffers from a disconnection with kube-apiserver [1]. This bug is fixed on
all versions equal and above 1.7.0. [2]

In our CI this affects all Kubernetes jobs 1.18 and below and can result
in flaky tests that have the result in the following similar logs:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>
glibsm pushed a commit to cilium/cilium that referenced this pull request Oct 4, 2021
[ upstream commit f6f2406 ]

coredns < 1.7.0 has a bug that makes the services resolution to become
out-of-sync with the last state from Kubernetes in case coredns
suffers from a disconnection with kube-apiserver [1]. This bug is fixed on
all versions equal and above 1.7.0. [2]

In our CI this affects all Kubernetes jobs 1.18 and below and can result
in flaky tests that have the result in the following similar logs:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>

Signed-off-by: Glib Smaga <code@gsmaga.com>
glibsm pushed a commit to cilium/cilium that referenced this pull request Oct 4, 2021
[ upstream commit f6f2406 ]

coredns < 1.7.0 has a bug that makes the services resolution to become
out-of-sync with the last state from Kubernetes in case coredns
suffers from a disconnection with kube-apiserver [1]. This bug is fixed on
all versions equal and above 1.7.0. [2]

In our CI this affects all Kubernetes jobs 1.18 and below and can result
in flaky tests that have the result in the following similar logs:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>

Signed-off-by: Glib Smaga <code@gsmaga.com>
glibsm pushed a commit to cilium/cilium that referenced this pull request Oct 4, 2021
[ upstream commit f6f2406 ]

coredns < 1.7.0 has a bug that makes the services resolution to become
out-of-sync with the last state from Kubernetes in case coredns
suffers from a disconnection with kube-apiserver [1]. This bug is fixed on
all versions equal and above 1.7.0. [2]

In our CI this affects all Kubernetes jobs 1.18 and below and can result
in flaky tests that have the result in the following similar logs:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>

Signed-off-by: Glib Smaga <code@gsmaga.com>
errordeveloper pushed a commit to cilium/cilium that referenced this pull request Oct 7, 2021
[ upstream commit f6f2406 ]

coredns < 1.7.0 has a bug that makes the services resolution to become
out-of-sync with the last state from Kubernetes in case coredns
suffers from a disconnection with kube-apiserver [1]. This bug is fixed on
all versions equal and above 1.7.0. [2]

In our CI this affects all Kubernetes jobs 1.18 and below and can result
in flaky tests that have the result in the following similar logs:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>

Signed-off-by: Glib Smaga <code@gsmaga.com>
errordeveloper pushed a commit to cilium/cilium that referenced this pull request Oct 7, 2021
[ upstream commit f6f2406 ]

coredns < 1.7.0 has a bug that makes the services resolution to become
out-of-sync with the last state from Kubernetes in case coredns
suffers from a disconnection with kube-apiserver [1]. This bug is fixed on
all versions equal and above 1.7.0. [2]

In our CI this affects all Kubernetes jobs 1.18 and below and can result
in flaky tests that have the result in the following similar logs:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>

Signed-off-by: Glib Smaga <code@gsmaga.com>
glibsm pushed a commit to cilium/cilium that referenced this pull request Oct 8, 2021
[ upstream commit f6f2406 ]

coredns < 1.7.0 has a bug that makes the services resolution to become
out-of-sync with the last state from Kubernetes in case coredns
suffers from a disconnection with kube-apiserver [1]. This bug is fixed on
all versions equal and above 1.7.0. [2]

In our CI this affects all Kubernetes jobs 1.18 and below and can result
in flaky tests that have the result in the following similar logs:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>

Signed-off-by: Glib Smaga <code@gsmaga.com>
errordeveloper pushed a commit to cilium/cilium that referenced this pull request Oct 11, 2021
[ upstream commit f6f2406 ]

coredns < 1.7.0 has a bug that makes the services resolution to become
out-of-sync with the last state from Kubernetes in case coredns
suffers from a disconnection with kube-apiserver [1]. This bug is fixed on
all versions equal and above 1.7.0. [2]

In our CI this affects all Kubernetes jobs 1.18 and below and can result
in flaky tests that have the result in the following similar logs:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>

Signed-off-by: Glib Smaga <code@gsmaga.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants