
Fix race condition for consuming podIP via downward API #13052

Merged
merged 1 commit into kubernetes:master on Sep 2, 2015

Conversation

pmorie
Member

@pmorie pmorie commented Aug 21, 2015

Fixes the race that can make status.podIP be blank when consumed via the downward API.

@thockin @smarterclayton PTAL

```go
	return "", err
}
if inspectResult == nil {
	return "", nil // TODO: should be an error, probably
```
Contributor

Make these errors, then the caller can choose to ignore the error. If you can't get a network IP it's better to continue with no status IP. Don't log in this method.

Member Author

This if clause is bothering me; inspectContainer does it but it seems like it would represent a bug in the docker API to get an (empty container, nil) result from InspectContainer. It would be great if there were a comment on the original code with a reference to an issue either in kube or docker, but oh well. I'm strongly tempted to take this clause out...

Contributor

Write this code where you can tolerate not having the podIP, but you should
not ignore any errors in this function.

On Fri, Aug 21, 2015 at 4:42 PM, Paul Morie notifications@github.com wrote:

In pkg/kubelet/dockertools/manager.go, #13052 (comment):

```diff
@@ -276,6 +276,38 @@ type containerStatusResult struct {
 	err error
 }

+// determineInfraContainerIP inspects the given container and determines the IP address of that
+// container. The container is assumed to be the pod infra container; it is the responsibility
+// of the caller to ensure that the container ID passed is that of the infra container.
+func (dm *DockerManager) determineInfraContainerIP(dockerID string, pod *api.Pod) (string, error) {
+	inspectResult, err := dm.client.InspectContainer(dockerID)
+	if err != nil {
+		return "", err
+	}
+	if inspectResult == nil {
+		return "", nil // TODO: should be an error, probably
```


Clayton Coleman | Lead Engineer, OpenShift

@k8s-bot

k8s-bot commented Aug 21, 2015

GCE e2e build/test passed for commit 82ba717ce35deb4db96d35c454342e8b2d18a632.

```go
if dm.networkPlugin.Name() != network.DefaultPluginName {
	netStatus, err := dm.networkPlugin.Status(pod.Namespace, pod.Name, kubeletTypes.DockerID(dockerID))
	if err != nil {
		glog.Errorf("NetworkPlugin %s failed on the status hook for pod '%s' - %v", dm.networkPlugin.Name(), pod.Name, err)
```
Member Author

I'm wondering to myself if this should be an error in this and inspectContainer, which I cribbed this code from.

Contributor

Maybe, spawn an issue

On Fri, Aug 21, 2015 at 4:35 PM, Paul Morie notifications@github.com wrote:

In pkg/kubelet/dockertools/manager.go, #13052 (comment):

```diff
+	}
+	if inspectResult == nil {
+		return "", nil // TODO: should be an error, probably
+	}
+	result := ""
+	if inspectResult.State.Running {
+		if inspectResult.NetworkSettings != nil {
+			result = inspectResult.NetworkSettings.IPAddress
+		}
+		if dm.networkPlugin.Name() != network.DefaultPluginName {
+			netStatus, err := dm.networkPlugin.Status(pod.Namespace, pod.Name, kubeletTypes.DockerID(dockerID))
+			if err != nil {
+				glog.Errorf("NetworkPlugin %s failed on the status hook for pod '%s' - %v", dm.networkPlugin.Name(), pod.Name, err)
```


Member Author

Spawned #13055

@pmorie pmorie force-pushed the podip-fix branch 2 times, most recently from 1f1c4da to 2e92ea0 on August 21, 2015 at 22:16
@k8s-bot

k8s-bot commented Aug 21, 2015

GCE e2e build/test failed for commit 2e92ea03d6048068c7207a01e2310970e9e9681d.

@k8s-bot

k8s-bot commented Aug 21, 2015

GCE e2e build/test failed for commit 1f1c4da84887f8e2217824cfead1de0113797399.

@pmorie pmorie force-pushed the podip-fix branch 2 times, most recently from 0b111a6 to ab858a9 on August 22, 2015 at 00:35
@k8s-bot

k8s-bot commented Aug 22, 2015

GCE e2e build/test failed for commit 0b111a621de9022b40e1297b223bb9f653889c7c.

@k8s-bot

k8s-bot commented Aug 22, 2015

GCE e2e build/test failed for commit ab858a997bd7ac774e888845d969f6e78d1bb610.

```diff
@@ -285,16 +285,53 @@ type containerStatusResult struct {
 	err error
 }

+const podIPDownwardAPISelector = "status.podIP"
+
+func podDependsOnPodIP(pod *api.Pod) bool {
```
Contributor

godoc

@smarterclayton
Contributor

One nit: add this to the downward_api e2e test (please). The existing e2e failures look like flakes caused by over-scheduling.

@pmorie
Member Author

pmorie commented Aug 24, 2015

@smarterclayton Yep, I have been working on adding it to the e2e; an additional refactor was required for that. I'm going to split it out into another PR since it touches many files.

@k8s-bot

k8s-bot commented Aug 25, 2015

GCE e2e build/test failed for commit 6b5321d8a95e32788d81df4b1582b5868fb3804a.

@pmorie
Member Author

pmorie commented Aug 25, 2015

Interesting, this appears to be a legit failure on GCE. Going to try to reproduce.

@k8s-bot

k8s-bot commented Aug 25, 2015

GCE e2e build/test failed for commit b63215e3ae7f202707a95d29d28fa49725d1563d.

@pmorie
Member Author

pmorie commented Aug 25, 2015

I couldn't recreate the failure on GCE. Kicking the test runner -- I will need to reshuffle the commits in this branch before this goes in if the GCE run finishes successfully.

@k8s-bot

k8s-bot commented Aug 25, 2015

GCE e2e build/test failed for commit fe96aa0d7b3bcb6d7f2842b09e88b8b1fb963888.

@k8s-bot

k8s-bot commented Sep 1, 2015

GCE e2e build/test failed for commit 4ff66bd.

@yifan-gu
Contributor

yifan-gu commented Sep 1, 2015

@pmorie Thanks for the heads up. On the rkt side, since we haven't done much with the network plugin yet, we will revisit this later.
FWIW, this LGTM :)

@pmorie
Member Author

pmorie commented Sep 1, 2015

@smarterclayton squashed

@smarterclayton smarterclayton added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 1, 2015
@pmorie
Member Author

pmorie commented Sep 1, 2015

Thanks @yifan-gu

@smarterclayton
Contributor

LGTM, the e2e failures are flakes.

@pmorie
Member Author

pmorie commented Sep 2, 2015

@k8s-oncall

mwielgus added a commit that referenced this pull request Sep 2, 2015
Fix race condition for consuming podIP via downward API
@mwielgus mwielgus merged commit 3e99325 into kubernetes:master Sep 2, 2015
@pmorie
Member Author

pmorie commented Sep 2, 2015

@mwielgus I would have been fine waiting for the bot -- I just didn't know how long to expect to have to wait since LGTM was added yesterday

@wojtek-t
Member

wojtek-t commented Sep 8, 2015

@pmorie @smarterclayton - sorry for jumping late to the party, but it seems that this PR completely broke pod startup time latency (our scalability tests are just logging it, so I didn't catch it before - I will extend the tests today to check it). Previously the 99th percentile of pod startup time was less than 5 seconds; after this PR it usually exceeds 30 seconds.
I reverted it locally and that seems to help, so I'm going to revert this PR.

wojtek-t added a commit to wojtek-t/kubernetes that referenced this pull request Sep 8, 2015
@wojtek-t wojtek-t mentioned this pull request Sep 8, 2015
@smarterclayton
Contributor

What were the 90th and 95th percentiles?


@wojtek-t
Member

wojtek-t commented Sep 8, 2015

I didn't look for 95th percentile, but there was also significant difference (5-10x) in 50th and 90th percentile.

@wojtek-t
Member

wojtek-t commented Sep 8, 2015

So it wasn't only the long tail - it was just significantly worse at all percentiles.

@smarterclayton
Contributor

Ok, it sounds like we probably need to reuse the data we get from Docker when we create the pod infra container instead of calling list again. Do the scalability tests run with any network plugins? (I assume not.)


@wojtek-t
Member

wojtek-t commented Sep 8, 2015

We are running scalability tests on GCE (without any additional plugins).

@pmorie
Member Author

pmorie commented Sep 8, 2015

@smarterclayton @wojtek-t It's extremely curious that this would have affected performance that much. This PR doesn't add any new inspect calls, @smarterclayton; it already uses the result of the docker inspect performed when we started the infra container.

@smarterclayton
Contributor

All I can imagine is that the network plugin is being called, or some error
condition is being triggered that causes a bad code path to be executed
that we didn't spot while testing.


Labels
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

9 participants