
Bugs 1439142, 1417641 - network diagnostics fixes #14364

Merged

4 commits merged into openshift:master from the diags-use-diff-image branch on May 31, 2017

Conversation

pravisankar

Bug 1439142 - Use openshift/origin-deployer image instead of openshift/hello-openshift as network diagnostic test pod.

Bug 1417641 - Make network diagnostic test pod image/protocol/port configurable

@pravisankar (Author)

@openshift/networking @sosiouxme PTAL

@knobunc (Contributor) left a comment:

Both commits LGTM. Thanks

@knobunc (Contributor) commented May 26, 2017

[test] not that it should catch anything.

@knobunc knobunc assigned knobunc and unassigned knobunc May 26, 2017
@knobunc knobunc requested a review from sosiouxme May 26, 2017 19:02
@knobunc (Contributor) commented May 26, 2017

@dcbw (Contributor) commented May 27, 2017

LGTM, but the Jenkins failures are real and it looks like you need to update the bash command completions too.

@pravisankar pravisankar force-pushed the diags-use-diff-image branch from f5af84c to 1eae689 Compare May 29, 2017 06:56
@pravisankar (Author)

[test]

@sosiouxme (Member)

@pravisankar I deployed a 2-node origin cluster and ran the check successfully. Then I tried running it specifying an OSE image:

oc adm diagnostics NetworkCheck --network-test-pod-image=registry.access.redhat.com/openshift3/ose-deployer

This gave me:

       [Note] Running diagnostic: CheckPodNetwork
              Description: Check pod to pod communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with each other and in case of multitenant network plugin, pods in non-global projects should be isolated and pods in global projects should be able to access any pod in the cluster and vice versa.
              
       WARN:  [DPodNet1005 from diagnostic CheckPodNetwork@openshift/origin/pkg/diagnostics/networkpod/pod.go:125]
              Skipping pod connectivity test for global projects on the same node. Reason: Couldn't find 2 global pods.
              
       WARN:  [DPodNet1005 from diagnostic CheckPodNetwork@openshift/origin/pkg/diagnostics/networkpod/pod.go:125]
              Skipping pod connectivity test for global projects on different nodes. Reason: Couldn't find 2 global pods.

This isn't as informative as I'd like. Docker pulled the image, so I think that part worked fine, but what happened when it tried to run? Why didn't the test pods run? Did I specify something wrong or is this a system problem?

Ravi Sankar Penta added 2 commits May 30, 2017 13:23
Bug 1439142 - Use openshift/origin-deployer image instead of openshift/hello-openshift as network diagnostic test pod.

openshift/hello-openshift is not available in the Red Hat registry. Now we use openshift/origin-deployer, which already exists in the Red Hat registry and should be present on all nodes.

Bug 1417641 - Make network diagnostic test pod image/protocol/port configurable

This will be helpful in offline environments.
@pravisankar pravisankar force-pushed the diags-use-diff-image branch from 1eae689 to 5fd94c6 Compare May 30, 2017 22:41
@pravisankar (Author)

@sosiouxme
oc adm diagnostics NetworkCheck --network-test-pod-image=registry.access.redhat.com/openshift3/ose-deployer failed because that image is not listening on the default test pod port, 8080. I agree that there is no useful warning or info displayed on the terminal.

@pravisankar (Author)

@knobunc @dcbw @sosiouxme

Updated; added a couple of changes:

  • On OCP, defaultImagePrefix will be 'openshift3/ose-', which will not match the full path 'registry.access.redhat.com/openshift3/ose-'. We now sanitize the test pod image before comparing it with the default diagnostics test pod image (see the sketch below).
  • When the network diagnostic test pods fail, an error or warning is now reported.
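
For illustration, a minimal sketch of that sanitization in Go, with a hypothetical helper name; the idea, as in the code excerpt reviewed further down, is to keep only the last two path segments of the image before comparing:

package main

import "strings"

// trimTestPodImage is a hypothetical helper mirroring the sanitization
// described above: keep only the last two path segments of the image so that
// "registry.access.redhat.com/openshift3/ose-deployer" compares equal to a
// default expressed as "openshift3/ose-deployer".
func trimTestPodImage(image string) string {
	tokens := strings.Split(image, "/")
	if n := len(tokens); n >= 2 {
		return tokens[n-2] + "/" + tokens[n-1]
	}
	return image
}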

@pravisankar pravisankar force-pushed the diags-use-diff-image branch from 5fd94c6 to 9a8b1c6 Compare May 31, 2017 02:31
@pravisankar (Author)

flake #14122
re[test]

@openshift-bot (Contributor)

Evaluated for origin test up to 9a8b1c6

@openshift-bot (Contributor)

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1874/) (Base Commit: 375c727)

@knobunc (Contributor) commented May 31, 2017

[merge][severity:blocker]

@openshift-bot (Contributor)

Evaluated for origin merge up to 9a8b1c6

@sosiouxme (Member) left a comment:

Other than these items, LGTM. I don't consider them blockers, but they would be nice to have. I'd also like the output to be more informative in general about what the results mean and what the user might do about any problems, but that isn't changed by this PR.

} else {
trimmedPodImage = imageTokens[n-2] + "/" + imageTokens[n-1]
}
if trimmedPodImage == util.NetworkDiagDefaultTestPodImage {
sosiouxme (Member):

Ah, I thought it didn't much matter what the image was, since it's just for running socat. But I see here that the test pod only runs socat if the image has the right name. Why? It doesn't look like GetTestPod is used for anything else. And even if it had an alternate use, why not just pass in another boolean, runSocat, instead of trying to match the name? That seems less surprising, and there's no reason this shouldn't work if they pass in openshift/origin or some other image that has socat on it.

pravisankar (Author):

The requirement for our network diagnostic test pod is to serve/listen for requests over TCP/UDP on some port. We don't care whether the test pod runs socat or a simple HTTP web server. We chose socat because openshift/origin-deployer doesn't serve anything by default, and we picked that image for the test pod because it already exists in the Red Hat registry and should also be available on all the nodes.
For a non-default test pod passed by the user, we expect it to be serving by default.
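
To make that concrete, a rough sketch of the idea, using current Kubernetes API types rather than the packages vendored into origin at the time, and with a hypothetical runSocat flag along the lines of the reviewer's suggestion; the exact socat invocation in origin may differ:

package main

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// networkDiagTestPod is a hypothetical sketch: the test pod only needs
// something listening on the chosen protocol/port. When runSocat is true
// (the default image, which does not serve anything on its own), socat is
// started as the listener; otherwise the image's own entrypoint is expected
// to already be serving.
func networkDiagTestPod(name, image, protocol string, port int, runSocat bool) corev1.Pod {
	container := corev1.Container{Name: "network-diag-test", Image: image}
	if runSocat {
		container.Command = []string{"socat"}
		container.Args = []string{
			"-T", "1",
			// e.g. "TCP-LISTEN:8080,fork": accept connections, fork per client
			fmt.Sprintf("%s-LISTEN:%d,fork", strings.ToUpper(protocol), port),
			// reply with a trivial payload so the probe sees a response
			"SYSTEM:echo ok",
		}
	}
	return corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers:    []corev1.Container{container},
		},
	}
}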

return nil
} else {
errList = append(errList, fmt.Errorf("Failed to run network diags test pods, failed: %d, total: %d", (totalPods-runningPods), totalPods))
sosiouxme (Member):

It would be really nice here if it could report on what happened with the ones that aren't running.

pravisankar (Author):

Yes, we could pull the container logs in case of failures.
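
A rough sketch of what that could look like, using current client-go rather than the client vendored into origin at the time; the function name and label selector are hypothetical:

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// dumpFailedTestPodLogs lists the diagnostic test pods and fetches container
// logs for any that are not Running, so a failure report can include them.
func dumpFailedTestPodLogs(ctx context.Context, client kubernetes.Interface, ns, selector string) error {
	pods, err := client.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		if pod.Status.Phase == corev1.PodRunning {
			continue
		}
		raw, err := client.CoreV1().Pods(ns).GetLogs(pod.Name, &corev1.PodLogOptions{}).DoRaw(ctx)
		if err != nil {
			fmt.Printf("pod %s (%s): could not fetch logs: %v\n", pod.Name, pod.Status.Phase, err)
			continue
		}
		fmt.Printf("pod %s (%s) logs:\n%s\n", pod.Name, pod.Status.Phase, raw)
	}
	return nil
}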

break
}
if pod.Status.Phase == kapi.PodRunning {
runningPodCount += 1
sosiouxme (Member):

} else { // build a list of pod state and any errors that can be retrieved for those that failed
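
A hypothetical sketch of that suggestion, again with current API types: summarize each failed pod's phase plus any container waiting/terminated reason, so the error list says more than just a count.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// describeFailedPod builds a one-line summary of a non-running pod's state
// that could be appended to the diagnostics error list.
func describeFailedPod(pod corev1.Pod) string {
	reason := string(pod.Status.Phase)
	for _, cs := range pod.Status.ContainerStatuses {
		switch {
		case cs.State.Waiting != nil:
			reason = fmt.Sprintf("%s (waiting: %s)", reason, cs.State.Waiting.Reason)
		case cs.State.Terminated != nil:
			reason = fmt.Sprintf("%s (terminated: %s)", reason, cs.State.Terminated.Reason)
		}
	}
	return fmt.Sprintf("pod %s: %s", pod.Name, reason)
}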

@openshift-bot (Contributor) commented May 31, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/858/) (Base Commit: fb5ea52) (Extended Tests: blocker) (Image: devenv-rhel7_6290)

@openshift-bot openshift-bot merged commit da47359 into openshift:master May 31, 2017