
Bugs 1439142, 1417641 - network diagnostics fixes #14364

Merged

4 commits merged into openshift:master from the diags-use-diff-image branch on May 31, 2017

Conversation

pravisankar

Bug 1439142 - Use openshift/origin-deployer image instead of openshift/hello-openshift as network diagnostic test pod.

Bug 1417641 - Make network diagnostic test pod image/protocol/port configurable

@pravisankar (Author)

@openshift/networking @sosiouxme PTAL

@knobunc (Contributor) left a comment:

Both commits LGTM. Thanks

@knobunc (Contributor) commented May 26, 2017

[test] not that it should catch anything.

@knobunc knobunc assigned knobunc and unassigned knobunc May 26, 2017
@knobunc knobunc requested a review from sosiouxme May 26, 2017 19:02
@knobunc (Contributor) commented May 26, 2017

@dcbw (Contributor) commented May 27, 2017

LGTM, but the Jenkins failures are real and it looks like you need to update the bash command completions too.

@pravisankar pravisankar force-pushed the diags-use-diff-image branch from f5af84c to 1eae689 Compare May 29, 2017 06:56
@pravisankar (Author)

[test]

@sosiouxme (Member)

@pravisankar I deployed a 2-node origin cluster and ran the check successfully. Then I tried running it specifying an OSE image:

oc adm diagnostics NetworkCheck --network-test-pod-image=registry.access.redhat.com/openshift3/ose-deployer

This gave me:

       [Note] Running diagnostic: CheckPodNetwork
              Description: Check pod to pod communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with each other and in case of multitenant network plugin, pods in non-global projects should be isolated and pods in global projects should be able to access any pod in the cluster and vice versa.
              
       WARN:  [DPodNet1005 from diagnostic CheckPodNetwork@openshift/origin/pkg/diagnostics/networkpod/pod.go:125]
              Skipping pod connectivity test for global projects on the same node. Reason: Couldn't find 2 global pods.
              
       WARN:  [DPodNet1005 from diagnostic CheckPodNetwork@openshift/origin/pkg/diagnostics/networkpod/pod.go:125]
              Skipping pod connectivity test for global projects on different nodes. Reason: Couldn't find 2 global pods.

This isn't as informative as I'd like. Docker pulled the image, so I think that part worked fine, but what happened when it tried to run? Why didn't the test pods run? Did I specify something wrong or is this a system problem?

Ravi Sankar Penta added 2 commits May 30, 2017 13:23
Bug 1439142 - Use openshift/origin-deployer image instead of openshift/hello-openshift as network diagnostic test pod.

openshift/hello-openshift is not available in the Red Hat registry. Now we use openshift/origin-deployer, which already exists in the Red Hat registry and should be present on all nodes.

Bug 1417641 - Make network diagnostic test pod image/protocol/port configurable

This will be helpful in offline environments.
@pravisankar pravisankar force-pushed the diags-use-diff-image branch from 1eae689 to 5fd94c6 Compare May 30, 2017 22:41
@pravisankar (Author)

@sosiouxme
oc adm diagnostics NetworkCheck --network-test-pod-image=registry.access.redhat.com/openshift3/ose-deployer failed because that image is not listening on the default test pod port, 8080. I agree that there is no useful warning or info displayed on the terminal.

@pravisankar (Author)

@knobunc @dcbw @sosiouxme

Updated; added a couple of changes:

  • On OCP, defaultImagePrefix will be 'openshift3/ose-', which will not match the full path 'registry.access.redhat.com/openshift3/ose-'. We now sanitize the test pod image before comparing it with the default diagnostics test pod image (see the sketch below).
  • When the network diagnostic test pods fail, an error or warning is now reported.
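
For illustration, a minimal sketch of that sanitization in Go, with a hypothetical helper name; the idea, as in the code excerpt reviewed further down, is to keep only the last two path segments of the image before comparing:

package main

import "strings"

// trimTestPodImage is a hypothetical helper mirroring the sanitization
// described above: keep only the last two path segments of the image so that
// "registry.access.redhat.com/openshift3/ose-deployer" compares equal to a
// default expressed as "openshift3/ose-deployer".
func trimTestPodImage(image string) string {
	tokens := strings.Split(image, "/")
	if n := len(tokens); n >= 2 {
		return tokens[n-2] + "/" + tokens[n-1]
	}
	return image
}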

@pravisankar pravisankar force-pushed the diags-use-diff-image branch from 5fd94c6 to 9a8b1c6 Compare May 31, 2017 02:31
@pravisankar (Author)

flake #14122
re[test]

@openshift-bot (Contributor)

Evaluated for origin test up to 9a8b1c6

@openshift-bot (Contributor)

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1874/) (Base Commit: 375c727)

@knobunc (Contributor) commented May 31, 2017

[merge][severity:blocker]

@openshift-bot (Contributor)

Evaluated for origin merge up to 9a8b1c6

@sosiouxme (Member) left a comment:

Other than these items, LGTM. I don't consider them blockers, but they would be nice to have. I'd also like the output to be more informative in general about what the results mean and what the user might do about any problems, but that isn't changed by this PR.

} else {
trimmedPodImage = imageTokens[n-2] + "/" + imageTokens[n-1]
}
if trimmedPodImage == util.NetworkDiagDefaultTestPodImage {
sosiouxme (Member):

Ah, I thought it didn't much matter what the image was, since it's just for running socat. But I see here that the test pod only runs socat if the image has the right name. Why? It doesn't look like GetTestPod is used for anything else. And even if it had an alternate use, why not just pass in another boolean, runSocat, instead of trying to match the name? That seems less surprising, and there's no reason this shouldn't work if they pass in openshift/origin or some other image that has socat on it.

pravisankar (Author):

The requirement for our network diagnostic test pod is to serve/listen for requests over TCP/UDP on some port. We don't care whether the test pod runs socat or a simple HTTP web server. We chose socat because openshift/origin-deployer doesn't serve anything by default, and we picked that image for the test pod because it already exists in the Red Hat registry and should also be available on all the nodes.
For a non-default test pod passed by the user, we expect it to be serving by default.
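
To make that concrete, a rough sketch of the idea, using current Kubernetes API types rather than the packages vendored into origin at the time, and with a hypothetical runSocat flag along the lines of the reviewer's suggestion; the exact socat invocation in origin may differ:

package main

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// networkDiagTestPod is a hypothetical sketch: the test pod only needs
// something listening on the chosen protocol/port. When runSocat is true
// (the default image, which does not serve anything on its own), socat is
// started as the listener; otherwise the image's own entrypoint is expected
// to already be serving.
func networkDiagTestPod(name, image, protocol string, port int, runSocat bool) corev1.Pod {
	container := corev1.Container{Name: "network-diag-test", Image: image}
	if runSocat {
		container.Command = []string{"socat"}
		container.Args = []string{
			"-T", "1",
			// e.g. "TCP-LISTEN:8080,fork": accept connections, fork per client
			fmt.Sprintf("%s-LISTEN:%d,fork", strings.ToUpper(protocol), port),
			// reply with a trivial payload so the probe sees a response
			"SYSTEM:echo ok",
		}
	}
	return corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers:    []corev1.Container{container},
		},
	}
}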

return nil
} else {
errList = append(errList, fmt.Errorf("Failed to run network diags test pods, failed: %d, total: %d", (totalPods-runningPods), totalPods))
sosiouxme (Member):

It would be really nice here if it could report on what happened with the ones that aren't running.

pravisankar (Author):

Yes, we could pull the container logs in case of failures.
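
A rough sketch of what that could look like, using current client-go rather than the client vendored into origin at the time; the function name and label selector are hypothetical:

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// dumpFailedTestPodLogs lists the diagnostic test pods and fetches container
// logs for any that are not Running, so a failure report can include them.
func dumpFailedTestPodLogs(ctx context.Context, client kubernetes.Interface, ns, selector string) error {
	pods, err := client.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		if pod.Status.Phase == corev1.PodRunning {
			continue
		}
		raw, err := client.CoreV1().Pods(ns).GetLogs(pod.Name, &corev1.PodLogOptions{}).DoRaw(ctx)
		if err != nil {
			fmt.Printf("pod %s (%s): could not fetch logs: %v\n", pod.Name, pod.Status.Phase, err)
			continue
		}
		fmt.Printf("pod %s (%s) logs:\n%s\n", pod.Name, pod.Status.Phase, raw)
	}
	return nil
}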

break
}
if pod.Status.Phase == kapi.PodRunning {
runningPodCount += 1
sosiouxme (Member):

} else { // build a list of pod state and any errors that can be retrieved for those that failed
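
A hypothetical sketch of that suggestion, again with current API types: summarize each failed pod's phase plus any container waiting/terminated reason, so the error list says more than just a count.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// describeFailedPod builds a one-line summary of a non-running pod's state
// that could be appended to the diagnostics error list.
func describeFailedPod(pod corev1.Pod) string {
	reason := string(pod.Status.Phase)
	for _, cs := range pod.Status.ContainerStatuses {
		switch {
		case cs.State.Waiting != nil:
			reason = fmt.Sprintf("%s (waiting: %s)", reason, cs.State.Waiting.Reason)
		case cs.State.Terminated != nil:
			reason = fmt.Sprintf("%s (terminated: %s)", reason, cs.State.Terminated.Reason)
		}
	}
	return fmt.Sprintf("pod %s: %s", pod.Name, reason)
}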

@openshift-bot (Contributor) commented May 31, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/858/) (Base Commit: fb5ea52) (Extended Tests: blocker) (Image: devenv-rhel7_6290)

@openshift-bot openshift-bot merged commit da47359 into openshift:master May 31, 2017