-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugs 1439142, 1417641 - network diagnostics fixes #14364
Bugs 1439142, 1417641 - network diagnostics fixes #14364
Conversation
@openshift/networking @sosiouxme PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both commits LGTM. Thanks
[test] not that it should catch anything. |
LGTM, but the jenkins failures are real and looks like you need to update the bash command completions too. |
f5af84c
to
1eae689
Compare
[test] |
@pravisankar I deployed a 2-node origin cluster and ran the check successfully. Then I tried running it specifying an OSE image:
This gave me:
This isn't as informative as I'd like. Docker pulled the image, so I think that part worked fine, but what happened when it tried to run? Why didn't the test pods run? Did I specify something wrong or is this a system problem? |
…t/hello-openshift as network diagnostic test pod. openshift/hello-openshift is not available in redhat registry. Now we use openshift/origin-deployer which already exists in redhat registry and should be present on all nodes.
…nfigurable This will be helpful in offline environments.
1eae689
to
5fd94c6
Compare
@sosiouxme |
Updated, added couple of changes:
|
5fd94c6
to
9a8b1c6
Compare
flake #14122 |
Evaluated for origin test up to 9a8b1c6 |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1874/) (Base Commit: 375c727) |
[merge][severity:blocker] |
Evaluated for origin merge up to 9a8b1c6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So other than these items, LGTM. And I don't consider those blockers, but would be nice to have. I'd like to see the output more informative in general about what the results mean and what the user might do about any problems, but that's not changed with this PR.
} else { | ||
trimmedPodImage = imageTokens[n-2] + "/" + imageTokens[n-1] | ||
} | ||
if trimmedPodImage == util.NetworkDiagDefaultTestPodImage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I thought it didn't much matter what the image was as it is just for running socat
. But I see here the test pod only runs socat
if the image has the right name. Why? It doesn't look like GetTestPod
is used for anything else. And even if it had an alternate use, why not just pass in another boolean "runSocat" instead of try to match the name? It seems less surprising, and no reason this shouldn't work if they pass in openshift/origin
or some other image that has socat on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requirement for our network diagnostic test pod is to serve/listen for requests using tcp/udp on some port. We don't care if the test pod is running socat
or simple http web server. We chose socat
because openshift/origin-deployer
doesn't serve by default and we picked this image for test pod because it already exists in redhat registry and also available on all the nodes.
For non default test pod passed by user we expect it be serving by default.
return nil | ||
} else { | ||
errList = append(errList, fmt.Errorf("Failed to run network diags test pods, failed: %d, total: %d", (totalPods-runningPods), totalPods)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be really nice here if it could report on what happened with the ones that aren't running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, We could pull the container logs in case of failures.
break | ||
} | ||
if pod.Status.Phase == kapi.PodRunning { | ||
runningPodCount += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else { // build a list of pod state and any errors that can be retrieved for those that failed
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/858/) (Base Commit: fb5ea52) (Extended Tests: blocker) (Image: devenv-rhel7_6290) |
Bug 1439142 - Use openshift/origin-deployer image instead of openshift/hello-openshift as network diagnostic test pod.
Bug 1417641 - Make network diagnostic test pod image/protocol/port configurable