[v1alpha2] Create a simple python server to be used for E2E tests of controller behavior #653
k8s-ci-robot pushed a commit that referenced this issue on Jun 14, 2018:
* Add E2E tests that verify termination policy is handled correctly.
* Only the tests for v1alpha1 are enabled. A follow-on PR will see if v1alpha2 is working and enable the tests for v1alpha2.
* Fix versionTag logic; we need to allow for the case where versionTag is an
* To facilitate these E2E tests, we create a test server to be run inside the replicas. This server allows us to control what the process does via RPC, which lets the test runner control when a replica exits.
* The test harness needs to route requests through the APIServer proxy.
* Events no longer appear to be showing up for all services / pods even though all services / pods are being created, so we turn that failure into a warning instead of a test failure.
* Print out the TFJob spec and events to aid debugging test failures.

Fix #653 test server
Fixes: #235 E2E test case for when chief is worker 0
Related: #589 CI for v1alpha2

* Fix bug in wait for pods; we were exiting prematurely.
* Fix bug in getting message from event.
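The commit above notes that the test harness routes requests through the API server proxy. Below is a hedged sketch of what such a call might look like; the pod name, namespace, port, and `/exit` endpoint are illustrative assumptions, not the actual test code.

```python
# Sketch: calling a replica's control server via the Kubernetes API server proxy.
# API_SERVER, NAMESPACE, POD_NAME, PORT, and the /exit path are all hypothetical.
import requests

API_SERVER = "https://<apiserver-host>"   # placeholder for the cluster endpoint
NAMESPACE = "default"
POD_NAME = "mnist-worker-0"               # hypothetical replica pod
PORT = 8080

# Standard API server proxy path:
#   /api/v1/namespaces/{namespace}/pods/{name}:{port}/proxy/{path}
url = (f"{API_SERVER}/api/v1/namespaces/{NAMESPACE}"
       f"/pods/{POD_NAME}:{PORT}/proxy/exit?code=1")

# A real harness would attach credentials (e.g. a bearer token from kubeconfig).
resp = requests.get(url, verify=False)
print(resp.status_code)
```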
We'd like to write more comprehensive E2E test cases to verify the controller works as expected.
We have a number of issues related to adding more tests:
* #646 - Add tests for termination behavior
* #651 - Add Evaluator tests
Currently our E2E tests are based on [smoke_test.py](https://github.com/kubeflow/tf-operator/blob/master/examples/tf_sample/tf_sample/tf_smoke.py). This verifies that ops can be assigned to different devices, which makes it a good starting point for testing that we can start TF servers and that they can communicate with one another.
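For context, here is a rough illustration (not the actual tf_smoke.py) of the kind of check it performs: pin ops to explicit devices and confirm the graph still runs. This assumes the TensorFlow 1.x graph/session API.

```python
import tensorflow as tf

with tf.device("/cpu:0"):
    a = tf.constant([1.0, 2.0])
with tf.device("/gpu:0"):          # falls back to CPU if no GPU is present
    b = tf.constant([3.0, 4.0])
c = a + b

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))             # expect [4. 6.]
```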
It's not a good test of controller behavior, though. For controller behavior, the things we'd like to test are:
I think a better approach to writing these tests would be to run, in each TF replica, a simple Python server that exposes handlers like "/restart", "/continue", and "/get_tf_config", allowing the test harness to control the behavior of the process. This would make it much easier for the test harness to simulate certain conditions and verify that the controller works as expected.
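A minimal sketch of such a control server is below. This is hypothetical, not the actual implementation; the handler paths (`/get_tf_config`, `/exit?code=N` in place of "/restart") and port 8080 are illustrative assumptions.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ControlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/get_tf_config":
            # Echo back the TF_CONFIG the operator injected into this pod so the
            # test harness can assert on the cluster spec and task assignment.
            body = os.environ.get("TF_CONFIG", "{}").encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        elif self.path.startswith("/exit"):
            # Let the harness terminate this replica with a chosen exit code,
            # e.g. /exit?code=1 to simulate a failure and trigger a restart.
            code = int(self.path.partition("code=")[2] or 0)
            self.send_response(200)
            self.end_headers()
            self.wfile.flush()
            # Delay slightly so the HTTP response reaches the client first.
            threading.Timer(0.5, os._exit, args=(code,)).start()
        else:
            self.send_response(404)
            self.end_headers()


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for illustration.
    HTTPServer(("0.0.0.0", 8080), ControlHandler).serve_forever()
```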
Tests like tf_smoke.py could still be useful for verifying that TF_CONFIG and TF libraries work together.
I think we should start by creating a server suitable for testing pod restart behavior, so we need the following: