TST introducing the random_seed fixture #22749

Merged
merged 21 commits into scikit-learn:main from the random-seed-fixture branch
Mar 14, 2022

Conversation

ogrisel
Member

@ogrisel ogrisel commented Mar 10, 2022

Closes #13913

This is a new fixture, similar to what is being developed in #22690, that will make it possible to ensure that (some of) our tests do not rely on a specific value of the random seed, while giving full control to reproduce CI runs locally or to run the tests with a specific seed of one's choice.
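
For illustration, here is a minimal sketch of what such an env-var-driven fixture could look like. It is an assumption-laden sketch rather than the code of this PR: only the fixture name random_seed and the SKLEARN_TESTS_GLOBAL_RANDOM_SEED variable come from the PR, the seed range 0-99 mirrors the runs below, and the default used when the variable is not set is discussed later in this thread.

# Illustrative sketch only -- not the implementation merged in this PR.
import os

import pytest


def _parse_seed_spec(value):
    """Map the SKLEARN_TESTS_GLOBAL_RANDOM_SEED value to a list of seeds."""
    if value == "all":
        return list(range(100))  # run every admissible seed
    if "-" in value:
        start, stop = value.split("-")
        return list(range(int(start), int(stop) + 1))  # e.g. "0-2" -> [0, 1, 2]
    return [int(value)]  # a single explicit seed, e.g. "42"


# "42" is only a placeholder default here; the actual default behavior is
# discussed further down this thread.
_SEEDS = _parse_seed_spec(os.environ.get("SKLEARN_TESTS_GLOBAL_RANDOM_SEED", "42"))


@pytest.fixture(params=_SEEDS)
def random_seed(request):
    """Parametrize a requesting test over the selected seed(s)."""
    return request.param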

The docstring of the fixture should be quite explicit. I tested it locally and it seems to work as expected:

dev ❯ SKLEARN_TESTS_GLOBAL_RANDOM_SEED='0-2' pytest -v -k "test_kmeans_elkan_results and 0.01-dense-normal" sklearn/cluster/tests/
======================================================================================== test session starts =========================================================================================
platform darwin -- Python 3.9.7, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /Users/ogrisel/mambaforge/envs/dev/bin/python
cachedir: .pytest_cache
To reproduce this test run, set the following environment variable:
    SKLEARN_TESTS_GLOBAL_RANDOM_SEED="0-2"
rootdir: /Users/ogrisel/code/scikit-learn, configfile: setup.cfg
plugins: anyio-3.3.0, xdist-2.3.0, timeout-1.4.2, forked-1.3.0, cov-3.0.0
collected 548 items / 545 deselected / 3 selected                                                                                                                                                    

sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[0-0.01-dense-normal] PASSED                                                                                                   [ 33%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[1-0.01-dense-normal] PASSED                                                                                                   [ 66%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[2-0.01-dense-normal] PASSED                                                                                                   [100%]

================================================================================= 3 passed, 545 deselected in 0.25s ==================================================================================

dev ❯ SKLEARN_TESTS_GLOBAL_RANDOM_SEED='42' pytest -v -k "test_kmeans_elkan_results and 0.01-dense-normal" sklearn/cluster/tests/
======================================================================================== test session starts =========================================================================================
platform darwin -- Python 3.9.7, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /Users/ogrisel/mambaforge/envs/dev/bin/python
cachedir: .pytest_cache
To reproduce this test run, set the following environment variable:
    SKLEARN_TESTS_GLOAL_RANDOM_SEED="42"
rootdir: /Users/ogrisel/code/scikit-learn, configfile: setup.cfg
plugins: anyio-3.3.0, xdist-2.3.0, timeout-1.4.2, forked-1.3.0, cov-3.0.0
collected 516 items / 515 deselected / 1 selected                                                                                                                                                    

sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[42-0.01-dense-normal] PASSED                                                                                                  [100%]

================================================================================= 1 passed, 515 deselected in 0.19s ==================================================================================

dev ❯ pytest -v -k "test_kmeans_elkan_results and 0.01-dense-normal" sklearn/cluster/tests/
======================================================================================== test session starts =========================================================================================
platform darwin -- Python 3.9.7, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /Users/ogrisel/mambaforge/envs/dev/bin/python
cachedir: .pytest_cache
To reproduce this test run, set the following environment variable:
    SKLEARN_TESTS_GLOAL_RANDOM_SEED="94"
rootdir: /Users/ogrisel/code/scikit-learn, configfile: setup.cfg
plugins: anyio-3.3.0, xdist-2.3.0, timeout-1.4.2, forked-1.3.0, cov-3.0.0
collected 516 items / 515 deselected / 1 selected                                                                                                                                                    

sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[94-0.01-dense-normal] PASSED                                                                                                  [100%]

================================================================================= 1 passed, 515 deselected in 0.22s ==================================================================================

dev ❯ SKLEARN_TESTS_GLOBAL_RANDOM_SEED='all' pytest -v -k "test_kmeans_elkan_results and 0.01-dense-normal" sklearn/cluster/tests/
======================================================================================== test session starts =========================================================================================
platform darwin -- Python 3.9.7, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /Users/ogrisel/mambaforge/envs/dev/bin/python
cachedir: .pytest_cache
To reproduce this test run, set the following environment variable:
    SKLEARN_TESTS_GLOAL_RANDOM_SEED="all"
rootdir: /Users/ogrisel/code/scikit-learn, configfile: setup.cfg
plugins: anyio-3.3.0, xdist-2.3.0, timeout-1.4.2, forked-1.3.0, cov-3.0.0
collected 2100 items / 2000 deselected / 100 selected                                                                                                                                                

sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[0-0.01-dense-normal] PASSED                                                                                                   [  1%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[1-0.01-dense-normal] PASSED                                                                                                   [  2%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[2-0.01-dense-normal] PASSED                                                                                                   [  3%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[3-0.01-dense-normal] PASSED                                                                                                   [  4%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[4-0.01-dense-normal] PASSED                                                                                                   [  5%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[5-0.01-dense-normal] PASSED                                                                                                   [  6%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[6-0.01-dense-normal] PASSED                                                                                                   [  7%]
[...]

TODO:

  • find out if we can make the tests pass even if the plugin is not installed, or maybe force the activation of the plugin in sklearn/conftest.py (see the sketch after this list);
  • document the new environment variable;
  • decide and implement the renaming of the fixture based on the discussion in #22749 (comment).
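
Regarding the first item, two standard pytest mechanisms could make the fixture available without relying on an externally installed plugin. The sketch below assumes the fixture lives in sklearn/tests/random_seed.py; it only illustrates the options, not necessarily what this PR ends up doing.

# Option 1: register the module as a local plugin from the rootdir conftest.py.
pytest_plugins = ["sklearn.tests.random_seed"]

# Option 2: re-export the fixture from sklearn/conftest.py so that pytest
# discovers it in the conftest namespace (alternative to option 1):
# from sklearn.tests.random_seed import random_seed  # noqa: F401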

@ogrisel ogrisel changed the title from "TST introduce the random_seed fixture" to "TST introducing the random_seed fixture" on Mar 10, 2022
@ogrisel
Member Author

ogrisel commented Mar 10, 2022

If we agree that this is a good idea, I will open a meta-issue to progressively convert existing tests to use that fixture.

Each time, that will require running the test locally with:

SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest -v -k test_your_test_name

If all seeds pass, then fine: it's an easy PR.

If that's not the case, it will reveal that the test is too brittle to use this fixture. This will require investigating whether the test can be made more robust, for instance by changing some hyper-parameters or increasing the size of a dataset.

If it's still too hard to make the test pass for any admissible seed, then we might want to keep it with a fixed seed, add a comment acknowledging that the test is quite sensitive to the specific choice of the seed, and move on to the next test.
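
For illustration, a converted test takes the fixture as an argument and derives all of its randomness from the provided seed. The example below is made up for the sketch (the dataset, estimator and assertion are not taken from scikit-learn's test suite); the point is that the assertion has to hold for every admissible seed, not just one magic value.

import numpy as np

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def test_kmeans_recovers_well_separated_blobs(random_seed):
    # All randomness is derived from the fixture-provided seed.
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X, y = make_blobs(
        n_samples=300, centers=centers, cluster_std=1.0, random_state=random_seed
    )
    km = KMeans(n_clusters=3, n_init=10, random_state=random_seed).fit(X)
    # The clusters are 10 standard deviations apart, so any seed should recover them.
    assert adjusted_rand_score(y, km.labels_) > 0.95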

@ogrisel
Member Author

ogrisel commented Mar 10, 2022

Hum, we have a bad interaction between the use of pytest-xdist and this new fixture:

Different tests were collected between gw0 and gw1. The difference is:

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=39225&view=logs&j=dde5042c-7464-5d47-9507-31bdd2ee0a3a&t=4bd2dad8-62b3-5bf9-08a5-a9880c530c94&l=205

Instead of picking a seed completely at random, I will have to make it depend on the year and the day of the year, for instance, so that all xdist workers collect the same parametrization.

@ogrisel
Member Author

ogrisel commented Mar 10, 2022

I think I chose the right day to write this PR:

>>> from random import Random
>>> from datetime import datetime
>>> RANDOM_SEED_RANGE = list(range(100))
>>> rng = Random(int(datetime.now().strftime("%Y%j")))
>>> rng.choice(RANDOM_SEED_RANGE)
42

Member

@jjerphan jjerphan left a comment

Let's pytest.fixture all the tests!

Member

@jjerphan jjerphan left a comment

LGTM if the CI passes.

@jeremiedbb
Member

Great, the report header shows up on Azure! But now CircleCI breaks :'(

@ogrisel
Member Author

ogrisel commented Mar 11, 2022

Indeed, I was afraid this might happen. It will also break when people run the tests from outside of the source folder, for instance when testing the installed wheels in [cd build] runs.

@ogrisel
Member Author

ogrisel commented Mar 11, 2022

I have edited the description of the PR with a TODO list of things to finalize before considering the merge of this PR.

@thomasjpfan
Member

I think the random seed should be fixed by default. With a non-fixed seed, tests in PRs can fail randomly because a different seed is used.

@ogrisel
Member Author

ogrisel commented Mar 11, 2022

I think the random seed should be fixed by default. With a non-fixed seed, tests in PRs can fail randomly because a different seed is used.

This is the goal of this PR: to make sure that, over time, our CI explores all the seeds that we are expected to support, so that regressions or test code updates do not reintroduce seed-sensitivity by mistake without us realizing it.

If that happens, it's easy to read the CI log to identify the test and run it locally with the SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" variable to make it seed-insensitive again.

@jeremiedbb
Member

I would not want PRs to have failing tests because of a poor random seed that is unrelated to the PR itself.

That should never happen since we will add the fixture to a test iff the test passes when setting "all". Then, if a PR does not touch the test or the code that this test covers, it won't fail on any seed.

@ogrisel
Member Author

ogrisel commented Mar 11, 2022

I would not want PRs to have failing tests because of a poor random seed that is unrelated to the PR itself. That feels like a poor contributor experience.

Indeed that's a valid point.

The other issue with a non-fixed seed is when an external packager such as Debian runs our tests and happens to use a seed that fails. A non-fixed seed can also block our own release process, when wheel building fails because of a random seed.

Also a valid point.

TL;DR: I'm advocating that if SKLEARN_TESTS_GLOBAL_RANDOM_SEED is not set, we use SKLEARN_TESTS_GLOBAL_RANDOM_SEED=42 by default.

Let's instead have a dedicated explicit value, SKLEARN_TESTS_GLOBAL_RANDOM_SEED="any", that will pick a random seed in the admissible range.

This will not be the default, but I will configure the CI to use it in the scheduled nightly builds.
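
For concreteness, here is a sketch of the behavior decided above (illustrative only; the parsing in the merged code may differ). The day-based draw mirrors the earlier comment so that all pytest-xdist workers of a given run agree on the chosen seed.

# Sketch of the decided behavior -- not the exact merged code.
import os
from datetime import datetime
from random import Random

RANDOM_SEED_RANGE = list(range(100))  # the admissible seeds


def _selected_seeds():
    value = os.environ.get("SKLEARN_TESTS_GLOBAL_RANDOM_SEED")
    if value is None:
        return [42]  # fixed default: deterministic PR and regular CI builds
    if value == "any":
        # One seed per day, identical across xdist workers of the same run.
        daily_rng = Random(int(datetime.now().strftime("%Y%j")))
        return [daily_rng.choice(RANDOM_SEED_RANGE)]
    if value == "all":
        return RANDOM_SEED_RANGE
    if "-" in value:
        start, stop = value.split("-")
        return list(range(int(start), int(stop) + 1))
    return [int(value)]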

@ogrisel
Member Author

ogrisel commented Mar 11, 2022

That should never happen since we will add the fixture to a test iff the test passes when setting "all". Then, if a PR does not touch the test or the code that this test covers, it won't fail on any seed.

It could happen if a seed-sensitivity regression is silently introduced in main and there are not enough builds to detect it before a contributor's unrelated PR hits the regression for the first time.

@ogrisel
Member Author

ogrisel commented Mar 11, 2022

Done. This is now silent by default and only displays the seed in the pytest report header when the global seed randomization is enabled.

I tested it locally and it works, but it should now be disabled in the PRs.
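
For reference, a guess at how such a conditional report header could be produced with the standard pytest_report_header hook (sketch only; the merged code may differ, and the daily draw again mirrors the earlier comment):

# Sketch: only advertise the seed when it was drawn at random for this run.
import os
from datetime import datetime
from random import Random


def pytest_report_header(config):
    if os.environ.get("SKLEARN_TESTS_GLOBAL_RANDOM_SEED") != "any":
        return None  # silent by default and for explicitly chosen seeds
    seed = Random(int(datetime.now().strftime("%Y%j"))).choice(range(100))
    return [
        "To reproduce this test run, set the following environment variable:",
        f'    SKLEARN_TESTS_GLOBAL_RANDOM_SEED="{seed}"',
    ]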

@ogrisel
Member Author

ogrisel commented Mar 11, 2022

dev ❯ SKLEARN_TESTS_GLOBAL_RANDOM_SEED="any" pytest -v -k "test_kmeans_elkan_results and 1e-08-dense-blobs" --pyargs sklearn.cluster.tests
===================================================================== test session starts =====================================================================
platform darwin -- Python 3.9.7, pytest-7.0.1, pluggy-0.13.1 -- /Users/ogrisel/mambaforge/envs/dev/bin/python
cachedir: .pytest_cache
To reproduce this test run, set the following environment variable:
    SKLEARN_TESTS_GLOBAL_RANDOM_SEED="27"
See: https://scikit-learn.org/dev/computing/parallelism.html#environment-variables
rootdir: /Users/ogrisel/code/scikit-learn, configfile: setup.cfg
plugins: anyio-3.3.0, xdist-2.3.0, timeout-1.4.2, forked-1.3.0, cov-3.0.0
collected 516 items / 515 deselected / 1 selected                                                                                                             

sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[27-1e-08-dense-blobs] PASSED                                                           [100%]

============================================================== 1 passed, 515 deselected in 0.24s ==============================================================
dev ❯ SKLEARN_TESTS_GLOBAL_RANDOM_SEED="0-9" pytest -v -k "test_kmeans_elkan_results and 1e-08-dense-blobs" --pyargs sklearn.cluster.tests
===================================================================== test session starts =====================================================================
platform darwin -- Python 3.9.7, pytest-7.0.1, pluggy-0.13.1 -- /Users/ogrisel/mambaforge/envs/dev/bin/python
cachedir: .pytest_cache
rootdir: /Users/ogrisel/code/scikit-learn, configfile: setup.cfg
plugins: anyio-3.3.0, xdist-2.3.0, timeout-1.4.2, forked-1.3.0, cov-3.0.0
collected 660 items / 650 deselected / 10 selected                                                                                                            

sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[0-1e-08-dense-blobs] PASSED                                                            [ 10%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[1-1e-08-dense-blobs] PASSED                                                            [ 20%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[2-1e-08-dense-blobs] PASSED                                                            [ 30%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[3-1e-08-dense-blobs] PASSED                                                            [ 40%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[4-1e-08-dense-blobs] PASSED                                                            [ 50%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[5-1e-08-dense-blobs] PASSED                                                            [ 60%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[6-1e-08-dense-blobs] PASSED                                                            [ 70%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[7-1e-08-dense-blobs] PASSED                                                            [ 80%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[8-1e-08-dense-blobs] PASSED                                                            [ 90%]
sklearn/cluster/tests/test_k_means.py::test_kmeans_elkan_results[9-1e-08-dense-blobs] PASSED                                                            [100%]

============================================================= 10 passed, 650 deselected in 0.25s ==============================================================

Member

@jeremiedbb jeremiedbb left a comment

LGTM !

thomasjpfan and others added 4 commits March 11, 2022 14:07
@ogrisel
Member Author

ogrisel commented Mar 13, 2022

We could also define SKLEARN_TESTS_GLOBAL_RANDOM_SEED=0-2, SKLEARN_TESTS_GLOBAL_RANDOM_SEED=3-5, and so on in our different CI entries for the regular builds (including PRs). The slowest CI configs would only define a single SKLEARN_TESTS_GLOBAL_RANDOM_SEED=X value, different for each slow CI config, and we would need one set to 42 as well to make sure that the default seed is also tested.

This might be a good trade-off between computational speed and protection against seed-sensitivity.

@ogrisel
Member Author

ogrisel commented Mar 13, 2022

Weird: the import lines of the new fixture are not covered because the pytest coverage plugin kicks in after this plugin has been imported:

https://app.codecov.io/gh/scikit-learn/scikit-learn/compare/22749/diff

:)

Member

@thomasjpfan thomasjpfan left a comment

LGTM

I like the idea of setting different random seeds for different CI configurations.

@ogrisel
Member Author

ogrisel commented Mar 14, 2022

I pushed that. For now, I decided not to use seed ranges, in order to keep the CI build times unchanged.

I will merge if green.

@ogrisel ogrisel merged commit d3429ca into scikit-learn:main Mar 14, 2022
@ogrisel ogrisel deleted the random-seed-fixture branch March 14, 2022 09:41
@ogrisel
Member Author

ogrisel commented Mar 14, 2022

Merged! Thanks all for the reviews and contributions to improve this fixture.

I will open an issue to give guidelines to start using this fixture in the scikit-learn code base.

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

Successfully merging this pull request may close these issues.

Use modifiable global random state in tests