Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmark checks to CI #5424

Merged
merged 58 commits into from
Jul 8, 2021
Merged

Conversation

jaimergp
Copy link
Contributor

@jaimergp jaimergp commented Jun 10, 2021

Description

This PR provides a new CI workflow that runs the current commit against the PR target branch. The different strategies are discussed in the conversation below, which conclusions driven by data points collected in my fork.

It also adds a asv check command to the standard CI suite.

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

jaimergp added 7 commits June 10, 2021 12:18
Apparently time_* functions need to accept the params too, even if they don't use it (same signature as setup). Adding *args is enough to fix, no need to copy the full signature.
@jaimergp
Copy link
Contributor Author

Noting here that some of the benchmarks included with asv dev will result in memory allocation errors. It takes 30 minutes to run a quick pass. Might only be worth it for some entries in the job matrix. I'll let the team guide that decision... thoughts?

@grlee77
Copy link
Contributor

grlee77 commented Jun 11, 2021

Thanks, I just merged #5426 and manually started the CI here. We will see if it autostarts on the next commit now...

@jaimergp
Copy link
Contributor Author

That seemed to do the trick, thanks @grlee77!

@grlee77
Copy link
Contributor

grlee77 commented Jun 11, 2021

FYI, if you are seeing warnings about a deprecated multichannel argument, those will be resolved by #5427 for the 0.19 release, see:
2c6e721

Basically, we changed the API to allow specification of the axis containing "channels" for color images rather than using a boolean multichannel argument. For 0.19 both arguments are still present, but the user will get a warning if multichannel is used. To be able to benchmark against older releases, I added this _channel_kwarg helper function

@jaimergp
Copy link
Contributor Author

I'm going to aim for relative performance measurements in GH Actions. I've seen several projects being ok with this approach so at least the catch regressions on PRs. We won't get a history of performance, but that won't be possible without dedicated hardware.

I've merged a similar PR in my own fork that will run the benchmarks every 6h for a week for the same two commits. I will then collect the results (uploaded via artifacts) and compare how reliable they are. We should expect the same relative differences (+- some statistical deviation) in all runs if the approach is stable enough.

I'll ping you when I have some numbers next week!

@grlee77
Copy link
Contributor

grlee77 commented Jun 11, 2021

I've merged a similar PR in my own fork that will run the benchmarks every 6h for a week for the same two commits. I will then collect the results (uploaded via artifacts) and compare how reliable they are. We should expect the same relative differences (+- some statistical deviation) in all runs if the approach is stable enough.

Nice. This will be interesting to see!

@jaimergp
Copy link
Contributor Author

This notebook (attached) will help us assess the stability of the GHA CI for benchmarking purposes. We still need to wait a bit to collect more data points, though, but it looks ok so far!

timeseries.ipynb.zip

@jaimergp
Copy link
Contributor Author

During the weekend we have collected 15 independent attempts. See analysis in this gist.

Reliability is okayish, but it does give some false positives here and there... Let's see how it behaves over the weekdays instead of the weekend.

@jni
Copy link
Member

jni commented Jul 6, 2021

Super amazing work @jaimergp! I agree with @stefanv that some instructions would be great. 😃

By the way, I just removed/added the label and that worked too! So great! 🎉 🚀

@jaimergp
Copy link
Contributor Author

jaimergp commented Jul 6, 2021

How do I look at the artifact via asv? It may be worth adding a README.md into the artifact zip file with instructions.

Got it! I'll add some instructions!

@jaimergp
Copy link
Contributor Author

jaimergp commented Jul 6, 2021

@jni @stefanv -- I added some instructions here. This file will be included in the artifact, together with the CI logs and the JSON databases. Let me know if that's clear enough or if you need something else!

@stefanv
Copy link
Member

stefanv commented Jul 6, 2021

@jni @stefanv -- I added some instructions here. This file will be included in the artifact, together with the CI logs and the JSON databases. Let me know if that's clear enough or if you need something else!

This is... AMAZING?! Really, really nicely done. Thank you!

Copy link
Contributor

@grlee77 grlee77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jaimergp, the new docs are both clear and comprehensive. I had just one suggestion to correct a typo.

benchmarks/README_CI.md Outdated Show resolved Hide resolved
Co-authored-by: Gregory R. Lee <grlee77@gmail.com>
@jaimergp
Copy link
Contributor Author

jaimergp commented Jul 7, 2021

Thanks for the kind words @stefanv @grlee77! I think we should run the benchmarks one more time before the merge so we can check the contents of the artifact. Can you please add the label one more time? Thanks!

@grlee77
Copy link
Contributor

grlee77 commented Jul 7, 2021

Can you please add the label one more time? Thanks!

Done

@jaimergp
Copy link
Contributor Author

jaimergp commented Jul 7, 2021

Ok, confirmed! The files are there as expected! 🥳

@grlee77 grlee77 merged commit 2fa66e5 into scikit-image:main Jul 8, 2021
@grlee77
Copy link
Contributor

grlee77 commented Jul 8, 2021

Hi @jaimergp, I am seeing many more Azure CI failures after this PR. There seem to be two separate issues related to this commit:
2fa66e5#diff-7915b9b726a397ae7ba6af7b9703633d21c031ebf21682f3ee7e6a4ec52837a5

Removing the condition on running the gallery examples has resulted in all jobs timing out now, although the timeouts should be resolved if #5446 is merged.

The second issue is that a couple of the cases fail to build wheels for (dependencies of) sphinx-gallery. I am not sure of the best fix for that one: (see here)

The specific error is

  error: failed to run custom build command for `pyo3 v0.13.2`
  
  Caused by:
    process didn't exit successfully: `C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-vc714uet\pywinpty_c3cbae2a690e484a91f2810baf619595\target\release\build\pyo3-93fbd3a6215435b6\build-script-build` (exit code: 1)
    --- stderr
    Error: "Your Rust target architecture (64-bit) does not match your python interpreter (32-bit)"

It is probably easiest to just restore the conditional running of the gallery examples, but let me know what you think.

@jaimergp
Copy link
Contributor Author

jaimergp commented Jul 8, 2021

Oh, wow, that's totally an accident on my end. I shouldn't have removed that conditional. Will open a PR to revert now.

Not sure about the 2nd issue.

@grlee77
Copy link
Contributor

grlee77 commented Jul 8, 2021

Not sure about the 2nd issue.

Yeah, I am not familiar with that package it is trying to build, but it seems to be something pulled in by sphinx-gallery. It might work if we selected a 64-bit Python interpreter, but I don't know. I think those failure cases will be bypassed by restoring the conditional in any case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants