A new benchmarking platform #16

yarikoptic · 2024-01-27T14:33:34Z

We do use asv too, but I recently discovered an interesting new benchmarking CI/portal: https://codspeed.io/. It seems to be really easy to use (https://docs.codspeed.io/benchmarks/python): you just mark some pytest unit tests as your benchmarks. It is all based on https://pypi.org/project/pytest-benchmark/, so it is also readily available locally.

So IMHO it might be worth considering, after doing some research first.


oruebel · 2024-01-27T15:13:04Z

Thanks @yarikoptic, we'll take a look! Currently we are mainly targeting benchmarks for cloud access, and some of the tests may have fairly long runtimes (at least compared to standard unit tests). At first glance, I think codspeed.io may be more appropriate for benchmarks based on our existing unit tests in PyNWB. That would be useful too, but it targets a different use case (i.e., local performance and smaller, faster tests).

yarikoptic · 2024-01-29T21:17:20Z

For "cloud access", potentially very relevant is dandi/dandisets-healthstatus#66, where @jwodder is about to review a number of FUSE solutions on top of our https://github.com/dandi/dandi-webdav/ and compare them to the datalad-fuse solution we have used so far for dandisets-healthstatus.

oruebel · 2024-01-29T22:44:46Z

Thanks! Currently we are planning to test data streaming with HDF5:

- ros3 via h5py
- fsspec
- remfile
- kerchunk

We are also planning to compare with Zarr.

@jwodder @yarikoptic if there are other options you think we should include in our benchmark, then please just ping us. Generally, we are looking at options for streaming that allow access to subsets of an NWB file in HDF5 or Zarr without having to download the whole file.
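A minimal sketch of how three of these HDF5 streaming options could be timed side by side, assuming h5py, fsspec, and remfile are installed and the file is reachable over HTTPS. All function names here (`open_with_ros3`, `time_backends`, and so on) are hypothetical and not taken from the benchmark suite; kerchunk and Zarr are left out. Imports are deferred into each opener so the harness itself runs without those packages.

```python
import time
from statistics import median
from typing import Callable, Dict


def open_with_ros3(url: str):
    """h5py's ROS3 driver; needs an h5py/HDF5 build compiled with ROS3 support."""
    import h5py

    return h5py.File(url, "r", driver="ros3")


def open_with_fsspec(url: str):
    """fsspec's HTTP filesystem; the resulting file-like object is handed to h5py."""
    import fsspec
    import h5py

    fs = fsspec.filesystem("http")
    return h5py.File(fs.open(url, "rb"), "r")


def open_with_remfile(url: str):
    """remfile's lazily loading remote file object, handed to h5py."""
    import h5py
    import remfile

    return h5py.File(remfile.File(url), "r")


def time_backends(
    url: str,
    openers: Dict[str, Callable[[str], object]],
    operation: Callable[[object], object],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the median wall-clock time (seconds) of open + operation + close per backend."""
    results: Dict[str, float] = {}
    for name, opener in openers.items():
        durations = []
        for _ in range(repeats):
            start = time.perf_counter()
            with opener(url) as f:  # h5py.File can be used as a context manager
                operation(f)
            durations.append(time.perf_counter() - start)
        results[name] = median(durations)
    return results
```

Usage would be along the lines of `time_backends(url, {"ros3": open_with_ros3, "fsspec": open_with_fsspec, "remfile": open_with_remfile}, lambda f: list(f.keys()))`, with `url` pointing at an NWB file on S3. Taking the median over a few repeats is one way to damp network jitter; the operation passed in decides which subset of the file is actually read.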

yarikoptic · 2024-01-29T23:19:56Z

Ideally, if you're up to it, it would be great to also see benchmarking of setups with local caching, where on the first run the solution (like fsspec) establishes a local cache, and on subsequent reruns uses the local content (likely first checking that the ETag of the remote URL has stayed the same)... but that might be something to keep for future investigations ;-)

oruebel · 2024-04-24T21:12:39Z

> ideally, if up to it, would be great also to see benchmarking of setups with local caching

We are currently adding fsspec + disk cache and remfile + disk cache tests. The current tests clean up the cache between runs, so we only see the impact of the cache within the context of the specific operation. It may be interesting to also add tests where we run the test case first in the setup to prime the cache and then measure how long it takes to repeat the same operation once the cache has been populated. I'll make a separate issue for this so we can discuss it there.
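The "prime the cache, then repeat" idea can be sketched as a cold-versus-warm measurement. This is a minimal sketch under stated assumptions, not the benchmark suite's code: `make_cached_fsspec_operation`, `make_cache_cleaner`, and `cold_vs_warm` are hypothetical names, and the fsspec variant assumes `fsspec.implementations.cached.CachingFileSystem` wrapping the HTTP filesystem with an on-disk `cache_storage` directory. Clearing the cache before the first call reproduces the current behaviour; the second call measures the primed case.

```python
import shutil
import time
from typing import Callable, Dict


def make_cached_fsspec_operation(url: str, cache_dir: str) -> Callable[[], list]:
    """Return a zero-argument callable that opens `url` through fsspec with an
    on-disk cache in `cache_dir` and lists the top-level HDF5 groups."""

    def operation() -> list:
        import fsspec
        import h5py
        from fsspec.implementations.cached import CachingFileSystem

        # skip_instance_cache asks fsspec for a fresh filesystem object instead
        # of reusing one (and its cache metadata) from an earlier run.
        fs = CachingFileSystem(
            fs=fsspec.filesystem("http"),
            cache_storage=cache_dir,
            skip_instance_cache=True,
        )
        with fs.open(url, "rb") as raw, h5py.File(raw, "r") as f:
            return list(f.keys())

    return operation


def make_cache_cleaner(cache_dir: str) -> Callable[[], None]:
    """Return a callable that deletes the on-disk cache directory, if present."""
    return lambda: shutil.rmtree(cache_dir, ignore_errors=True)


def cold_vs_warm(operation: Callable[[], object], clear_cache: Callable[[], None]) -> Dict[str, float]:
    """Time `operation` once against an empty cache, then again with the cache
    primed by that first run."""
    clear_cache()
    start = time.perf_counter()
    operation()  # cold: the cache is empty, so data must be fetched remotely
    cold = time.perf_counter() - start

    start = time.perf_counter()
    operation()  # warm: repeated reads should now hit the local cache
    warm = time.perf_counter() - start
    return {"cold": cold, "warm": warm}
```

A remfile + disk cache variant would plug into `cold_vs_warm` the same way, as long as it is wrapped in a zero-argument callable and paired with a matching cache cleaner.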

oruebel · 2024-04-24T21:55:34Z

I created #46 for the tests with caching. I'll close this issue for now, but please reopen if I missed anything or you have additional comments.

oruebel closed this as completed Apr 24, 2024
