Better benchmark stability #69

bclement-ocp · 2023-06-07T15:05:06Z

This is configured with on the command-line with the new --cpus option, which replaces -j and allows to specify a list of cpus or cpu ranges in taskset format, such as 0-3,7-12,15 (strides are not supported).

When the --cpus option is provided, provers will be run in parallel on the provided cpus, with at most one prover running on a given cpu at once.

Note that the --cpus setting only concerns the prover runs (and some glue code around one specific prover run): it does not otherwise restrict the cpus used by benchpress itself. It is recommended to use an external method such as the taskset utility to assign to the toplevel benchpress process a CPU affinity that does not overlap with the prover CPUs provided in --cpus.

When using the --cpus setting, users should be aware that there may still be other processes on the system using these cpus (obviously)! It is thus recommended to use one of the existing techniques to isolate these CPUs from the rest of the system. I know of two ways to do this: the isolcpus kernel parameter, which is deprecated but is slightly easier to use, and cpusets, which are not deprecated but harder to use.

To use isolcpus, simply set the cpus to use for benchmarking as the isolated CPUs on the kernel cmdline and reboot (OK, maybe not that simple for one-shot benchmarking, but fairly easy for a machine that is mostly used for benchmarking). There is no need to set the CPU affinity for the benchpress binary: it will never be scheduled on the isolated CPUs, and neither will any other processes (unless manually required to).

To use cpusets (which is the solution I employed on our benchmark machine at OCamlPro), you should create a system cpuset that contains only the CPUs that will NOT be used by benchpress, and move all processes to that cpuset (this can be done on a running system, consult the cpuset documentation). Then, create another cpuset that contains the CPUs to use for benchpress, including the CPUs to use for the benchpress binary (in practice I use the root cpuset that contains all the CPUs), and run benchpress in that cpuset. You must not forget to use taskset to prevent the benchpress binary from using the CPUs destined for the provers. In practice, I move the shell that I use to run benchpress to that second cpuset.

c-cube · 2023-06-07T15:20:39Z

This looks very impressive, I'm not going to pretend I read it all in details but I've been aware of the issue for a while. @Gbury actually had something based on cgroups at one point.

Two points:

the ramdisk seems a bit more flimsy to me (see the comment about windows, but also if provers are very verbose, their output is going to start competing with the available RAM for the provers themselves). There must be an option for that I think.
You should install ocamlformat (0.24.1) and probably configure autoformat, I saw that the code is not formatted :)

bclement-ocp · 2023-06-07T15:32:30Z

Ah yes I disabled autoformat in my editor because of unmentionable reasons but it is formatted now, apologies.

Regarding Windows, I am not sure benchpress is used on Windows atm (and also don't have access to a system to test), but it should just be a little bit of glue to make this work on Windows (well, and an additional argument, because I believe that there is no equivalent to /dev/shm on Windows and a ram disk must be manually mounted). Of course, by "work", I mean mostly do nothing, because there doesn't seem to be an OCaml library to set process affinity on Windows.

I am not sure I understand your point about the output of the provers competing for the available RAM with the provers themselves because I believe that is already an issue with the current implementation where the output is copied to benchpress using CCIO.read_all.

I agree that having the input and proof files in RAM should be an option however; would you rather I add it to this PR or separate the cpu affinity stuff from the ram stuff?

bclement-ocp · 2023-06-07T15:37:25Z

By the way, cpu affinity is a nice bonus, but from my initial testing it seems like it is copying files to RAM prior to running the prover that eliminated the spurious 100x slowdowns.

c-cube · 2023-06-07T15:56:43Z

the output is copied to benchpress using CCIO.read_all.

Ah! Touché 😁. That's partly due to limitations in the Unix API and threads, I'd say. We could also spawn processes with their output directly targeted at a file on disk, although it's an annoying dance to do from Unix.

it seems like it is copying files to RAM prior to running the prover that eliminated the spurious 100x slowdowns.

Ah, so it was the OS filesystem cache all along, I suppose. If you have the bandwidth, I think 2 PRs would be nice indeed.

This is configured with on the command-line with the new `--cpus` option, which replaces `-j` and allows to specify a list of cpus or cpu ranges in taskset format, such as `0-3,7-12,15` (strides are not supported). When the `--cpus` option is provided, provers will be run in parallel on the provided cpus, with at most one prover running on a given cpu at once. Note that the `--cpus` setting *only* concerns the prover runs (and some glue code around one specific prover run): it does *not* otherwise restrict the cpus used by benchpress itself. It is recommended to use an external method such as the `taskset` utility to assign to the toplevel benchpress process a CPU affinity that does not overlap with the prover CPUs provided in `--cpus`. When using the `--cpus` setting, users should be aware that *there may still be other processes on the system using these cpus* (obviously)! It is thus recommended to use one of the existing techniques to isolate these CPUs from the rest of the system. I know of two ways to do this: the `isolcpus` kernel parameter, which is deprecated but is slightly easier to use, and cpusets, which are not deprecated but harder to use. To use `isolcpus`, simply set the cpus to use for benchmarking as the isolated CPUs on the kernel cmdline and reboot (OK, maybe not that simple for one-shot benchmarking, but fairly easy for a machine that is mostly used for benchmarking). There is no need to set the CPU affinity for the `benchpress` binary: it will never be scheduled on the isolated CPUs, and neither will any other processes (unless manually required to). To use cpusets (which is the solution I employed on our benchmark machine at OCamlPro), you should create a `system` cpuset that contains only the CPUs that will *NOT* be used by benchpress, and move all processes to that cpuset (this can be done on a running system, consult the cpuset documentation). Then, create another cpuset that contains the CPUs to use for benchpress, including the CPUs to use for the `benchpress` binary (in practice I use the root cpuset that contains all the CPUs), and run `benchpress` in that cpuset. You must not forget to use `taskset` to prevent the `benchpress` binary from using the CPUs destined for the provers. In practice, I move the shell that I use to run benchpress to that second cpuset. This helps with sneeuwballen#58

bclement-ocp · 2023-06-07T16:24:05Z

I have pushed just the CPU affinity changes to this PR and will create a different PR for the ramdisk stuff, but GitHub is not working well currently and does not show the new commit in the PR right now :(

master...bclement-ocp:benchpress:bclement/cpuaff

bclement-ocp · 2023-06-07T16:25:57Z

We could also spawn processes with their output directly targeted at a file on disk, although it's an annoying dance to do from Unix.

The "ramdisk" part of the PR (soon to be another PR) does this dance, but the files are in /dev/shm. It would be fairly easy to write on disk instead (in fact this is what will happen on Windows in the current implementation I have).

c-cube · 2023-06-08T14:42:42Z

Now that we depend on processor, maybe we could have -j auto? Use the number of cores as the level of parallelism :)

bclement-ocp · 2023-06-08T15:06:52Z

I think this is more or less what guess_cpu (I forgot what it is called) in Misc does already? But yes it would be cleaner to use processor for that. Is it OK with you if I do it in a separate PR?

c-cube · 2023-06-08T15:30:41Z

Yes definitely !

c-cube · 2023-06-12T19:22:46Z

is this ready? :)

bclement-ocp · 2023-06-13T05:17:18Z

Yes, it should be ready for review, sorry if I wasn't clear. I am waiting on this to be merged to do the processor-for--j auto one.

c-cube · 2023-06-13T14:59:51Z

All good, thank you :)

bclement-ocp force-pushed the bclement/cpuaff branch 2 times, most recently from 19a6a1e to 9cafa5b Compare June 7, 2023 15:19

bclement-ocp force-pushed the bclement/cpuaff branch from 9cafa5b to f977c6a Compare June 7, 2023 15:22

bclement-ocp force-pushed the bclement/cpuaff branch from f977c6a to d541596 Compare June 7, 2023 16:38

bclement-ocp mentioned this pull request Jun 7, 2023

Copy input and output of provers to RAM #71

Draft

c-cube merged commit 80b184f into sneeuwballen:master Jun 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better benchmark stability #69

Better benchmark stability #69

bclement-ocp commented Jun 7, 2023 •

edited

Loading

c-cube commented Jun 7, 2023

bclement-ocp commented Jun 7, 2023

bclement-ocp commented Jun 7, 2023

c-cube commented Jun 7, 2023

bclement-ocp commented Jun 7, 2023

bclement-ocp commented Jun 7, 2023

c-cube commented Jun 8, 2023

bclement-ocp commented Jun 8, 2023

c-cube commented Jun 8, 2023 via email

c-cube commented Jun 12, 2023

bclement-ocp commented Jun 13, 2023

c-cube commented Jun 13, 2023

Better benchmark stability #69

Better benchmark stability #69

Conversation

bclement-ocp commented Jun 7, 2023 • edited Loading

c-cube commented Jun 7, 2023

bclement-ocp commented Jun 7, 2023

bclement-ocp commented Jun 7, 2023

c-cube commented Jun 7, 2023

bclement-ocp commented Jun 7, 2023

bclement-ocp commented Jun 7, 2023

c-cube commented Jun 8, 2023

bclement-ocp commented Jun 8, 2023

c-cube commented Jun 8, 2023 via email

c-cube commented Jun 12, 2023

bclement-ocp commented Jun 13, 2023

c-cube commented Jun 13, 2023

bclement-ocp commented Jun 7, 2023 •

edited

Loading