Better benchmark stability #69
This looks very impressive. I'm not going to pretend I read it all in detail, but I've been aware of the issue for a while. @Gbury actually had something based on cgroups at one point. Two points:
Ah yes, I disabled autoformat in my editor because of unmentionable reasons, but it is formatted now, apologies.

Regarding Windows, I am not sure benchpress is used on Windows atm (and I also don't have access to a system to test), but it should just be a little bit of glue to make this work on Windows (well, and an additional argument, because I believe that there is no equivalent to

I am not sure I understand your point about the output of the provers competing for the available RAM with the provers themselves, because I believe that is already an issue with the current implementation, where the output is copied to benchpress using

I agree that having the input and proof files in RAM should be an option, however; would you rather I add it to this PR, or separate the CPU affinity stuff from the RAM stuff?
By the way, CPU affinity is a nice bonus, but from my initial testing it seems that copying the files to RAM before running the prover is what eliminated the spurious 100x slowdowns.
Ah! Touché 😁. That's partly due to limitations in the Unix API and threads, I'd say. We could also spawn processes with their output directly targeted at a file on disk, although it's an annoying dance to do from
Ah, so it was the OS filesystem cache all along, I suppose. If you have the bandwidth, I think 2 PRs would be nice indeed.
This is configured on the command line with the new `--cpus` option, which replaces `-j` and accepts a list of CPUs or CPU ranges in taskset format, such as `0-3,7-12,15` (strides are not supported). When the `--cpus` option is provided, provers are run in parallel on the provided CPUs, with at most one prover running on a given CPU at once.

Note that the `--cpus` setting *only* concerns the prover runs (and some glue code around one specific prover run): it does *not* otherwise restrict the CPUs used by benchpress itself. It is recommended to use an external method such as the `taskset` utility to give the toplevel benchpress process a CPU affinity that does not overlap with the prover CPUs provided in `--cpus`.

When using the `--cpus` setting, users should be aware that *there may still be other processes on the system using these CPUs* (obviously)! It is thus recommended to use one of the existing techniques to isolate these CPUs from the rest of the system. I know of two ways to do this: the `isolcpus` kernel parameter, which is deprecated but slightly easier to use, and cpusets, which are not deprecated but harder to use.

To use `isolcpus`, set the CPUs to use for benchmarking as the isolated CPUs on the kernel cmdline and reboot (OK, maybe not that simple for one-shot benchmarking, but fairly easy for a machine that is mostly used for benchmarking). There is no need to set the CPU affinity for the `benchpress` binary: it will never be scheduled on the isolated CPUs, and neither will any other process (unless manually assigned to them).

To use cpusets (which is the solution I employed on our benchmark machine at OCamlPro), you should create a `system` cpuset that contains only the CPUs that will *NOT* be used by benchpress, and move all processes to that cpuset (this can be done on a running system, consult the cpuset documentation). Then, create another cpuset that contains the CPUs to use for benchpress, including the CPUs to use for the `benchpress` binary (in practice I use the root cpuset that contains all the CPUs), and run `benchpress` in that cpuset. You must not forget to use `taskset` to prevent the `benchpress` binary from using the CPUs destined for the provers. In practice, I move the shell that I use to run benchpress to that second cpuset.

This helps with sneeuwballen#58
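For readers who want a concrete picture of the `--cpus` semantics described above, here is a minimal Python sketch. It is an illustration only, not the code in this PR (benchpress is written in OCaml). The names `parse_cpu_list` and `run_on_cpus` are hypothetical, and pinning a child by prefixing its command with `taskset -c` is an assumption made for the sketch, not necessarily how benchpress sets affinity.

```python
import queue
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, Tuple


def parse_cpu_list(spec: str) -> List[int]:
    """Parse a taskset-style CPU list such as "0-3,7-12,15" (no strides)."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in CPU list {spec!r}")
        if ":" in part:
            # taskset writes strides as "0-7:2"; they are rejected here.
            raise ValueError(f"strides are not supported: {part!r}")
        lo, sep, hi = part.partition("-")
        first = int(lo)
        last = int(hi) if sep else first
        if last < first:
            raise ValueError(f"descending range: {part!r}")
        cpus.update(range(first, last + 1))
    return sorted(cpus)


def run_pinned(cmd: Sequence[str], cpu: int) -> int:
    """Run cmd restricted to one CPU via the taskset utility (Linux)."""
    return subprocess.run(["taskset", "-c", str(cpu), *cmd]).returncode


def run_on_cpus(
    commands: Sequence[Sequence[str]],
    cpus: Sequence[int],
    run: Optional[Callable[[Sequence[str], int], int]] = None,
) -> List[Tuple[int, int]]:
    """Run all commands, with at most one command per CPU at any time.

    Returns one (cpu, return code) pair per command, in input order.
    """
    runner = run or run_pinned
    free = queue.Queue()
    for cpu in cpus:
        free.put(cpu)

    def job(cmd: Sequence[str]) -> Tuple[int, int]:
        cpu = free.get()  # blocks until some CPU is idle
        try:
            return cpu, runner(cmd, cpu)
        finally:
            free.put(cpu)  # hand the CPU back for the next command

    with ThreadPoolExecutor(max_workers=len(cpus)) as pool:
        return list(pool.map(job, commands))


print(parse_cpu_list("0-3,7-12,15"))
# prints [0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 15]
```

The queue of free CPUs is what enforces "at most one prover per CPU": a worker cannot start a command until it has taken a CPU out of the queue, and it only returns the CPU once the command has finished. Note that, as in the description above, nothing here stops unrelated processes from being scheduled on those CPUs; that still requires `isolcpus` or cpusets.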
I have pushed just the CPU affinity changes to this PR and will create a different PR for the ramdisk stuff, but GitHub is not working well currently and does not show the new commit in the PR right now :(
The "ramdisk" part of the PR (soon to be another PR) does this dance, but the files are in
Now that we depend on
I think this is more or less what
Yes, definitely!
is this ready? :)
Yes, it should be ready for review, sorry if I wasn't clear. I am waiting on this to be merged to do the
All good, thank you :)