-
Notifications
You must be signed in to change notification settings - Fork 36.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
random: add benchmarks and drop unnecessary Shuffle function #30396
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. |
I've also benchmarked Deniel Lemire's nearly divisionless random integer generation (which libstdc++'s For |
d37b7a3
to
e8985f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is determinism of std::shuffle
controlled/controllable? If we don't control it maybe it would be better to keep our custom one to ease reproducibility of test failures, despite it being slower.
I see now that you are passing the proper context. 👍
src/bench/random.cpp
Outdated
void InsecureRandom_rand64(benchmark::Bench& bench) { BenchRandom_rand64(bench, InsecureRandomContext(251438)); } | ||
void InsecureRandom_rand32(benchmark::Bench& bench) { BenchRandom_rand32(bench, InsecureRandomContext(251438)); } | ||
void InsecureRandom_randbool(benchmark::Bench& bench) { BenchRandom_randbool(bench, InsecureRandomContext(251438)); } | ||
void InsecureRandom_randbits(benchmark::Bench& bench) { BenchRandom_randbits(bench, InsecureRandomContext(251438)); } | ||
void InsecureRandom_randrange100(benchmark::Bench& bench) { BenchRandom_randrange<1000>(bench, InsecureRandomContext(251438)); } | ||
void InsecureRandom_randrange1000(benchmark::Bench& bench) { BenchRandom_randrange<1000>(bench, InsecureRandomContext(251438)); } | ||
void InsecureRandom_randrange1000000(benchmark::Bench& bench) { BenchRandom_randrange<1000000>(bench, InsecureRandomContext(251438)); } | ||
void InsecureRandom_Shuffle100(benchmark::Bench& bench) { BenchRandom_Shuffle<1000>(bench, InsecureRandomContext(251438)); } | ||
void InsecureRandom_stdshuffle100(benchmark::Bench& bench) { BenchRandom_stdshuffle<1000>(bench, InsecureRandomContext(251438)); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
251438
seems to be a fine hook for manhole covers. :)
Worth documenting the choice of seed initialization value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the alphabetic position of the letters of "bench" (2-5-14-3-8) concatenated, but it's really just an arbitrary constant.
Also rename the benchmark names to match the operation names
Benchmarks show it is no longer faster with modern standard C++ libraries, and the debug-mode failure due to self-move has been fixed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran benchmarks on da28a26 with..
Clang, non-debug, Linux
$ src/bench/bench_bitcoin --filter=".*Random.*"
| ns/number | number/s | err% | ins/number | cyc/number | IPC | bra/number | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 12.84 | 77,883,564.55 | 0.2% | 103.12 | 47.37 | 2.177 | 15.66 | 4.2% | 0.01 | `FastRandom_Shuffle100`
| 10.35 | 96,651,871.86 | 0.1% | 116.75 | 38.15 | 3.060 | 9.38 | 0.0% | 0.01 | `FastRandom_rand32`
| 19.03 | 52,549,720.82 | 0.2% | 214.50 | 70.12 | 3.059 | 15.75 | 0.0% | 0.01 | `FastRandom_rand64`
| 14.56 | 68,674,993.41 | 0.2% | 149.17 | 53.67 | 2.779 | 14.49 | 1.2% | 0.01 | `FastRandom_randbits`
| 1.69 | 591,854,532.68 | 0.0% | 11.38 | 6.23 | 1.826 | 1.26 | 0.2% | 0.01 | `FastRandom_randbool`
| 12.11 | 82,569,289.00 | 0.1% | 95.33 | 44.68 | 2.133 | 13.70 | 4.4% | 0.01 | `FastRandom_randrange100`
| 13.55 | 73,808,837.05 | 0.6% | 108.46 | 49.99 | 2.170 | 14.63 | 4.4% | 0.01 | `FastRandom_randrange1000`
| 17.80 | 56,164,658.15 | 0.1% | 156.81 | 65.65 | 2.389 | 18.39 | 3.3% | 0.20 | `FastRandom_randrange1000000`
| 13.08 | 76,440,558.82 | 0.2% | 134.02 | 48.23 | 2.779 | 12.42 | 0.7% | 0.01 | `FastRandom_stdshuffle100`
| 7.55 | 132,409,503.40 | 0.9% | 52.83 | 27.85 | 1.897 | 10.39 | 6.3% | 0.01 | `InsecureRandom_Shuffle100`
| 1.70 | 589,139,570.89 | 0.2% | 19.50 | 6.25 | 3.118 | 1.50 | 0.0% | 0.01 | `InsecureRandom_rand32`
| 1.11 | 901,858,137.81 | 0.0% | 7.00 | 4.09 | 1.711 | 0.00 | 0.0% | 0.01 | `InsecureRandom_rand64`
| 2.10 | 475,829,327.28 | 0.4% | 20.89 | 7.75 | 2.697 | 2.48 | 0.0% | 0.01 | `InsecureRandom_randbits`
| 1.40 | 714,972,427.11 | 0.1% | 12.25 | 5.16 | 2.374 | 1.02 | 0.0% | 0.01 | `InsecureRandom_randbool`
| 6.46 | 154,870,239.99 | 0.3% | 39.00 | 23.80 | 1.639 | 8.14 | 7.7% | 0.01 | `InsecureRandom_randrange100`
| 7.08 | 141,237,665.66 | 0.4% | 39.80 | 26.10 | 1.525 | 7.97 | 7.9% | 0.01 | `InsecureRandom_randrange1000`
| 6.49 | 154,065,899.22 | 0.6% | 43.93 | 23.80 | 1.846 | 7.80 | 7.0% | 0.07 | `InsecureRandom_randrange1000000`
| 3.61 | 276,805,356.11 | 0.1% | 34.38 | 13.33 | 2.580 | 3.98 | 0.3% | 0.01 | `InsecureRandom_stdshuffle100`
FastRandom
- std::shuffle
slower by ~2%.
InsecureRandom
- std::shuffle
is ~2.1x faster, the case for migrating is strong here.
GCC equivalent
$ src/bench/bench_bitcoin --filter=".*Random.*"
| ns/number | number/s | err% | ins/number | cyc/number | IPC | bra/number | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 10.90 | 91,706,416.37 | 0.4% | 73.17 | 40.22 | 1.819 | 8.91 | 7.6% | 0.01 | `FastRandom_Shuffle100`
| 10.29 | 97,154,304.49 | 0.1% | 115.31 | 37.96 | 3.037 | 8.94 | 0.0% | 0.01 | `FastRandom_rand32`
| 18.97 | 52,727,508.94 | 0.4% | 210.63 | 69.79 | 3.018 | 15.88 | 0.0% | 0.01 | `FastRandom_rand64`
| 11.56 | 86,507,143.42 | 0.5% | 122.79 | 42.62 | 2.881 | 10.55 | 0.4% | 0.01 | `FastRandom_randbits`
| 1.68 | 594,607,147.44 | 0.1% | 12.32 | 6.20 | 1.987 | 1.25 | 0.2% | 0.01 | `FastRandom_randbool`
| 9.99 | 100,126,193.36 | 0.4% | 61.84 | 36.83 | 1.679 | 6.91 | 9.1% | 0.01 | `FastRandom_randrange100`
| 11.50 | 86,963,003.45 | 0.3% | 75.50 | 42.40 | 1.781 | 7.89 | 8.1% | 0.01 | `FastRandom_randrange1000`
| 15.36 | 65,104,420.98 | 0.4% | 122.93 | 56.09 | 2.192 | 11.41 | 5.2% | 0.17 | `FastRandom_randrange1000000`
| 14.70 | 68,050,253.74 | 0.0% | 138.64 | 54.21 | 2.558 | 11.02 | 0.1% | 0.01 | `FastRandom_stdshuffle100`
| 8.32 | 120,167,896.55 | 0.4% | 51.58 | 30.66 | 1.682 | 7.84 | 8.3% | 0.01 | `InsecureRandom_Shuffle100`
| 1.09 | 920,716,470.48 | 0.0% | 13.00 | 4.01 | 3.244 | 1.00 | 0.0% | 0.01 | `InsecureRandom_rand32`
| 0.88 | 1,133,795,846.45 | 0.1% | 7.00 | 3.25 | 2.153 | 0.00 | 57.0% | 0.01 | `InsecureRandom_rand64`
| 1.66 | 601,775,499.66 | 1.2% | 19.86 | 6.11 | 3.251 | 1.98 | 0.0% | 0.01 | `InsecureRandom_randbits`
| 0.56 | 1,771,130,575.63 | 0.1% | 4.28 | 2.08 | 2.057 | 1.00 | 0.0% | 0.01 | `InsecureRandom_randbool`
| 6.97 | 143,428,585.12 | 0.2% | 30.93 | 25.70 | 1.203 | 5.84 | 10.5% | 0.01 | `InsecureRandom_randrange100`
| 7.25 | 137,895,833.08 | 0.2% | 32.18 | 26.75 | 1.203 | 5.78 | 10.6% | 0.01 | `InsecureRandom_randrange1000`
| 7.37 | 135,732,504.89 | 0.4% | 37.62 | 26.87 | 1.400 | 5.81 | 9.2% | 0.08 | `InsecureRandom_randrange1000000`
| 5.81 | 172,220,492.06 | 0.1% | 33.82 | 21.42 | 1.579 | 3.09 | 0.3% | 0.01 | `InsecureRandom_stdshuffle100`
FastRandom
- std::shuffle
shows a performance loss of 26%. :/
InsecureRandom
- std::shuffle
wins with ~1.4x speed multiplier.
(Side note: std::shuffle
seems much better than Shuffle
when it comes to branch misses in this configuration (already significantly better in Clang)).
GCC in debug mode
$ src/bench/bench_bitcoin --filter=".*Random.*"
Warning, results might be unstable:
* DEBUG defined
Recommendations
* Make sure you compile for Release
| ns/number | number/s | err% | ins/number | cyc/number | IPC | bra/number | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 96.61 | 10,350,962.32 | 0.4% | 844.51 | 356.24 | 2.371 | 124.47 | 0.5% | 0.01 | `FastRandom_Shuffle100`
| 182.54 | 5,478,307.47 | 0.0% | 1,703.98 | 673.49 | 2.530 | 204.73 | 0.1% | 0.01 | `FastRandom_rand32`
| 342.44 | 2,920,180.80 | 0.0% | 3,197.03 | 1,263.38 | 2.531 | 370.47 | 0.1% | 0.01 | `FastRandom_rand64`
| 192.05 | 5,207,050.93 | 0.1% | 1,756.26 | 708.80 | 2.478 | 213.54 | 0.3% | 0.01 | `FastRandom_randbits`
| 16.01 | 62,471,637.02 | 0.1% | 148.23 | 59.00 | 2.513 | 23.84 | 0.1% | 0.01 | `FastRandom_randbool`
| 87.24 | 11,461,986.84 | 0.6% | 758.66 | 321.66 | 2.359 | 108.08 | 0.6% | 0.01 | `FastRandom_randrange100`
| 110.16 | 9,077,576.97 | 0.5% | 965.24 | 405.83 | 2.378 | 132.06 | 0.5% | 0.01 | `FastRandom_randrange1000`
| 186.39 | 5,365,223.88 | 0.1% | 1,676.54 | 687.82 | 2.437 | 215.45 | 0.3% | 2.05 | `FastRandom_randrange1000000`
| 215.54 | 4,639,483.16 | 0.2% | 1,906.59 | 794.16 | 2.401 | 229.95 | 0.1% | 0.01 | `FastRandom_stdshuffle100`
| 53.91 | 18,549,333.04 | 0.3% | 457.13 | 198.59 | 2.302 | 79.37 | 0.8% | 0.01 | `InsecureRandom_Shuffle100`
| 22.03 | 45,393,295.79 | 0.1% | 215.50 | 81.27 | 2.652 | 31.50 | 0.0% | 0.01 | `InsecureRandom_rand32`
| 22.72 | 44,019,773.42 | 0.0% | 220.00 | 83.84 | 2.624 | 24.00 | 0.0% | 0.01 | `InsecureRandom_rand64`
| 27.54 | 36,307,742.75 | 0.6% | 244.77 | 101.11 | 2.421 | 37.63 | 1.0% | 0.01 | `InsecureRandom_randbits`
| 10.95 | 91,358,394.49 | 0.1% | 101.70 | 40.40 | 2.518 | 18.42 | 0.1% | 0.01 | `InsecureRandom_randbool`
| 44.06 | 22,693,987.66 | 0.3% | 371.40 | 162.65 | 2.283 | 63.01 | 0.9% | 0.01 | `InsecureRandom_randrange100`
| 45.33 | 22,061,595.98 | 0.2% | 382.82 | 167.30 | 2.288 | 64.26 | 0.9% | 0.01 | `InsecureRandom_randrange1000`
| 50.17 | 19,933,844.35 | 0.2% | 438.63 | 184.96 | 2.372 | 71.38 | 0.8% | 0.55 | `InsecureRandom_randrange1000000`
| 52.96 | 18,880,998.91 | 0.2% | 418.53 | 195.25 | 2.144 | 56.76 | 0.0% | 0.01 | `InsecureRandom_stdshuffle100`
FastRandom
- std::shuffle
is less than half the speed. :/ A lot more instructions & branches.
InsecureRandom
- Negligible difference.
Conclusion so far
InsecureRandom
is only used in tests, so this would probably be a win for developers/testers/fuzzing but not really for end users. Maybe other systems/configurations show a clearer win.
Also ran make check
& test/functional/test_runner.py
on 6ecda04 in GCC debug without any failures. Confirming that the self-assign issue with std::shuffle
doesn't appear to be an issue. (Haven't tried to create a configuration that triggers the panic though).
@hodlinator FWIW, my motivation here is just FastRandomContext non-debug performance. I did notice that I do expect InsecureRandomContext will get used for non-testing purposes eventually, but that's not a huge motivation now. To explain what we're seeing here, worked out for a shuffle of 100 elements:
In FastRandomContext, the cost of a If we really wanted, it would be possible to use the technique that For posterity, this is sufficient to make 32-bit +
+ template<std::integral I>
+ I randrange(I range) noexcept
+ {
+ if constexpr (std::numeric_limits<I>::digits <= 32) {
+ uint32_t x = rand32();
+ uint64_t m = uint64_t{x} * uint32_t(range);
+ uint32_t l = uint32_t(m);
+ if (l < uint32_t(range)) {
+ uint32_t t = (0 - range) % range;
+ while (l < t) {
+ x = rand32();
+ m = uint64_t{x} * uint32_t(range);
+ l = uint32_t(m);
+ }
+ }
+ return m >> 32;
+ }
+ return RandomMixin<InsecureRandomContext>::randrange(range);
+ } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 6ecda04
Thanks for taking the time to add valuable context!!
(I read up a bit, understand you implemented a 32-bit version of Daniel Lemire's nearly divisionless algo. Nice).
I did notice that
std::shuffle
is a bit slower than the custom Shuffle on my system too (AMD 5950X, GCC 13.2)
Maybe worth adding some nuance to the PR summary which currently says "now that it appears that standard library std::shuffle
beats it.".
# rand64()
invocations
Shuffle
would userandrange
100 times, which for the ranges involved would invokerand64
on average ~13.005 times
Agreed, verified with added instrumentation (rough reasoning through: range of 100 > 0b1100100 > 7 binary digits wide. 64 bits / 7 bits => ~9 randbits()
calls per rand64()
... with some extra randbits()
because of if (ret <= maxval)
in randrange()
failing => 100 / 9 + extra = ~13).
std::shuffle
would invokerand64
~100 times directly.
nit: With the added instrumentation my recent testing shows it to be =50 times - as the ranges are small enough std::shuffle()
will perform 2 swaps for each rand64()
. This goes for both Clang & GCC.
As there is no use case for efficient
InsecureRandomContext::randrange
right now that doesn't seem worth it, but that can change.
Seems like it could make fuzz-tests more efficient even now and hence free up resources for more test coverage. But agree it's probably minor.
(Bigger ranges converge)
I'm also on GCC 13.2. CPU: Intel 9th gen Coffee Lake. Did some test with bigger ranges and performance seems to converge between Shuffle
& std::shuffle
for FastRandom
.
| ns/number | number/s | err% | ins/number | cyc/number | IPC | bra/number | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 12.01 | 83,247,932.44 | 0.4% | 86.95 | 44.32 | 1.962 | 9.88 | 6.8% | 0.01 | `FastRandom_Shuffle1000`
| 14.61 | 68,427,974.21 | 0.1% | 138.35 | 53.91 | 2.566 | 10.95 | 0.0% | 0.01 | `FastRandom_stdshuffle1000`
| 9.65 | 103,602,289.99 | 0.3% | 52.55 | 35.61 | 1.475 | 7.78 | 7.9% | 0.01 | `InsecureRandom_Shuffle1000`
| 5.67 | 176,274,334.46 | 0.0% | 33.53 | 20.92 | 1.603 | 3.01 | 0.0% | 0.01 | `InsecureRandom_stdshuffle1000`
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 13.99 | 71,471,645.86 | 0.4% | 107.03 | 51.56 | 2.076 | 11.44 | 6.2% | 0.01 | `FastRandom_Shuffle10000`
| 14.68 | 68,116,488.28 | 0.1% | 138.32 | 54.15 | 2.554 | 10.94 | 0.0% | 0.01 | `FastRandom_stdshuffle10000`
| 8.60 | 116,299,489.79 | 0.1% | 55.08 | 31.70 | 1.738 | 7.92 | 8.3% | 0.01 | `InsecureRandom_Shuffle10000`
| 5.66 | 176,764,575.63 | 0.1% | 33.50 | 20.87 | 1.606 | 3.00 | 0.0% | 0.01 | `InsecureRandom_stdshuffle10000`
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 15.24 | 65,617,271.52 | 0.3% | 123.32 | 56.05 | 2.200 | 12.64 | 5.0% | 0.02 | `FastRandom_Shuffle100000`
| 15.44 | 64,762,393.26 | 0.1% | 138.31 | 56.89 | 2.431 | 10.94 | 0.0% | 0.02 | `FastRandom_stdshuffle100000`
| 8.41 | 118,884,436.01 | 0.3% | 56.50 | 30.95 | 1.826 | 7.93 | 7.4% | 0.01 | `InsecureRandom_Shuffle100000`
| 5.79 | 172,767,668.95 | 0.0% | 33.50 | 21.35 | 1.569 | 3.00 | 0.0% | 0.01 | `InsecureRandom_stdshuffle100000`
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 16.44 | 60,831,272.72 | 0.4% | 134.57 | 60.60 | 2.220 | 13.41 | 4.4% | 0.18 | `FastRandom_Shuffle1000000`
| 16.49 | 60,636,239.08 | 0.0% | 138.31 | 60.82 | 2.274 | 10.94 | 0.0% | 0.18 | `FastRandom_stdshuffle1000000`
| 8.41 | 118,931,462.65 | 0.2% | 56.74 | 31.00 | 1.830 | 7.81 | 6.9% | 0.09 | `InsecureRandom_Shuffle1000000`
| 6.23 | 160,482,706.30 | 0.6% | 33.50 | 22.98 | 1.458 | 3.00 | 0.0% | 0.07 | `InsecureRandom_stdshuffle1000000`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 6ecda04
ACK 6ecda04 |
For fuzzing, just to keep it in mind, there is (obviously) no guarantee by the standard as to how
So going forward, I guess one can only assume to surely reproduce a fuzz crash by using the exact same stdlib (and compiler). For context there is also a compiler behavior mismatch discussed in #29043. |
post merge ACK. My local bench results had comparable performance as well |
This is a strong counter-point. I guess other stdlib-differences like |
I don't think there are any It may make sense to add something like the old |
I hope all code touched in this pull request is covered by fuzz tests. If not, then that should be fixed ;) For example, I don't have a strong feeling either way, because if an effort is made to have uniform fuzz behavior across compilers and stdlibs, it will probably be too hard. |
Ported to the CMake-based build system in hebasto#264. |
Hmm, for some reason I only considered calls to |
This adds benchmarks for various operations on
FastRandomContext
andInsecureRandomContext
, and then removes the ad-hocShuffle
functions, now that it appears that standard librarystd::shuffle
has comparable performance. The other reason for keepingShuffle
, namely the fact that libstdc++ used self-move (which debug mode panics on) has been fixed as well (see #29625 (comment)).