-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crypto: added runtime checks for SHA hardware #3188
base: master
Are you sure you want to change the base?
Conversation
65b454f
to
e6a1d8c
Compare
e6a1d8c
to
f3ddf35
Compare
@edtubbs thanks for this! I'm going to test this on multiple platforms. I saw that the runtime selectors are executed regardless of experimental features - I think this could be ok, but I haven't thought about it a lot. I was wondering if you had an explicit rationale for that? |
Sure thing! That's a good catch. I think it would be consistent with our policy on experimental features to include runtime selectors. I'll make that change. |
96471cb
to
b5eb175
Compare
b5eb175
to
8931270
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I inserted this inside the header selectors of sha1.cpp
and sha256.cpp
both before line 26, it compiled for me on an M1 with Apple clang:
# if defined(__APPLE__) && defined(__apple_build_version__)
# include <sys/sysctl.h>
# endif
With that addition, compile succeeds and bench performs similar to what I reported here
Meantime, I'll move on to test on Graviton2/Linux ❤️
NOTE: if the headers are inserted into the |
8931270
to
9f0f37d
Compare
Awesome! I've rebased with those changes. |
Tested on Graviton2 and Graviton3 boxes. Compiled everything on a Graviton3 with sha512 extensions and then copied that binary to a Graviton2 that doesn't have these, to make sure that the runtime check works.
Things that still need to be tested:
|
Cool! Agree on path forward to address the warnings. I'll assist with testing on the remaining platforms and maybe get some others to help! |
Tested if macOs x86_64 still compiles when doing |
9f0f37d
to
d6950a3
Compare
Confirmed that d6950a3 works on both non-avx2 and avx2 capable hardware under linux64. details can be found here. That's another ✅ I also confirmed that AMD Zen-1 processors have real issues and perform worse with avx2 enabled than without, but that does not have to be solved in this PR - I do think that this is important to consider when we want to mature this though. (also re-confirmed this not breaking macOS build) |
I've added a "help wanted" because I personally do not have access to:
If anyone has either of these and wants to help validating this PR, please let us know here and we'll get you the needed binaries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested both armv8
and armv82
on rpi4. Note that armv82
did not work at all for any binaries tested. I also tested on win64
and that worked just fine.
$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 1
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 2
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 3
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
Hardware : BCM2835
Revision : c03114
Serial : 10000000bca25bff
Model : Raspberry Pi 4 Model B Rev 1.4
armv82
error:
$ ./bench_dogecoin
Illegal instruction (core dumped)
$ ./test_dogecoin
Illegal instruction (core dumped)
gdb
output for armv82
:
0x0000aaaaaab67850 in boost::exception_ptr boost::exception_detail::get_static_exception_object<boost::exception_detail::bad_alloc_>() ()
(gdb) up
#1 0x0000aaaaaab1e488 in __static_initialization_and_destruction_0(int, int) [clone .constprop.0] ()
(gdb) up
#2 0x0000aaaaab3278c8 in __libc_csu_init ()
(gdb) up
#3 0x0000fffff7d6a2e8 in __libc_start_main_impl (main=0xaaaaaab3dfb0 <main>, argc=1, argv=0xfffffffff4a8,
init=0xaaaaab327868 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>)
at ../csu/libc-start.c:375
375 ../csu/libc-start.c: No such file or directory.
(gdb) up
#4 0x0000aaaaaab42890 in _start ()
(gdb) up
Initial frame selected; you cannot go up.
armv8
output:
$ ./test_dogecoin
Running 253 test cases...
*** No errors detected
$ ./bench_dogecoin
#Benchmark,count,min,max,average,min_cycles,max_cycles,average_cycles
Base58CheckEncode,147456,0.000007197275409,0.000007205817383,0.000007200053207,0,0,0
Base58Decode,245760,0.000004089597496,0.000004133544280,0.000004103922402,0,0,0
Base58Encode,229376,0.000004668880138,0.000004674555385,0.000004672222920,0,0,0
CCheckQueueSpeed,1280,0.000806203112006,0.000912621617317,0.000848289020360,0,0,0
CCheckQueueSpeedPrevectorJob,352,0.003042533993721,0.003322824835777,0.003103031353517,0,0,0
CCoinsCaching,61440,0.000016914564185,0.000018388673197,0.000017208722420,0,0,0
CoinSelection,160,0.006696686148643,0.007606506347656,0.006783099472523,0,0,0
LockedPool,144,0.004227995872498,0.007337123155594,0.006997707817290,0,0,0
MempoolEviction,12288,0.000087335938588,0.000093324342743,0.000088167396219,0,0,0
RIPEMD160,144,0.007248044013977,0.007566064596176,0.007317478458087,0,0,0
RollingBloom-refresh,1,0.001132000000000,0.001132000000000,0.001132000000000
RollingBloom-refresh,1,0.000719000000000,0.000719000000000,0.000719000000000
RollingBloom-refresh,1,0.000727000000000,0.000727000000000,0.000727000000000
RollingBloom-refresh,1,0.000728000000000,0.000728000000000,0.000728000000000
RollingBloom-refresh,1,0.000710000000000,0.000710000000000,0.000710000000000
RollingBloom-refresh,1,0.000740000000000,0.000740000000000,0.000740000000000
RollingBloom-refresh,1,0.000754000000000,0.000754000000000,0.000754000000000
RollingBloom,425984,0.000002242974006,0.000002634311386,0.000002363602667,0,0,0
SHA1,192,0.005361497402191,0.005662187933922,0.005399146427711,0,0,0
SHA256,128,0.008059620857239,0.008092522621155,0.008076656609774,0,0,0
SHA256_32b,2,0.612305998802185,0.612305998802185,0.612305998802185,0,0,0
SHA512,192,0.005607247352600,0.005650401115417,0.005613212784131,0,0,0
Scrypt,1920,0.000543218106031,0.000545129179955,0.000543585419655,0,0,0
SipHash_32b,16,0.068132996559143,0.068148016929626,0.068143382668495,0,0,0
Sleep100ms,10,0.100099921226501,0.100114583969116,0.100108289718628,0,0,0
Trig,14680064,0.000000050897597,0.000000073714773,0.000000070110803,0,0,0
win64
specs:
PS C:\Users\dakod> Get-CimInstance -Query 'Select * from Win32_Processor'
DeviceID Name Caption MaxClockSpeed SocketDesignatio
n
-------- ---- ------- ------------- ----------------
CPU0 Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz Intel64 Family 6 Model 165 Stepping 2 2496 U3E1
win64
output:
PS C:\Users\dakod\source\repos> .\test_dogecoin.exe
Running 253 test cases...
*** No errors detected
PS C:\Users\dakod\source\repos> .\bench_dogecoin.exe
#Benchmark,count,min,max,average,min_cycles,max_cycles,average_cycles
Base58CheckEncode,393216,0.000002500426490,0.000002689572284,0.000002603086007,6252,6828,6502
Base58Decode,720896,0.000001449003321,0.000001586915459,0.000001478715355,3613,3927,3692
Base58Encode,655360,0.000001586537110,0.000001648058969,0.000001619046088,4016,4111,4044
CCheckQueueSpeed,8192,0.000111327972263,0.000140612944961,0.000124877929920,280162,349229,311922
CCheckQueueSpeedPrevectorJob,768,0.001296859234571,0.001406222581863,0.001324917810659,3225043,3477352,3307092
CCoinsCaching,196608,0.000005371082807,0.000005529116606,0.000005466018289,13547,13774,13651
CoinSelection,832,0.001203157007694,0.001438535749912,0.001294508289832,3021586,3559221,3229346
LockedPool,640,0.001624882221222,0.001671858131886,0.001647884398699,4042072,4184125,4115312
MempoolEviction,40960,0.000025138200726,0.000026401365176,0.000025415088749,62712,65372,63391
RIPEMD160,512,0.001937404274940,0.001986905932426,0.001962081994861,4855528,4937830,4897861
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000999000000000,0.000999000000000,0.000999000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.001001000000000,0.001001000000000,0.001001000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.001000000000000,0.001000000000000,0.001000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000998000000000,0.000998000000000,0.000998000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom,1703936,0.000000579315383,0.000000640884537,0.000000624447729,1465,1594,1559
SHA1,768,0.001281231641769,0.001547407358885,0.001317944067220,3218937,3829579,3288900
SHA256,416,0.002531222999096,0.002625249326229,0.002567310172778,6328474,6552884,6405013
SHA256_32b,6,0.184270501136780,0.186500072479248,0.185095349947611,460394078,464883856,462089531
SHA512,640,0.001625001430511,0.001688063144684,0.001670373231173,4113407,4228193,4171196
Scrypt,6144,0.000171858817339,0.000179719179869,0.000174083320114,432250,437575,434219
SipHash_32b,48,0.020484566688538,0.021250247955322,0.020875791708628,51642745,52907877,52108832
Sleep100ms,10,0.108206391334534,0.110309600830078,0.108905982971191,269698268,275213840,271766290
Trig,27262976,0.000000036223923,0.000000039983661,0.000000037047821,90,98,92
I suspect that this is caused by other ARMv82 features. We can likely prevent this error by:
The same can be true for ARMv8... perhaps we can check this theory against a Cortex M7 or older with the ARMv8 binary? |
3bb5d1f
to
6e45670
Compare
|
I'd appreciate seeing the results! |
added checks for Apple Clang compilation added check for sha512 instruction
6e45670
to
63c3266
Compare
Thanks for rebase ❤️ I had an item on my todo to ask for that. I'll rebuild test binaries shortly. |
I'm magically getting this error again on M1 Ventura with
|
That's interesting. Ok, I'll take another look at the Makefile targets too. |
I was thinking maybe I'll just implement the generic capability checker that I had selected for after this PR now, integrate it with SSE2 as a template and then we can reuse it here. Shouldn't take me more than this weekend to get that in a PR form. |
Ok, cool. |
@edtubbs I had to do some However, I then found a potentially cleaner concept, though a bit more verbose, in bitcoin/bitcoin@c1e16cb, which, if done indiscriminately, seems to solve the issue we ran into at the very beginning on M1, by just clearly specifying each module's parametrization and compiling them into a static I still have to do cleanup and test it on some more architectures than just linux x86_64 and macOS arm64, but this actually looks promising. |
That's a good approach, I can go ahead and update crypto base similarly here. |
Yes, that works. May even be able to just do this one first then and do the generic selectors as a second step. Probably still faster to merge this one first. The best results for me thus far have been:
|
updated to build a shared library archive
63c3266
to
a8049b6
Compare
Built and tested on ARMv8.2 with linux. |
Confirmed that this fixed the link error on arm64 MacOS. Will now re-build test binaries for all platforms. |
Do you have the results of this? Thanks. |
This replaces an earlier PR (#3091) that was later reverted (#3135) due to #3134. The updated pull request includes the necessary checks for Apple M devices, but testing on actual hardware is needed to confirm its functionality.