Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto: added runtime checks for SHA hardware #3188

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

edtubbs
Copy link
Contributor

@edtubbs edtubbs commented Dec 12, 2022

This replaces an earlier PR (#3091) that was later reverted (#3135) due to #3134. The updated pull request includes the necessary checks for Apple M devices, but testing on actual hardware is needed to confirm its functionality.

@edtubbs edtubbs force-pushed the 1.14.7-dev-runtime-guard branch 2 times, most recently from 65b454f to e6a1d8c Compare December 12, 2022 05:08
@patricklodder patricklodder requested a review from a team December 12, 2022 11:50
@patricklodder patricklodder added this to the 1.14.7 milestone Dec 12, 2022
src/crypto/sha1.cpp Outdated Show resolved Hide resolved
@edtubbs edtubbs force-pushed the 1.14.7-dev-runtime-guard branch from e6a1d8c to f3ddf35 Compare December 13, 2022 05:18
@patricklodder
Copy link
Member

@edtubbs thanks for this! I'm going to test this on multiple platforms.

I saw that the runtime selectors are executed regardless of experimental features - I think this could be ok, but I haven't thought about it a lot. I was wondering if you had an explicit rationale for that?

@edtubbs
Copy link
Contributor Author

edtubbs commented Dec 13, 2022

@edtubbs thanks for this! I'm going to test this on multiple platforms.

I saw that the runtime selectors are executed regardless of experimental features - I think this could be ok, but I haven't thought about it a lot. I was wondering if you had an explicit rationale for that?

Sure thing! That's a good catch. I think it would be consistent with our policy on experimental features to include runtime selectors. I'll make that change.

@edtubbs edtubbs force-pushed the 1.14.7-dev-runtime-guard branch 2 times, most recently from 96471cb to b5eb175 Compare December 14, 2022 05:29
@edtubbs edtubbs force-pushed the 1.14.7-dev-runtime-guard branch from b5eb175 to 8931270 Compare December 22, 2022 18:59
Copy link
Member

@patricklodder patricklodder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I inserted this inside the header selectors of sha1.cpp and sha256.cpp both before line 26, it compiled for me on an M1 with Apple clang:

# if defined(__APPLE__) && defined(__apple_build_version__)
#  include <sys/sysctl.h>
# endif

With that addition, compile succeeds and bench performs similar to what I reported here

Meantime, I'll move on to test on Graviton2/Linux ❤️

@patricklodder
Copy link
Member

When I inserted this inside the header selectors of sha1.cpp and sha256.cpp both before line 26, it compiled for me on an M1 with Apple clang [..]

NOTE: if the headers are inserted into the sha512.cpp file too, then, with a recent SDK/clang, ARMv82 crypto works on M1 too. Results here

@edtubbs edtubbs force-pushed the 1.14.7-dev-runtime-guard branch from 8931270 to 9f0f37d Compare January 6, 2023 04:35
@edtubbs
Copy link
Contributor Author

edtubbs commented Jan 6, 2023

When I inserted this inside the header selectors of sha1.cpp and sha256.cpp both before line 26, it compiled for me on an M1 with Apple clang [..]

NOTE: if the headers are inserted into the sha512.cpp file too, then, with a recent SDK/clang, ARMv82 crypto works on M1 too. Results here

Awesome! I've rebased with those changes.

@patricklodder
Copy link
Member

patricklodder commented Jan 24, 2023

Tested on Graviton2 and Graviton3 boxes.

Compiled everything on a Graviton3 with sha512 extensions and then copied that binary to a Graviton2 that doesn't have these, to make sure that the runtime check works.

  • Compile is successful, but with 3 warnings. I would be okay with not fixing these for now, because I have an idea how to make the runtime capability detection more uniform to make it easier to manage, as right now it resides in 4 different files (sha1/256/512 and scrypt). I can open a PR for that after this one.
crypto/sha1.cpp:251:6: warning: ‘void {anonymous}::sha1::Transform_AVX2(uint32_t*, const unsigned char*)’ defined but not used [-Wunused-function]
  251 | void Transform_AVX2(uint32_t* s, const unsigned char* chunk)
      |      ^~~~~~~~~~~~~~
  CXX      crypto/libdogecoin_crypto_a-sha256.o
crypto/sha256.cpp:257:6: warning: ‘void {anonymous}::sha256::Transform_AVX2(uint32_t*, const unsigned char*)’ defined but not used [-Wunused-function]
  257 | void Transform_AVX2(uint32_t* s, const unsigned char* chunk)
      |      ^~~~~~~~~~~~~~
  CXX      crypto/libdogecoin_crypto_a-sha512.o
crypto/sha512.cpp:315:6: warning: ‘void {anonymous}::sha512::Transform_AVX2(uint64_t*, const unsigned char*)’ defined but not used [-Wunused-function]
  315 | void Transform_AVX2(uint64_t* s, const unsigned char* chunk)
      |      ^~~~~~~~~~~~~~
  • tests pass on both hosts ✅
  • bench passes
  • testnet sync passes ✅

Things that still need to be tested:

  • ARM without any v8 crypto extensions (i.e. RPi4)
  • macOS x86_64 with experimental SSE2 scrypt but no Intel AVX2 built-in (because not supported)
  • Windows x86_64 with Intel AVX2 built-in with no cpu capabilities
  • Linux x86_64 with Intel AVX2 built-in with no cpu capabilities

@edtubbs
Copy link
Contributor Author

edtubbs commented Jan 25, 2023

Cool! Agree on path forward to address the warnings. I'll assist with testing on the remaining platforms and maybe get some others to help!

@patricklodder
Copy link
Member

Tested if macOs x86_64 still compiles when doing --enable-experimental and --enable-scrypt-sse2 and this works, so that's another ✅

src/crypto/sha1.cpp Outdated Show resolved Hide resolved
@edtubbs edtubbs force-pushed the 1.14.7-dev-runtime-guard branch from 9f0f37d to d6950a3 Compare February 3, 2023 06:10
@patricklodder
Copy link
Member

Confirmed that d6950a3 works on both non-avx2 and avx2 capable hardware under linux64. details can be found here. That's another ✅

I also confirmed that AMD Zen-1 processors have real issues and perform worse with avx2 enabled than without, but that does not have to be solved in this PR - I do think that this is important to consider when we want to mature this though.

(also re-confirmed this not breaking macOS build)

@patricklodder
Copy link
Member

I've added a "help wanted" because I personally do not have access to:

  1. a 64-bit Windows install on a device with a processor that doesn't support avx2 extensions
  2. a aarch64 device without crypto extensions like the raspberry pi 4.

If anyone has either of these and wants to help validating this PR, please let us know here and we'll get you the needed binaries.

Copy link
Member

@xanimo xanimo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested both armv8 and armv82 on rpi4. Note that armv82 did not work at all for any binaries tested. I also tested on win64 and that worked just fine.

$ cat /proc/cpuinfo
processor    : 0
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer    : 0x41
CPU architecture: 8
CPU variant    : 0x0
CPU part    : 0xd08
CPU revision    : 3

processor    : 1
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer    : 0x41
CPU architecture: 8
CPU variant    : 0x0
CPU part    : 0xd08
CPU revision    : 3

processor    : 2
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer    : 0x41
CPU architecture: 8
CPU variant    : 0x0
CPU part    : 0xd08
CPU revision    : 3

processor    : 3
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer    : 0x41
CPU architecture: 8
CPU variant    : 0x0
CPU part    : 0xd08
CPU revision    : 3

Hardware    : BCM2835
Revision    : c03114
Serial        : 10000000bca25bff
Model        : Raspberry Pi 4 Model B Rev 1.4

armv82 error:

$ ./bench_dogecoin
Illegal instruction (core dumped)
$ ./test_dogecoin
Illegal instruction (core dumped)

gdb output for armv82:

0x0000aaaaaab67850 in boost::exception_ptr boost::exception_detail::get_static_exception_object<boost::exception_detail::bad_alloc_>() ()
(gdb) up
#1  0x0000aaaaaab1e488 in __static_initialization_and_destruction_0(int, int) [clone .constprop.0] ()
(gdb) up
#2  0x0000aaaaab3278c8 in __libc_csu_init ()
(gdb) up
#3  0x0000fffff7d6a2e8 in __libc_start_main_impl (main=0xaaaaaab3dfb0 <main>, argc=1, argv=0xfffffffff4a8, 
    init=0xaaaaab327868 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>)
    at ../csu/libc-start.c:375
375    ../csu/libc-start.c: No such file or directory.
(gdb) up
#4  0x0000aaaaaab42890 in _start ()
(gdb) up
Initial frame selected; you cannot go up.

armv8 output:

$ ./test_dogecoin 
Running 253 test cases...

*** No errors detected
$ ./bench_dogecoin 
#Benchmark,count,min,max,average,min_cycles,max_cycles,average_cycles
Base58CheckEncode,147456,0.000007197275409,0.000007205817383,0.000007200053207,0,0,0
Base58Decode,245760,0.000004089597496,0.000004133544280,0.000004103922402,0,0,0
Base58Encode,229376,0.000004668880138,0.000004674555385,0.000004672222920,0,0,0
CCheckQueueSpeed,1280,0.000806203112006,0.000912621617317,0.000848289020360,0,0,0
CCheckQueueSpeedPrevectorJob,352,0.003042533993721,0.003322824835777,0.003103031353517,0,0,0
CCoinsCaching,61440,0.000016914564185,0.000018388673197,0.000017208722420,0,0,0
CoinSelection,160,0.006696686148643,0.007606506347656,0.006783099472523,0,0,0
LockedPool,144,0.004227995872498,0.007337123155594,0.006997707817290,0,0,0
MempoolEviction,12288,0.000087335938588,0.000093324342743,0.000088167396219,0,0,0
RIPEMD160,144,0.007248044013977,0.007566064596176,0.007317478458087,0,0,0
RollingBloom-refresh,1,0.001132000000000,0.001132000000000,0.001132000000000
RollingBloom-refresh,1,0.000719000000000,0.000719000000000,0.000719000000000
RollingBloom-refresh,1,0.000727000000000,0.000727000000000,0.000727000000000
RollingBloom-refresh,1,0.000728000000000,0.000728000000000,0.000728000000000
RollingBloom-refresh,1,0.000710000000000,0.000710000000000,0.000710000000000
RollingBloom-refresh,1,0.000740000000000,0.000740000000000,0.000740000000000
RollingBloom-refresh,1,0.000754000000000,0.000754000000000,0.000754000000000
RollingBloom,425984,0.000002242974006,0.000002634311386,0.000002363602667,0,0,0
SHA1,192,0.005361497402191,0.005662187933922,0.005399146427711,0,0,0
SHA256,128,0.008059620857239,0.008092522621155,0.008076656609774,0,0,0
SHA256_32b,2,0.612305998802185,0.612305998802185,0.612305998802185,0,0,0
SHA512,192,0.005607247352600,0.005650401115417,0.005613212784131,0,0,0
Scrypt,1920,0.000543218106031,0.000545129179955,0.000543585419655,0,0,0
SipHash_32b,16,0.068132996559143,0.068148016929626,0.068143382668495,0,0,0
Sleep100ms,10,0.100099921226501,0.100114583969116,0.100108289718628,0,0,0
Trig,14680064,0.000000050897597,0.000000073714773,0.000000070110803,0,0,0

win64 specs:

PS C:\Users\dakod> Get-CimInstance -Query 'Select * from Win32_Processor'

DeviceID Name                                      Caption                               MaxClockSpeed SocketDesignatio
                                                                                                       n
-------- ----                                      -------                               ------------- ----------------
CPU0     Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz Intel64 Family 6 Model 165 Stepping 2 2496          U3E1

win64 output:

PS C:\Users\dakod\source\repos> .\test_dogecoin.exe
Running 253 test cases...

*** No errors detected
PS C:\Users\dakod\source\repos> .\bench_dogecoin.exe
#Benchmark,count,min,max,average,min_cycles,max_cycles,average_cycles
Base58CheckEncode,393216,0.000002500426490,0.000002689572284,0.000002603086007,6252,6828,6502
Base58Decode,720896,0.000001449003321,0.000001586915459,0.000001478715355,3613,3927,3692
Base58Encode,655360,0.000001586537110,0.000001648058969,0.000001619046088,4016,4111,4044
CCheckQueueSpeed,8192,0.000111327972263,0.000140612944961,0.000124877929920,280162,349229,311922
CCheckQueueSpeedPrevectorJob,768,0.001296859234571,0.001406222581863,0.001324917810659,3225043,3477352,3307092
CCoinsCaching,196608,0.000005371082807,0.000005529116606,0.000005466018289,13547,13774,13651
CoinSelection,832,0.001203157007694,0.001438535749912,0.001294508289832,3021586,3559221,3229346
LockedPool,640,0.001624882221222,0.001671858131886,0.001647884398699,4042072,4184125,4115312
MempoolEviction,40960,0.000025138200726,0.000026401365176,0.000025415088749,62712,65372,63391
RIPEMD160,512,0.001937404274940,0.001986905932426,0.001962081994861,4855528,4937830,4897861
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000999000000000,0.000999000000000,0.000999000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.001001000000000,0.001001000000000,0.001001000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.001000000000000,0.001000000000000,0.001000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000998000000000,0.000998000000000,0.000998000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom-refresh,1,0.000000000000000,0.000000000000000,0.000000000000000
RollingBloom,1703936,0.000000579315383,0.000000640884537,0.000000624447729,1465,1594,1559
SHA1,768,0.001281231641769,0.001547407358885,0.001317944067220,3218937,3829579,3288900
SHA256,416,0.002531222999096,0.002625249326229,0.002567310172778,6328474,6552884,6405013
SHA256_32b,6,0.184270501136780,0.186500072479248,0.185095349947611,460394078,464883856,462089531
SHA512,640,0.001625001430511,0.001688063144684,0.001670373231173,4113407,4228193,4171196
Scrypt,6144,0.000171858817339,0.000179719179869,0.000174083320114,432250,437575,434219
SipHash_32b,48,0.020484566688538,0.021250247955322,0.020875791708628,51642745,52907877,52108832
Sleep100ms,10,0.108206391334534,0.110309600830078,0.108905982971191,269698268,275213840,271766290
Trig,27262976,0.000000036223923,0.000000039983661,0.000000037047821,90,98,92

@patricklodder
Copy link
Member

armv82 error

I suspect that this is caused by other ARMv82 features. We can likely prevent this error by:

  1. Putting the ARMv82 implementation in its own file and then
  2. Compiling this into a separate library and only add -march=armv8.2-a+crypto+sha3 for that library

The same can be true for ARMv8... perhaps we can check this theory against a Cortex M7 or older with the ARMv8 binary?

@edtubbs edtubbs force-pushed the 1.14.7-dev-runtime-guard branch 4 times, most recently from 3bb5d1f to 6e45670 Compare November 28, 2023 04:59
@patricklodder patricklodder mentioned this pull request Feb 8, 2024
43 tasks
@slightlyskepticalpotat
Copy link
Contributor

a 64-bit Windows install on a device with a processor that doesn't support avx2 extensions
Do we still need testing on this platform?

@chromatic
Copy link
Member

a 64-bit Windows install on a device with a processor that doesn't support avx2 extensions
Do we still need testing on this platform?

I'd appreciate seeing the results!

@patricklodder patricklodder removed this from the 1.14.7 milestone Feb 16, 2024
@patricklodder patricklodder changed the base branch from 1.14.7-dev to 1.15.0-dev February 28, 2024 19:26
@patricklodder patricklodder added this to the 1.15.0 milestone Mar 7, 2024
added checks for Apple Clang compilation
added check for sha512 instruction
@edtubbs edtubbs force-pushed the 1.14.7-dev-runtime-guard branch from 6e45670 to 63c3266 Compare April 3, 2024 03:31
@patricklodder
Copy link
Member

Thanks for rebase ❤️

I had an item on my todo to ask for that. I'll rebuild test binaries shortly.

@patricklodder
Copy link
Member

I'm magically getting this error again on M1 Ventura with --with-armv82-crypto, investigating this.

  CXXLD    libdogecoinconsensus.la
Undefined symbols for architecture arm64:
  "sha512_armv82::Transform_ARMV82(unsigned long long*, unsigned char const*)", referenced from:
      detect_sha512_hardware() in libdogecoinconsensus_la-sha512.o
ld: symbol(s) not found for architecture arm64
  AR       libdogecoin_util.a
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [libdogecoinconsensus.la] Error 1

@edtubbs
Copy link
Contributor Author

edtubbs commented Apr 6, 2024

I'm magically getting this error again on M1 Ventura with --with-armv82-crypto, investigating this.

  CXXLD    libdogecoinconsensus.la
Undefined symbols for architecture arm64:
  "sha512_armv82::Transform_ARMV82(unsigned long long*, unsigned char const*)", referenced from:
      detect_sha512_hardware() in libdogecoinconsensus_la-sha512.o
ld: symbol(s) not found for architecture arm64
  AR       libdogecoin_util.a
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [libdogecoinconsensus.la] Error 1

That's interesting. Ok, I'll take another look at the Makefile targets too.

@patricklodder
Copy link
Member

I was thinking maybe I'll just implement the generic capability checker that I had selected for after this PR now, integrate it with SSE2 as a template and then we can reuse it here. Shouldn't take me more than this weekend to get that in a PR form.

@edtubbs
Copy link
Contributor Author

edtubbs commented Apr 6, 2024

I was thinking maybe I'll just implement the generic capability checker that I had selected for after this PR now, integrate it with SSE2 as a template and then we can reuse it here. Shouldn't take me more than this weekend to get that in a PR form.

Ok, cool.

@patricklodder
Copy link
Member

@edtubbs I had to do some Makefile.am work on top of the #3512 struct to make it work for libdogecoinconsensus.la for this PR.

However, I then found a potentially cleaner concept, though a bit more verbose, in bitcoin/bitcoin@c1e16cb, which, if done indiscriminately, seems to solve the issue we ran into at the very beginning on M1, by just clearly specifying each module's parametrization and compiling them into a static .la.

I still have to do cleanup and test it on some more architectures than just linux x86_64 and macOS arm64, but this actually looks promising.

@edtubbs
Copy link
Contributor Author

edtubbs commented Apr 7, 2024

@edtubbs I had to do some Makefile.am work on top of the #3512 struct to make it work for libdogecoinconsensus.la for this PR.

However, I then found a potentially cleaner concept, though a bit more verbose, in bitcoin/bitcoin@c1e16cb, which, if done indiscriminately, seems to solve the issue we ran into at the very beginning on M1, by just clearly specifying each module's parametrization and compiling them into a static .la.

I still have to do cleanup and test it on some more architectures than just linux x86_64 and macOS arm64, but this actually looks promising.

That's a good approach, I can go ahead and update crypto base similarly here.

@patricklodder
Copy link
Member

patricklodder commented Apr 7, 2024

I can go ahead and update crypto base similarly here.

Yes, that works. May even be able to just do this one first then and do the generic selectors as a second step. Probably still faster to merge this one first.

The best results for me thus far have been:

  1. crypto/libdogecoin_crypto_base.a -> crypto/libdogecoin_crypto_base.la and add -static to LD and CXX flags
  2. A separate .la (also with -static per that commit) for each optional module (right now scrypt-sse2 and sha512-armv82)
  3. LIBDOGECOIN_CRYPTO is a list of base and the separate modules (that get configured)
  4. Instead of recompiling base sources under libdogecoinconsensus_la_SOURCES, just add LIBDOGECOIN_CRYPTO to libdogecoinconsensus_la_LIBADD, like done for LIBSECP256K1.

updated to build a shared library archive
@edtubbs edtubbs force-pushed the 1.14.7-dev-runtime-guard branch from 63c3266 to a8049b6 Compare April 9, 2024 16:47
@edtubbs
Copy link
Contributor Author

edtubbs commented Apr 9, 2024

Built and tested on ARMv8.2 with linux.

@patricklodder
Copy link
Member

Confirmed that this fixed the link error on arm64 MacOS. Will now re-build test binaries for all platforms.

@patricklodder patricklodder changed the base branch from 1.15.0-dev to master August 9, 2024 00:48
@edtubbs
Copy link
Contributor Author

edtubbs commented Sep 21, 2024

Will now re-build test binaries for all platforms.

Do you have the results of this? Thanks.

@patricklodder patricklodder removed this from the 1.15.0 milestone Oct 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 🚀 needs review
Development

Successfully merging this pull request may close these issues.

5 participants