-
Notifications
You must be signed in to change notification settings - Fork 36.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid non-trivial global constants in SHA-NI code #18553
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the source of this function: https://github.com/gcc-mirror/gcc/blob/4a3895903c29ed85da6fcb886f31ff23d4c6e935/gcc/config/i386/emmintrin.h#L694
It seems to be doing just a read operation from that location, so I'm not too worried about the performance of this
And looking at godbolt you seem to be right, it initializes it in the global initializers https://godbolt.org/z/M2b6cg
Alternative suggested on IRC, which might compile on clang: [13:29] <aj> +inline __m128i MASK() { return _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL); } |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
e0e42b7
to
d4d9a69
Compare
@MarcoFalke Going to try #18553 (comment) first. |
d4d9a69
to
8508473
Compare
Looks Good. |
Code review ACK 8508473 |
I've diffed the assembly code generated by --- ./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.old_d 2020-04-09 14:01:37.073839875 +0200
+++ ./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.new_d 2020-04-09 14:01:43.009818616 +0200
@@ -1,5 +1,5 @@
-./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.old: file format elf64-x86-64
+./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.new: file format elf64-x86-64
Disassembly of section .text:
@@ -1163,28 +1163,3 @@
1502: 48 83 c4 68 add $0x68,%rsp
1506: c3 retq
1507: e8 00 00 00 00 callq 150c <_ZN15sha256d64_shani14Transform_2wayEPhPKh+0x115c>
-
-Disassembly of section .text.startup:
-
-0000000000000000 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm>:
- 0: 48 83 ec 18 sub $0x18,%rsp
- 4: 66 0f 6f 05 00 00 00 movdqa 0x0(%rip),%xmm0 # c <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0xc>
- b: 00
- c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
- 13: 00 00
- 15: 48 89 44 24 08 mov %rax,0x8(%rsp)
- 1a: 31 c0 xor %eax,%eax
- 1c: 0f 29 05 00 00 00 00 movaps %xmm0,0x0(%rip) # 23 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x23>
- 23: 48 8b 44 24 08 mov 0x8(%rsp),%rax
- 28: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
- 2f: 00 00
- 31: 66 0f 6f 05 00 00 00 movdqa 0x0(%rip),%xmm0 # 39 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x39>
- 38: 00
- 39: 0f 29 05 00 00 00 00 movaps %xmm0,0x0(%rip) # 40 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x40>
- 40: 66 0f 6f 05 00 00 00 movdqa 0x0(%rip),%xmm0 # 48 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x48>
- 47: 00
- 48: 0f 29 05 00 00 00 00 movaps %xmm0,0x0(%rip) # 4f <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x4f>
- 4f: 75 05 jne 56 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x56>
- 51: 48 83 c4 18 add $0x18,%rsp
- 55: c3 retq
- 56: e8 00 00 00 00 callq 5b <.LC5+0xb> This means that there is no extra overhead, at least with this compiler. |
LGTM. @MarcoFalke does clang have problems with the fix here? If so, @ajtowns's solution could be tweaked to be even simpler: diff --git a/src/crypto/sha256_shani.cpp b/src/crypto/sha256_shani.cpp
index 92f67710fb..b139bceb37 100644
--- a/src/crypto/sha256_shani.cpp
+++ b/src/crypto/sha256_shani.cpp
@@ -15,9 +15,9 @@
namespace {
-const __m128i MASK = _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL);
-const __m128i INIT0 = _mm_set_epi64x(0x6a09e667bb67ae85ull, 0x510e527f9b05688cull);
-const __m128i INIT1 = _mm_set_epi64x(0x3c6ef372a54ff53aull, 0x1f83d9ab5be0cd19ull);
+#define MASK _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL)
+#define INIT0 _mm_set_epi64x(0x6a09e667bb67ae85ull, 0x510e527f9b05688cull)
+#define INIT1 _mm_set_epi64x(0x3c6ef372a54ff53aull, 0x1f83d9ab5be0cd19ull)
void inline __attribute__((always_inline)) QuadRound(__m128i& state0, __m128i& state1, uint64_t k1, uint64_t k0)
{ I assume the compiler would be smart enough to avoid actually doing the load each time. |
Clang on travis is green now |
Sidenote: we could potentially add a check for illegal instructions in the .text.startup section in one of our python binary checking tools. Though we'd have to create a per-arch blacklist/whitelist to define "illegal". |
My idea was to forbid (in any case, not for this PR) |
@laanwj Aha, even simpler. Edit: And yes, I certainly meant that this could be done as a follow-up, not for this PR. |
Sounds like it solved the problem: #18456 (comment) ACK 8508473 |
8508473 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille) Pull request description: This is a potential solution for bitcoin#18456. It seems that the compiler cannot turn `_mm_set_epi64x(<constant>,<constnant>)` into a constant itself, and thus emits a global initializer for the `MASK`, `INIT0`, and `INIT1` global constants in the sha-ni SHA256 implementation. Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed. Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...). ACKs for top commit: laanwj: Code review ACK 8508473 elichai: ACK 8508473 Tree-SHA512: 07049cf1a33624c22df2be48b814d5636c037b368861eb13ee073bdce2b7c902a56e96518218961f55a2a1631a40825ded6dbbc28d7fe0e7fec267d704e39112
Github-Pull: bitcoin#18553 Rebased-From: 8508473
7f7548d rpc: Do not advertise dumptxoutset as a way to flush the chainstate (MarcoFalke) a9ca65b Fix naming of macOS SDK and clarify version (Andrew Chow) 54d2063 Do not expose and consider -logthreadnames when it does not work (Hennadii Stepanov) 6986b26 build: fix ASLR for bitcoin-cli on Windows (fanquake) 1d1e358 build: Set libevent minimum version to 2.0.21 (Hennadii Stepanov) 842b13a Avoid non-trivial global constants in SHA-NI code (Pieter Wuille) ade4185 gitian: Add missing automake package to gitian-win-signer.yml (Andrew Chow) Pull request description: Currently backports the following to the 0.20 branch: * #18598 - gitian: Add missing automake package to gitian-win-signer.yml * #18702 - build: fix ASLR for bitcoin-cli on Windows * #18676 - build: Check libevent minimum version in configure script * #18665 - Do not expose and consider -logthreadnames when it does not work * #18553 - Avoid non-trivial global constants in SHA-NI code * #18589 - Fix naming of macOS SDK and clarify version ACKs for top commit: laanwj: ACK 7f7548d Tree-SHA512: 2cba748414a440e3fb901940085a7ae059e8b926c9187fbbbdeb7846a32e7374f318cc21e499c911ff803c42aef2c844b04af10b87f9c5a2b3edf6deb03ebb04
Summary: ``` This is a potential solution for #18456. It seems that the compiler cannot turn _mm_set_epi64x(<constant>,<constnant>) into a constant itself, and thus emits a global initializer for the MASK, INIT0, and INIT1 global constants in the sha-ni SHA256 implementation. Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed. Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...). ``` Backport of core [[bitcoin/bitcoin#18553 | PR18553]]. Test Plan: cmake -GNinja .. -DENABLE_SSE41=OFF -DENABLE_AVX2=OFF ninja all check bench-bitcoin Reviewers: #bitcoin_abc, deadalnix Reviewed By: #bitcoin_abc, deadalnix Differential Revision: https://reviews.bitcoinabc.org/D6278
Summary: ``` This is a potential solution for #18456. It seems that the compiler cannot turn _mm_set_epi64x(<constant>,<constnant>) into a constant itself, and thus emits a global initializer for the MASK, INIT0, and INIT1 global constants in the sha-ni SHA256 implementation. Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed. Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...). ``` Backport of core [[bitcoin/bitcoin#18553 | PR18553]]. Test Plan: cmake -GNinja .. -DENABLE_SSE41=OFF -DENABLE_AVX2=OFF ninja all check bench-bitcoin Reviewers: #bitcoin_abc, deadalnix Reviewed By: #bitcoin_abc, deadalnix Differential Revision: https://reviews.bitcoinabc.org/D6278
Github-Pull: #18553 Rebased-From: 8508473
9c98451 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille) Pull request description: This is a cherry-pick of bitcoin/bitcoin#18553 fixing bitcoin/bitcoin#18456 which suddenly cropped up on my linux 18.04 machine. ### Problem Certain linux configurations may encounter a `SIGILL` Illegal Instruction that causes a locally-built veild to crash immediately. Using gdb to catch the stack trace will give a reference to global variable initialization in the sha256_shani.cpp file. Root cause appears to be a global constant is not produced correctly by the compiler as a constant. ### Solution Cherry-pick bitcoin's solution to the same problem. ### Testing Verified that veild (regtest) is now able to run on the machine where this issue occurred. Tree-SHA512: 8e3a44aef53c2545c6eb09bbd8f25f6fe1ff8c46edfbb529bf4a03f4024b46b3891544457d50acc2054a49efd42bbea3768fdeaa08fcfb293b5086b441919ba4
8508473 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille) Pull request description: This is a potential solution for bitcoin#18456. It seems that the compiler cannot turn `_mm_set_epi64x(<constant>,<constnant>)` into a constant itself, and thus emits a global initializer for the `MASK`, `INIT0`, and `INIT1` global constants in the sha-ni SHA256 implementation. Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed. Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...). ACKs for top commit: laanwj: Code review ACK 8508473 elichai: ACK 8508473 Tree-SHA512: 07049cf1a33624c22df2be48b814d5636c037b368861eb13ee073bdce2b7c902a56e96518218961f55a2a1631a40825ded6dbbc28d7fe0e7fec267d704e39112
8508473 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille) Pull request description: This is a potential solution for bitcoin#18456. It seems that the compiler cannot turn `_mm_set_epi64x(<constant>,<constnant>)` into a constant itself, and thus emits a global initializer for the `MASK`, `INIT0`, and `INIT1` global constants in the sha-ni SHA256 implementation. Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed. Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...). ACKs for top commit: laanwj: Code review ACK 8508473 elichai: ACK 8508473 Tree-SHA512: 07049cf1a33624c22df2be48b814d5636c037b368861eb13ee073bdce2b7c902a56e96518218961f55a2a1631a40825ded6dbbc28d7fe0e7fec267d704e39112
8508473 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille) Pull request description: This is a potential solution for bitcoin#18456. It seems that the compiler cannot turn `_mm_set_epi64x(<constant>,<constnant>)` into a constant itself, and thus emits a global initializer for the `MASK`, `INIT0`, and `INIT1` global constants in the sha-ni SHA256 implementation. Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed. Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...). ACKs for top commit: laanwj: Code review ACK 8508473 elichai: ACK 8508473 Tree-SHA512: 07049cf1a33624c22df2be48b814d5636c037b368861eb13ee073bdce2b7c902a56e96518218961f55a2a1631a40825ded6dbbc28d7fe0e7fec267d704e39112
8508473 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille) Pull request description: This is a potential solution for bitcoin#18456. It seems that the compiler cannot turn `_mm_set_epi64x(<constant>,<constnant>)` into a constant itself, and thus emits a global initializer for the `MASK`, `INIT0`, and `INIT1` global constants in the sha-ni SHA256 implementation. Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed. Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...). ACKs for top commit: laanwj: Code review ACK 8508473 elichai: ACK 8508473 Tree-SHA512: 07049cf1a33624c22df2be48b814d5636c037b368861eb13ee073bdce2b7c902a56e96518218961f55a2a1631a40825ded6dbbc28d7fe0e7fec267d704e39112
This is a potential solution for #18456.
It seems that the compiler cannot turn
_mm_set_epi64x(<constant>,<constnant>)
into a constant itself, and thus emits a global initializer for theMASK
,INIT0
, andINIT1
global constants in the sha-ni SHA256 implementation.Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed.
Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...).