Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid non-trivial global constants in SHA-NI code #18553

Merged
merged 1 commit into from
Apr 22, 2020

Conversation

sipa
Copy link
Member

@sipa sipa commented Apr 7, 2020

This is a potential solution for #18456.

It seems that the compiler cannot turn _mm_set_epi64x(<constant>,<constnant>) into a constant itself, and thus emits a global initializer for the MASK, INIT0, and INIT1 global constants in the sha-ni SHA256 implementation.

Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed.

Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...).

src/crypto/sha256_shani.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@elichai elichai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the source of this function: https://github.com/gcc-mirror/gcc/blob/4a3895903c29ed85da6fcb886f31ff23d4c6e935/gcc/config/i386/emmintrin.h#L694
It seems to be doing just a read operation from that location, so I'm not too worried about the performance of this

And looking at godbolt you seem to be right, it initializes it in the global initializers https://godbolt.org/z/M2b6cg

@maflcko
Copy link
Member

maflcko commented Apr 7, 2020

Alternative suggested on IRC, which might compile on clang:

[13:29] <aj> +inline __m128i MASK() { return _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL); }

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 7, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@sipa
Copy link
Member Author

sipa commented Apr 7, 2020

@MarcoFalke Going to try #18553 (comment) first.

@sipa sipa force-pushed the 202004_no_global_init_shani branch from d4d9a69 to 8508473 Compare April 7, 2020 20:57
@elichai
Copy link
Contributor

elichai commented Apr 8, 2020

Looks Good.
Can someone with SHA-NI run this under valgrind and sanitizers? I'm a bit afraid this violates some weird C++ rule (I find manually modifying alignments quite scary)

@maflcko maflcko added this to the 0.20.0 milestone Apr 8, 2020
@laanwj
Copy link
Member

laanwj commented Apr 8, 2020

Code review ACK 8508473

@laanwj
Copy link
Member

laanwj commented Apr 9, 2020

I've diffed the assembly code generated by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0. The only difference with this patch seems to be that the following initialization code is removed:

--- ./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.old_d  2020-04-09 14:01:37.073839875 +0200
+++ ./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.new_d  2020-04-09 14:01:43.009818616 +0200
@@ -1,5 +1,5 @@
 
-./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.old:     file format elf64-x86-64
+./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.new:     file format elf64-x86-64
 
 
 Disassembly of section .text:
@@ -1163,28 +1163,3 @@
     1502:      48 83 c4 68             add    $0x68,%rsp
     1506:      c3                      retq   
     1507:      e8 00 00 00 00          callq  150c <_ZN15sha256d64_shani14Transform_2wayEPhPKh+0x115c>
-
-Disassembly of section .text.startup:
-
-0000000000000000 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm>:
-   0:  48 83 ec 18             sub    $0x18,%rsp
-   4:  66 0f 6f 05 00 00 00    movdqa 0x0(%rip),%xmm0        # c <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0xc>
-   b:  00 
-   c:  64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
-  13:  00 00 
-  15:  48 89 44 24 08          mov    %rax,0x8(%rsp)
-  1a:  31 c0                   xor    %eax,%eax
-  1c:  0f 29 05 00 00 00 00    movaps %xmm0,0x0(%rip)        # 23 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x23>
-  23:  48 8b 44 24 08          mov    0x8(%rsp),%rax
-  28:  64 48 33 04 25 28 00    xor    %fs:0x28,%rax
-  2f:  00 00 
-  31:  66 0f 6f 05 00 00 00    movdqa 0x0(%rip),%xmm0        # 39 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x39>
-  38:  00 
-  39:  0f 29 05 00 00 00 00    movaps %xmm0,0x0(%rip)        # 40 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x40>
-  40:  66 0f 6f 05 00 00 00    movdqa 0x0(%rip),%xmm0        # 48 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x48>
-  47:  00 
-  48:  0f 29 05 00 00 00 00    movaps %xmm0,0x0(%rip)        # 4f <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x4f>
-  4f:  75 05                   jne    56 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x56>
-  51:  48 83 c4 18             add    $0x18,%rsp
-  55:  c3                      retq   
-  56:  e8 00 00 00 00          callq  5b <.LC5+0xb>

This means that there is no extra overhead, at least with this compiler.

@maflcko
Copy link
Member

maflcko commented Apr 9, 2020

@rebroad This should fix your issue #18456

@theuni
Copy link
Member

theuni commented Apr 9, 2020

LGTM.

@MarcoFalke does clang have problems with the fix here? If so, @ajtowns's solution could be tweaked to be even simpler:

diff --git a/src/crypto/sha256_shani.cpp b/src/crypto/sha256_shani.cpp
index 92f67710fb..b139bceb37 100644
--- a/src/crypto/sha256_shani.cpp
+++ b/src/crypto/sha256_shani.cpp
@@ -15,9 +15,9 @@

 namespace {

-const __m128i MASK = _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL);
-const __m128i INIT0 = _mm_set_epi64x(0x6a09e667bb67ae85ull, 0x510e527f9b05688cull);
-const __m128i INIT1 = _mm_set_epi64x(0x3c6ef372a54ff53aull, 0x1f83d9ab5be0cd19ull);
+#define MASK _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL)
+#define INIT0 _mm_set_epi64x(0x6a09e667bb67ae85ull, 0x510e527f9b05688cull)
+#define INIT1  _mm_set_epi64x(0x3c6ef372a54ff53aull, 0x1f83d9ab5be0cd19ull)

 void inline  __attribute__((always_inline)) QuadRound(__m128i& state0, __m128i& state1, uint64_t k1, uint64_t k0)
 {

I assume the compiler would be smart enough to avoid actually doing the load each time.

@maflcko
Copy link
Member

maflcko commented Apr 9, 2020

Clang on travis is green now

@theuni
Copy link
Member

theuni commented Apr 9, 2020

Sidenote: we could potentially add a check for illegal instructions in the .text.startup section in one of our python binary checking tools. Though we'd have to create a per-arch blacklist/whitelist to define "illegal".

@laanwj
Copy link
Member

laanwj commented Apr 9, 2020

Sidenote: we could potentially add a check for illegal instructions in the .text.startup section in one of our python binary checking tools. Though we'd have to create a per-arch blacklist/whitelist to define "illegal".

My idea was to forbid .text.startup sections completely in all 'special' compilation units, e.g. those compiled with non-default instruction sets. I think that's easier to implement a check for than instruction white/blacklists.

(in any case, not for this PR)

@theuni
Copy link
Member

theuni commented Apr 9, 2020

@laanwj Aha, even simpler.

Edit: And yes, I certainly meant that this could be done as a follow-up, not for this PR.

@elichai
Copy link
Contributor

elichai commented Apr 12, 2020

Sounds like it solved the problem: #18456 (comment)

ACK 8508473

@laanwj laanwj merged commit ce4e1f0 into bitcoin:master Apr 22, 2020
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Apr 23, 2020
8508473 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille)

Pull request description:

  This is a potential solution for bitcoin#18456.

  It seems that the compiler cannot turn `_mm_set_epi64x(<constant>,<constnant>)` into a constant itself, and thus emits a global initializer for the `MASK`, `INIT0`, and `INIT1` global constants in the sha-ni SHA256 implementation.

  Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed.

  Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...).

ACKs for top commit:
  laanwj:
    Code review ACK 8508473
  elichai:
    ACK 8508473

Tree-SHA512: 07049cf1a33624c22df2be48b814d5636c037b368861eb13ee073bdce2b7c902a56e96518218961f55a2a1631a40825ded6dbbc28d7fe0e7fec267d704e39112
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Apr 23, 2020
@fanquake fanquake mentioned this pull request Apr 23, 2020
laanwj added a commit that referenced this pull request May 11, 2020
7f7548d rpc: Do not advertise dumptxoutset as a way to flush the chainstate (MarcoFalke)
a9ca65b Fix naming of macOS SDK and clarify version (Andrew Chow)
54d2063 Do not expose and consider -logthreadnames when it does not work (Hennadii Stepanov)
6986b26 build: fix ASLR for bitcoin-cli on Windows (fanquake)
1d1e358 build: Set libevent minimum version to 2.0.21 (Hennadii Stepanov)
842b13a Avoid non-trivial global constants in SHA-NI code (Pieter Wuille)
ade4185 gitian: Add missing automake package to gitian-win-signer.yml (Andrew Chow)

Pull request description:

  Currently backports the following to the 0.20 branch:

  * #18598 - gitian: Add missing automake package to gitian-win-signer.yml
  * #18702 - build: fix ASLR for bitcoin-cli on Windows
  * #18676 - build: Check libevent minimum version in configure script
  * #18665 - Do not expose and consider -logthreadnames when it does not work
  * #18553 - Avoid non-trivial global constants in SHA-NI code
  * #18589 - Fix naming of macOS SDK and clarify version

ACKs for top commit:
  laanwj:
    ACK 7f7548d

Tree-SHA512: 2cba748414a440e3fb901940085a7ae059e8b926c9187fbbbdeb7846a32e7374f318cc21e499c911ff803c42aef2c844b04af10b87f9c5a2b3edf6deb03ebb04
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request May 28, 2020
Summary:
```
This is a potential solution for #18456.

It seems that the compiler cannot turn
_mm_set_epi64x(<constant>,<constnant>) into a constant itself, and thus
emits a global initializer for the MASK, INIT0, and INIT1 global
constants in the sha-ni SHA256 implementation.

Change this by turning them into dumb byte arrays, loading them into an
SSE variable whenever needed.

Tested on a SHA-NI capable machine. I do not observe any obvious
performance impact (but this is hard to measure, it's already very
fast...).
```

Backport of core [[bitcoin/bitcoin#18553 | PR18553]].

Test Plan:
  cmake -GNinja .. -DENABLE_SSE41=OFF -DENABLE_AVX2=OFF
  ninja all check bench-bitcoin

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D6278
ftrader pushed a commit to bitcoin-cash-node/bitcoin-cash-node that referenced this pull request Aug 17, 2020
Summary:
```
This is a potential solution for #18456.

It seems that the compiler cannot turn
_mm_set_epi64x(<constant>,<constnant>) into a constant itself, and thus
emits a global initializer for the MASK, INIT0, and INIT1 global
constants in the sha-ni SHA256 implementation.

Change this by turning them into dumb byte arrays, loading them into an
SSE variable whenever needed.

Tested on a SHA-NI capable machine. I do not observe any obvious
performance impact (but this is hard to measure, it's already very
fast...).
```

Backport of core [[bitcoin/bitcoin#18553 | PR18553]].

Test Plan:
  cmake -GNinja .. -DENABLE_SSE41=OFF -DENABLE_AVX2=OFF
  ninja all check bench-bitcoin

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D6278
backpacker69 referenced this pull request in peercoin/peercoin Mar 28, 2021
codeofalltrades added a commit to Veil-Project/veil that referenced this pull request May 27, 2021
9c98451 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille)

Pull request description:

  This is a cherry-pick of bitcoin/bitcoin#18553 fixing bitcoin/bitcoin#18456 which suddenly cropped up on my linux 18.04 machine.

  ### Problem
  Certain linux configurations may encounter a `SIGILL` Illegal Instruction that causes a locally-built veild to crash immediately. Using gdb to catch the stack trace will give a reference to global variable initialization in the sha256_shani.cpp file. Root cause appears to be a global constant is not produced correctly by the compiler as a constant.

  ### Solution
  Cherry-pick bitcoin's solution to the same problem.

  ### Testing
  Verified that veild (regtest) is now able to run on the machine where this issue occurred.

Tree-SHA512: 8e3a44aef53c2545c6eb09bbd8f25f6fe1ff8c46edfbb529bf4a03f4024b46b3891544457d50acc2054a49efd42bbea3768fdeaa08fcfb293b5086b441919ba4
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Dec 22, 2021
8508473 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille)

Pull request description:

  This is a potential solution for bitcoin#18456.

  It seems that the compiler cannot turn `_mm_set_epi64x(<constant>,<constnant>)` into a constant itself, and thus emits a global initializer for the `MASK`, `INIT0`, and `INIT1` global constants in the sha-ni SHA256 implementation.

  Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed.

  Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...).

ACKs for top commit:
  laanwj:
    Code review ACK 8508473
  elichai:
    ACK 8508473

Tree-SHA512: 07049cf1a33624c22df2be48b814d5636c037b368861eb13ee073bdce2b7c902a56e96518218961f55a2a1631a40825ded6dbbc28d7fe0e7fec267d704e39112
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Dec 22, 2021
8508473 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille)

Pull request description:

  This is a potential solution for bitcoin#18456.

  It seems that the compiler cannot turn `_mm_set_epi64x(<constant>,<constnant>)` into a constant itself, and thus emits a global initializer for the `MASK`, `INIT0`, and `INIT1` global constants in the sha-ni SHA256 implementation.

  Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed.

  Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...).

ACKs for top commit:
  laanwj:
    Code review ACK 8508473
  elichai:
    ACK 8508473

Tree-SHA512: 07049cf1a33624c22df2be48b814d5636c037b368861eb13ee073bdce2b7c902a56e96518218961f55a2a1631a40825ded6dbbc28d7fe0e7fec267d704e39112
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Dec 22, 2021
8508473 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille)

Pull request description:

  This is a potential solution for bitcoin#18456.

  It seems that the compiler cannot turn `_mm_set_epi64x(<constant>,<constnant>)` into a constant itself, and thus emits a global initializer for the `MASK`, `INIT0`, and `INIT1` global constants in the sha-ni SHA256 implementation.

  Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed.

  Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...).

ACKs for top commit:
  laanwj:
    Code review ACK 8508473
  elichai:
    ACK 8508473

Tree-SHA512: 07049cf1a33624c22df2be48b814d5636c037b368861eb13ee073bdce2b7c902a56e96518218961f55a2a1631a40825ded6dbbc28d7fe0e7fec267d704e39112
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Dec 28, 2021
8508473 Avoid non-trivial global constants in SHA-NI code (Pieter Wuille)

Pull request description:

  This is a potential solution for bitcoin#18456.

  It seems that the compiler cannot turn `_mm_set_epi64x(<constant>,<constnant>)` into a constant itself, and thus emits a global initializer for the `MASK`, `INIT0`, and `INIT1` global constants in the sha-ni SHA256 implementation.

  Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed.

  Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...).

ACKs for top commit:
  laanwj:
    Code review ACK 8508473
  elichai:
    ACK 8508473

Tree-SHA512: 07049cf1a33624c22df2be48b814d5636c037b368861eb13ee073bdce2b7c902a56e96518218961f55a2a1631a40825ded6dbbc28d7fe0e7fec267d704e39112
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bitcoin-qt/bitcoind crashes upon startup of crypto/sha256_shani. Illegal instruction
7 participants