Libcperciva import #306

Merged 3 commits on Feb 24, 2021
Changes from 1 commit
crypto_aesctr_aesni: workaround for missing _mm_loadu_si64
As it happens, _mm_loadu_si64 and _mm_load_sd have the same "Operation"
description in the Intel Intrinsics Guide [1] (provided that we interpret
the "MAX" bit index as 127, which I think is fair).

_mm_loadu_si64:
    dst[63:0] := MEM[mem_addr+63:mem_addr]
    dst[MAX:64] := 0

_mm_load_sd:
    dst[63:0] := MEM[mem_addr+63:mem_addr]
    dst[127:64] := 0

[1] https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=SSE,SSE2&expand=3340,3340,3421,3421,3340,3421,3340&cats=Load

The actual assembly instructions vary, as do the "description" fields --
for _mm_loadu_si64, there's no textual definition of the upper 64 bits.

Interestingly, it looks like gcc7 and gcc8 both compile our load_64()
function into movq (which is _mm_loadu_si64), rather than movsd (which
is _mm_load_sd).
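
A minimal sketch, not part of this commit, of how one could check the behaviour that the workaround relies on: the 8 source bytes end up in the low half of the register and the upper half is zero. The names load_si64_via_sd and check_load are hypothetical; the sketch assumes an x86 target with SSE2 and deliberately avoids _mm_loadu_si64 so that it also builds with the older compilers this commit targets.

```c
#include <emmintrin.h>	/* SSE2: __m128i, _mm_load_sd, _mm_castpd_si128 */
#include <stdint.h>
#include <string.h>

/* Same expression as the BROKEN_MM_LOADU_SI64 branch in the diff. */
static inline __m128i
load_si64_via_sd(const void * mem)
{

	return (_mm_castpd_si128(_mm_load_sd(mem)));
}

/*
 * check_load(src):
 * Hypothetical helper.  Return 1 if loading the 8 bytes at ${src} puts them
 * unchanged into the low 64 bits and zeroes the upper 64 bits; 0 otherwise.
 */
static int
check_load(const uint8_t src[8])
{
	static const uint8_t zeros[8] = { 0 };
	uint8_t out[16];

	/* Dump the whole 128-bit register so that both halves can be compared. */
	_mm_storeu_si128((__m128i *)out, load_si64_via_sd(src));

	return (memcmp(out, src, 8) == 0 && memcmp(out + 8, zeros, 8) == 0);
}
```

Calling check_load() on a few byte patterns, including ones that would be NaN when read as a double, is a quick way to confirm that the load is a plain bit copy on a given compiler.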
gperciva committed Feb 23, 2021
commit 9190db8abaf6de4cffe45d4e4f078ffc0177e014
9 changes: 9 additions & 0 deletions libcperciva/crypto/crypto_aesctr_aesni.c
@@ -30,6 +30,11 @@
*/
#include "crypto_aesctr_shared.c"

#ifdef BROKEN_MM_LOADU_SI64
#warning Working around compiler bug: _mm_loadu_si64 is missing
#warning Updating to a newer compiler may improve performance
#endif

/**
* load_si64(mem):
* Load an unaligned 64-bit integer from memory into the lowest 64 bits of the
@@ -39,7 +44,11 @@ static inline __m128i
load_si64(const void * mem)
{

#ifdef BROKEN_MM_LOADU_SI64
	return (_mm_castpd_si128(_mm_load_sd(mem)));
#else
	return (_mm_loadu_si64(mem));
#endif
}

/* Process multiple whole blocks by generating & using a cipherblock. */