crypto_aesctr_aesni: workaround for missing _mm_loadu_si64
As it happens, both intrinsics have the same "Operation" description in
the Intel Intrinsics Guide [1] (provided that we interpret the "MAX" bit
index as 127, which I think is fair).

_mm_loadu_si64:
    dst[63:0] := MEM[mem_addr+63:mem_addr]
    dst[MAX:64] := 0

_mm_load_sd:
    dst[63:0] := MEM[mem_addr+63:mem_addr]
    dst[127:64] := 0

[1] https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=SSE,SSE2&expand=3340,3340,3421,3421,3340,3421,3340&cats=Load

The actual assembly instructions vary, as do the "Description" fields --
for _mm_loadu_si64, there's no textual statement that the upper 64 bits
are zeroed.

Interestingly, it looks like gcc7 and gcc8 both compile our load_si64()
function into movq (which is _mm_loadu_si64), rather than movsd (which
is _mm_load_sd).
gperciva committed Feb 23, 2021
1 parent fbbdf3c commit 9190db8
Showing 1 changed file with 9 additions and 0 deletions.
libcperciva/crypto/crypto_aesctr_aesni.c:

@@ -30,6 +30,11 @@
  */
 #include "crypto_aesctr_shared.c"
 
+#ifdef BROKEN_MM_LOADU_SI64
+#warning Working around compiler bug: _mm_loadu_si64 is missing
+#warning Updating to a newer compiler may improve performance
+#endif
+
 /**
  * load_si64(mem):
  * Load an unaligned 64-bit integer from memory into the lowest 64 bits of the
@@ -39,7 +44,11 @@ static inline __m128i
 load_si64(const void * mem)
 {
 
+#ifdef BROKEN_MM_LOADU_SI64
+	return (_mm_castpd_si128(_mm_load_sd(mem)));
+#else
 	return (_mm_loadu_si64(mem));
+#endif
 }
 
 /* Process multiple whole blocks by generating & using a cipherblock. */
