This repository provides code for our faster Kyber implementation on three low-end 32-bit IoT devices: the ARM Cortex-M3, SiFive E310 board, and PQRISCV.
Authors:
- Junhao Huang <huangjunhao@uic.edu.cn>
- Haosong Zhao <zhaohaosonguic@gmail.com>
- Jipeng Zhang <jp-zhang@outlook.com>
- Wangchen Dai <w.dai@my.cityu.edu.hk>
- Lu Zhou <lu.zhou@nuaa.edu.cn>
- Ray C. C. Cheung <r.cheung@cityu.edu.hk>
- Çetin Kaya Koç <cetinkoc@ucsb.edu>
- Donglong Chen <donglongchen@uic.edu.cn> (Corresponding Author)
git clone --recursive https://github.com/UIC-ESLAS/Kyber_RV_M3.git
The setup for testing and evaluating our code on the ARM Cortex-M3 is based on the framework provided in the pqm3 project, and most of our implementation builds on it. However, we could not use the benchmarks.py, test.py, and testvectors.py scripts from pqm3 in our environment. Therefore, we wrote three new scripts, new_benchmarks.py, new_poly_benchmarks.py, and new_stack_benchmarks.py, for benchmarking our implementations.
- arm-none-eabi-gcc: version 10.2.1
- Bossa: version 1.9.1
- libopencm3: commit 5617ed466444790b787b6df8d7f21d1611905fd1 from libopencm3
- python3 with the packages pyserial and numpy (only required for the evaluation scripts); {pyserial-}miniterm is used to read the output from the Arduino
- Hardware: Arduino Due development board with sam3x8e
Detailed instructions on interacting with the hardware and on installing required software can be found in pqm3's readme.
The scripts new_benchmarks.py, new_poly_benchmarks.py, and new_stack_benchmarks.py cover the benchmarks in our paper.
If separate, manual testing is required, the binaries for a scheme can be built using
make PLATFORM=sam3x8e bin/crypto_kem_{scheme}_{variant}_{firmware}.bin
where scheme is one of {kyber512, kyber768, kyber1024}, variant is one of {m3, m3fspeed, m3fstack}, and firmware is one of {test, testvectors, speed, stack}.
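The naming scheme above can also be enumerated programmatically, for example to drive a custom build loop. The following is a minimal sketch: the helper name `m3_targets` is hypothetical and not part of this repository, and it only formats target names without invoking make.

```python
from itertools import product

# Values copied from the naming scheme described above.
SCHEMES = ("kyber512", "kyber768", "kyber1024")
VARIANTS = ("m3", "m3fspeed", "m3fstack")
FIRMWARES = ("test", "testvectors", "speed", "stack")


def m3_targets():
    """Return every bin/crypto_kem_{scheme}_{variant}_{firmware}.bin name.

    Hypothetical helper for illustration only; it does not run make.
    """
    return [
        f"bin/crypto_kem_{scheme}_{variant}_{firmware}.bin"
        for scheme, variant, firmware in product(SCHEMES, VARIANTS, FIRMWARES)
    ]


if __name__ == "__main__":
    # Print the make invocation for each combination.
    for target in m3_targets():
        print(f"make PLATFORM=sam3x8e {target}")
```

Each printed line has the same shape as the make command shown above, so the output can be reviewed before anything is built.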
For building, flashing, and evaluating the testvectors firmware for our stack-version of kyber768, the following commands can be used.
make PLATFORM=sam3x8e ./bin/crypto_kem_kyber768_m3fstack_testvectors.bin
# (You might need to run `make clean` first, if you previously built for a different platform.)
# Flash the binary using bossac.
bossac -a --erase --write --verify --boot=1 --port=/dev/ttyACM0 ./bin/crypto_kem_kyber768_m3fstack_testvectors.bin
# Open the serial monitor.
{pyserial-}miniterm /dev/ttyACM0
We followed the experimental setup in SiFive Freedom-E-SDK and Saber_RV32.
- RISC-V GNU toolchain: version 10.2.0
- Segger J-LINK: flashes the binary to the board
- python3 with the packages pyserial and numpy (only required for the evaluation scripts)
- Hardware: SiFive Freedom E310 development board with a 32-bit RISC-V CPU
The scripts benchmarks.py and stack_benchmarks.py cover the benchmarks in our paper.
If separate, manual testing is required, the binaries for a scheme can be built using
make clean
# compile code in CRYPTO_PATH and firmware for CRYPTO_ITERATIONS times.
make CRYPTO_PATH=crypto_kem/{scheme}/{variant} bin/crypto_kem_{scheme}_{variant}_{firmware}.hex {CRYPTO_ITERATIONS=100}
# You can flash the binary to the board in either of the following two ways.
# Option 1: via make
make run bin/crypto_kem_{scheme}_{variant}_{firmware}.hex
# Option 2: via the J-Link script
./jlink.sh --hex bin/crypto_kem_{scheme}_{variant}_{firmware}.hex --jlink JLinkExe
where scheme is one of {kyber512, kyber768, kyber1024}, variant is one of {fspeed, fstack}, and firmware is one of {test, testvectors, speed, stack}.
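When scripting these builds, it can help to validate the placeholders before calling make. The sketch below assembles the argument list for the make invocation shown above. The function name `sifive_make_command` is hypothetical, and treating `CRYPTO_ITERATIONS` as optional is an assumption based on the braces around it in the command template.

```python
import shlex

# Allowed values, copied from the description above.
SCHEMES = ("kyber512", "kyber768", "kyber1024")
VARIANTS = ("fspeed", "fstack")
FIRMWARES = ("test", "testvectors", "speed", "stack")


def sifive_make_command(scheme, variant, firmware, iterations=None):
    """Build the argv list for the SiFive make invocation.

    Hypothetical helper: it only validates names and formats arguments.
    CRYPTO_ITERATIONS is appended only when `iterations` is given
    (assumed to be optional).
    """
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    if firmware not in FIRMWARES:
        raise ValueError(f"unknown firmware: {firmware}")
    cmd = [
        "make",
        f"CRYPTO_PATH=crypto_kem/{scheme}/{variant}",
        f"bin/crypto_kem_{scheme}_{variant}_{firmware}.hex",
    ]
    if iterations is not None:
        cmd.append(f"CRYPTO_ITERATIONS={iterations}")
    return cmd


if __name__ == "__main__":
    # Show the command line instead of running it.
    print(shlex.join(sifive_make_command("kyber768", "fstack", "speed", 100)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the RISC-V directory once the printed command looks right.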
For building, flashing, and evaluating the testvectors firmware for our stack-version of kyber768, the following commands can be used:
make clean && make CRYPTO_PATH=crypto_kem/kyber768/fstack bin/crypto_kem_kyber768_fstack_testvectors.hex CRYPTO_ITERATIONS=2
# Flash the binary using J-Link, in either of the following two ways.
# Option 1: via make
make run bin/crypto_kem_kyber768_fstack_testvectors.hex
# Option 2: via the J-Link script
./jlink.sh --hex bin/crypto_kem_kyber768_fstack_testvectors.hex --jlink JLinkExe
# Open the serial monitor.
python3 listen.py
# List symbol sizes in decimal, drop runtime/SDK symbols, and sum the size column.
riscv64-unknown-elf-nm bin/crypto_kem_kyber768_fstack_speed.elf --print-size --size-sort --radix=d | \
grep -v '\<_\|\<metal\|\<pll_configs' | \
awk '{sum+=$2 ; print $0} END{print "Total size =", sum, "bytes =", sum/1024, "kB"}'
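The same summation can be sketched in Python for use inside an evaluation script. This is an approximation of the pipeline above, not a drop-in replacement: the function name `total_symbol_size` is hypothetical, and the filter checks only whether the symbol name starts with an excluded prefix, whereas the grep pattern matches word starts anywhere in the line.

```python
def total_symbol_size(nm_output, excluded_prefixes=("_", "metal", "pll_configs")):
    """Sum the size column of `nm --print-size --radix=d` output.

    Each relevant line has four fields: address, size, type, name.
    Symbols whose name starts with an excluded prefix are skipped.
    Hypothetical helper for illustration.
    """
    total = 0
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # lines without a size column carry no size to add
        name = fields[3]
        if name.startswith(excluded_prefixes):
            continue
        total += int(fields[1])  # sizes are decimal because of --radix=d
    return total


if __name__ == "__main__":
    # Made-up sample in the nm output format; not real measurements.
    sample = (
        "00000100 00000064 T example_keypair\n"
        "00000200 00000032 T metal_init\n"
        "00000300 00000016 t _start\n"
        "00000400 00000128 T example_ntt\n"
    )
    size = total_symbol_size(sample)
    print("Total size =", size, "bytes =", size / 1024, "kB")
```

In practice the input string would come from capturing the output of the nm command, for example with `subprocess.run(..., capture_output=True, text=True)`.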
We followed the experimental setup in Kyber_RISC_V_Thesis, PQRISCV, and PQRISCV-VEXRISCV.
- RISC-V GNU toolchain: version 10.2.0
- jdk>1.8.0
- sbt
- verilator
- python3 with the packages pyserial and numpy (only required for the evaluation scripts)
- Hardware: PQRISCV simulator (PQRISCV-VEXRISCV)
Manual testing is required to obtain the PQRISCV benchmarks in our paper. The binaries for a scheme can be built using
make -f makefile_vexrv.mk clean
# compile code in CRYPTO_PATH and firmware for CRYPTO_ITERATIONS times.
make -f makefile_vexrv.mk CRYPTO_PATH=crypto_kem/{scheme}/{variant} bin/crypto_kem_{scheme}_{variant}_{firmware}.bin {CRYPTO_ITERATIONS=100}
# Go to the pqriscv-vexriscv directory and run the following:
# Flash the binary using sbt.
sbt "runMain mupq.PQVexRiscvSim --init ../Kyber_RV_M3/RISC-V/bin/crypto_kem_{scheme}_{variant}_{firmware}.bin"
where scheme is one of {kyber512, kyber768, kyber1024}, variant is one of {fspeed, fstack}, and firmware is one of {test, stack, speed_vexrv, testvectors}.
For building, flashing, and evaluating the testvectors firmware for our stack-version of kyber768, the following commands can be used:
make -f makefile_vexrv.mk CRYPTO_PATH=crypto_kem/kyber768/fstack CRYPTO_ITERATIONS=2 testvectors
# Flash the binary using sbt.
sbt "runMain mupq.PQVexRiscvSim --init ../Kyber_RV_M3/RISC-V/bin/testvectors.bin"
# Open the serial monitor.
python3 listen.py
The following files are the main files we used in this paper:
- pqm3: implementation on Cortex-M3
  - common: contains code that is shared between different schemes
  - config.py: saves platform configuration
  - crypto_kem: contains the implementations for kyber512, kyber768, kyber1024
    - kyber512:
      - m3: the original implementation with the Montgomery arithmetic presented in pqm3
      - m3fspeed: the high-speed version (speed-version) implementation with the Plantard arithmetic.
      - m3fstack: the stack-friendly version (stack-version) implementation with the Plantard arithmetic.
    - kyber768:
      - m3: the original implementation with the Montgomery arithmetic presented in pqm3
      - m3fspeed: the high-speed version (speed-version) implementation with the Plantard arithmetic.
      - m3fstack: the stack-friendly version (stack-version) implementation with the Plantard arithmetic.
    - kyber1024:
      - m3: the original implementation with the Montgomery arithmetic presented in pqm3
      - m3fspeed: the high-speed version (speed-version) implementation with the Plantard arithmetic.
      - m3fstack: the stack-friendly version (stack-version) implementation with the Plantard arithmetic.
  - Makefile: Makefile to build the code in pqm3
  - new_benchmarks.py: This script is used for building, flashing, and evaluating the outputs produced by mupq/crypto_kem/speed.c. The desired algorithms as well as the number of iterations can be set in the code. The output is stored in new_benchmarks.txt
  - new_poly_benchmarks.py: The original pqm3 does not provide code for benchmarking polynomial arithmetic such as NTT, INTT, and base multiplication. We need to modify mupq/crypto_kem/speed.c so that it provides results for these operations. This script is used for building, flashing, and evaluating the outputs produced by mupq/crypto_kem/speed.c. The desired algorithms as well as the number of iterations can be set in the code. The output is stored in new_poly_benchmarks.txt
  - new_stack_benchmarks.py: This script is used for building, flashing, and evaluating the outputs produced by mupq/crypto_kem/stack.c. The desired algorithms as well as the number of iterations can be set in the code. The output is stored in new_stack_benchmarks.txt
- RISC-V: implementation on RISC-V
  - benchmark: contains benchmark files
  - bsp: contains board support package
  - config.py: saves platform configuration
  - common: contains code that is shared between different schemes
  - crypto_kem: contains the implementations for kyber512, kyber768, kyber1024
    - kyber512:
      - fspeed: the high-speed version (speed-version) implementation with the Plantard arithmetic.
      - fstack: the stack-friendly version (stack-version) implementation with the Plantard arithmetic.
    - kyber768:
      - fspeed: the high-speed version (speed-version) implementation with the Plantard arithmetic.
      - fstack: the stack-friendly version (stack-version) implementation with the Plantard arithmetic.
    - kyber1024:
      - fspeed: the high-speed version (speed-version) implementation with the Plantard arithmetic.
      - fstack: the stack-friendly version (stack-version) implementation with the Plantard arithmetic.
  - Makefile: Makefile to build the code for the SiFive board
  - makefile_vexrv.mk: Makefile to build the code for PQRISCV
  - benchmarks.py: This script is used for building, flashing, and evaluating the outputs produced by benchmark/speed.c. The desired algorithms as well as the number of iterations can be set in the code. The output is stored in benchmarks.txt
  - stack_benchmarks.py: This script is used for building, flashing, and evaluating the outputs produced by benchmark/stack.c. The desired algorithms as well as the number of iterations can be set in the code. The output is stored in stack_benchmarks.txt
  - results_vexriscv.txt: results for PQRISCV
  - listen.py: receives output from the SiFive board
  - jlink.sh: flashes the binary to the SiFive board
Each subdirectory containing implementations contains a LICENSE or COPYING file stating under what license that specific implementation is released. The files in common contain licensing information at the top of the file (and are currently either public domain or MIT). All other code in this repository is licensed under the conditions of Apache-2.0.