Abstract
The rapid development of quantum computing poses new challenges that have made post-quantum cryptography (PQC) an active research topic in the cryptographic community. Lattice-based schemes, being both secure and efficient, have become one of the mainstream approaches to post-quantum public-key cryptography. The Aigis key encapsulation mechanism (Aigis-enc) is a post-quantum cryptographic algorithm designed by Chinese scholars and based on the asymmetric module learning with errors (A-MLWE) problem; it is one of the first-prize winners in the public-key category of the National Cryptographic Algorithm Design Competition held by the Chinese Association for Cryptologic Research. To resist quantum attacks, maintain the long-term security of national cyberspace, and contribute to the formulation and practical deployment of future national PQC standards, it is important to optimize the strong post-quantum algorithms developed domestically. This paper focuses on optimizing implementations of Aigis-enc on different platforms, including a fast parallel implementation for high-performance platforms and a compact implementation for embedded low-power platforms. Specifically, we use single instruction multiple data (SIMD) instructions to optimize the existing AVX2 implementation of Aigis-enc, and we provide its first lightweight compact implementation for the ARM Cortex-M4 platform. The implementation includes four key optimizations: (1) reducing the number of assembly instructions in Montgomery and Barrett reduction to improve reduction efficiency; (2) using a number theoretic transform (NTT) with a reduced number of layers, together with optimized instruction pipeline scheduling, to speed up polynomial multiplication and reduce precomputed table storage; (3) providing parallel assembly implementations of polynomial serialization and deserialization to speed up encoding, decoding, encryption, and decryption; (4) combining on-the-fly computation with memory reuse to reduce the algorithm's storage footprint. Experimental results show that, on an 8-core Intel Core i7 processor, the proposed techniques improve the performance of the original AVX2 implementation of Aigis-enc-768 by 25%, and that they significantly reduce precomputed table storage, code size, and stack usage on the ARM Cortex-M4 platform, which is of practical importance for deploying the algorithm.
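As an illustration of the two modular reductions named in the abstract, the following is a minimal Python sketch and not the authors' AVX2 or Cortex-M4 assembly. The modulus q = 7681, the Montgomery radix 2^16, and the Barrett shift of 26 bits are assumptions chosen for the example; the abstract does not state them, and all function names are hypothetical.

```python
# Illustrative sketch only. Q, R_BITS and BARRETT_SHIFT are assumed values,
# not parameters taken from the abstract. Requires Python 3.8+ for pow(x, -1, m).

Q = 7681                           # assumed odd prime modulus
R_BITS = 16
R = 1 << R_BITS                    # Montgomery radix
NEG_QINV = (-pow(Q, -1, R)) % R    # -Q^{-1} mod R
R2 = (R * R) % Q                   # R^2 mod Q, used to enter Montgomery form

BARRETT_SHIFT = 26
BARRETT_MUL = (1 << BARRETT_SHIFT) // Q   # floor(2^26 / Q)


def montgomery_reduce(a: int) -> int:
    """Return a * R^{-1} mod Q for 0 <= a < Q * R."""
    t = (a * NEG_QINV) & (R - 1)   # t chosen so that a + t*Q is divisible by R
    u = (a + t * Q) >> R_BITS      # exact division, result lies in [0, 2Q)
    return u - Q if u >= Q else u


def barrett_reduce(a: int) -> int:
    """Return a mod Q for 0 <= a < 2^26."""
    quot = (a * BARRETT_MUL) >> BARRETT_SHIFT   # floor(a / Q) or one less
    r = a - quot * Q                            # lies in [0, 2Q)
    return r - Q if r >= Q else r


def to_mont(x: int) -> int:
    """Map x in [0, Q) to its Montgomery form x * R mod Q."""
    return montgomery_reduce(x * R2)


def mont_mul(a: int, b: int) -> int:
    """Multiply two Montgomery-form values; the result stays in Montgomery form."""
    return montgomery_reduce(a * b)


def from_mont(x: int) -> int:
    """Map a Montgomery-form value back to the ordinary representative."""
    return montgomery_reduce(x)
```

A product modulo q can then be computed as `from_mont(mont_mul(to_mont(a), to_mont(b)))`. In a vectorized or assembly implementation the same arithmetic is typically applied to many 16-bit coefficients at once, which is where reducing the instruction count per reduction pays off.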
Authors
沈诗羽
何峰
赵运磊
Shen Shiyu; He Feng; Zhao Yunlei (School of Computer Science, Fudan University, Shanghai 200433)
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journal
2021, No. 10, pp. 2238-2252 (15 pages)
Journal of Computer Research and Development
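Funding
National Natural Science Foundation of China (U1536205, 61472084)
National Key Research and Development Program of China (2017YFB0802000)
Shanghai Science and Technology Innovation Action Plan (16DZ1100200)
Shanghai Science and Technology Development Funds (16JC1400801)
Key Research and Development Program of Shandong Province (2017CXG0701, 2018CXGC0701)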
Keywords
post-quantum cryptography
lattice cryptography
key encapsulation mechanism
AVX2 parallel optimization
embedded lightweight implementation