摘要
全同态加密算法支持直接对加密数据(密文)执行代数运算,但其密文评估中的数论变换(NTT)涉及大量高维度整系数多项式环运算,限制了其在隐私计算中的应用。针对CPU实现方案对NTT算法计算并行度较低的问题,提出一种CPU+GPU异构的CKKS全同态加密实现方案。首先,根据NTT算法数据内存访问规律,设计一种数据暂存共享内存策略,有效减少频繁的全局内存访问。其次,针对数据规模可变导致内核出现部分空闲线程的问题,设计线程工作负载动态分配机制,并采用不同基数的蝴蝶变换结构,提高数据输入的灵活性并优化并行策略。再次,提出单—多内核混合调用模式,通过NTT算法蝶形变换分组大小动态切换内核调用模式,充分利用GPU多核调用的并行潜力。最后,设计并实现并行程度更高、计算复杂度更低的NTT算法,利用该算法实现并行的同态乘法运算,并基于HElib库实现CPU+GPU异构的CKKS全同态加密算法。实验结果表明,与使用AVX-512加速的HElib库相比,所提的NTT/INTT计算时间减短近65%。
Fully homomorphic encryption supports direct algebraic operations on encrypted data(ciphertext),with the foundation of its ciphertext evaluation phase involving numerous high-dimensional integer coefficient polynomial ring additions and multiplications.This limits its widespread application in the field of privacy computing.The CPU implementation scheme offers low parallelism for the Number Theoretic Transform(NTT)algorithm calculations.A CPU+GPU heterogeneous fully homomorphic encryption im-plementation scheme was proposed.Firstly,a cache strategy of data temporarily stored in shared memory was introduced,which stored repeatedly read and unchanging data,including NTT input data and rotation factors,in shared memory to reduce frequent glob-al memory access.Secondly,to address the issue of partially idle threads caused by variable data sizes,it dynamically allocates thread workloads based on data size and hardware resources,adopting butterfly transformation structures of different radices to achieve opti-mal parallel strategies while enhancing the flexibility of data input.Thirdly,it introduces a single-multi-core mixed invocation mode,dynamically switching the kernel invocation mode based on the group size of butterfly transformations in each NTT iteration,to fully utilize the parallel potential of multi-core invocations on GPU.Finally,it designs and implements a higher parallelism,lower computa-tional complexity NTT algorithm for GPU,uses this algorithm to perform parallel homomorphic multiplication operations,and imple-ments a CPU+GPU heterogeneous CKKS fully homomorphic encryption algorithm based on the HElib library.Experimental results show NTT/INTT computation time is reduced by nearly 65%compared to HElib library using AVX-512 acceleration technology.
作者
谭泽玖
赵鑫
万俊平
刘虎成
蒋琳
徐金明
纪守领
王轩
TAN Zejiu;ZHAO Xin;WAN Junping;LIU Hucheng;JIANG Lin;XU jinming;JI Shouling;WANG Xuan(School of computer science and technology,Harbin Institute of Technology,Shenzhen,Shenzhen,518055,China;College of control science and engineering,Zhejiang University,Hangzhou,310058,China;College of Computer Science and Technology,Zhejiang University,Hangzhou,310058,China;Pengcheng National Laboratory,Shenzhen 518000,China;Guangdong Key Laboratory of New Security and Intelligence Technology,Shenzhen 518005,China)
基金
国家重点研发计划项目(2022YFB3102100)。
关键词
隐私计算
全同态加密
GPU并行加速
数论变换
缓存策略
卷积神经网络
Privacy computing
Fully homomorphic encryption
GPU acceleration
Number theoretic transform
Caching strategy
Convolutional neural networks