摘要
如今,大数据与人工智能技术发展迅猛,基于海量数据进行精确训练的机器学习模型及其应用推动了生产力的提升,但同时也带来了严重的数据安全与隐私泄露问题,这一问题促进了隐私保护机器学习的研究.在实际应用中,机器学习算法常常在稀疏数据集上进行运算,明文下的模型训练存在高效计算方法,可以充分利用数据稀疏性,提高计算效率.为了保护数据隐私而引入的密码技术,将稀疏数据转化为稠密数据,从而使高效的稀疏数据运算变得复杂.现有的对于安全稀疏数据计算的相关研究都涉及大量公钥密码操作,计算效率不高,并且主要考虑两方的场景.实际上,稀疏数据的计算可简化为非零位置上相应元素的计算.为了充分利用这一特性以提高效率,本文将稀疏向量乘法问题分为了过滤和乘法计算两个模块来处理,并在三方联合计算的场景下进行协议设计.首先,基于三方加法复制秘密分享以及伪随机置换技术构建过滤协议,该协议能够实现对向量元素的过滤,筛选出向量中非零位置对应的元素.随后,在过滤协议的基础上引入加法同态加密技术,对非零元素进行安全乘法计算,实现一个隐私保护的安全三方稀疏向量乘法协议,并在半诚实敌手模型下,使用理想现实模拟范式证明了协议的安全性.最后,将隐私保护稀疏向量乘法协议应用到逻辑回归模型中,验证了其可用性.通过实验以及效率分析表明,相对于隐私保护稀疏矩阵乘法协议CAESAR,本文所提出的协议将主要计算开销由O(n)的密文运算次数,降低为O(m)次,其中n是向量的维数,m是向量中非零元素数量;在小批量的逻辑回归模型训练中,文本协议与通用安全多方计算框架ABY3相比有10%~30%的效率提升.
Nowadays,with the rapid development of big data and artificial intelligence(AI)tech-nology,machine learning models trained precisely on vast amounts of data and their applications have propelled an increase in productivity.However,such model training often necessitates collaboration among multiple data holders,which poses serious risks to data security and privacy,including the potential for data breaches.With the increasing awareness of private data privacy protection among the public,such collaboration encounters significant obstacles.Addressing the imperative need to establish connections among data islands while prioritizing privacy becomes a pressing concern.This challenge has spurred extensive research and development in the domain of privacy-preserving machine learning.In practical application scenarios,machine learning algorithms typically perform operations on sparse datasets.This implies that the majority of elements in the data matrix are zeros,with a relatively small number of non-zero elements actively contributing to the calculation process.Taking advantage of this data sparsity during model training reduces the data processing load,thereby enhancing computational efficiency.Cryptographic techniques have been introduced to protect data privacy.While generic secret sharing techniques address privacy concerns,the process often converts sparse data into denser forms,complicating the initially efficient computations.The existing study on secure sparse data computation involves extensive use of public-key cryptographic operations,resulting in low computational efficiency.Also,they primarily focus on scenarios involving only two parties.In fact,the computation of sparse data can be simplified to the calculations of corresponding elements at their non-zero positions.To fully exploit this characteristic for efficiency gains,we divide the problem of sparse vector multiplication into two modules:filtering and multiplication.Consequently,we propose a privacy-preserving multi-plication protocol for sparse vectors in a scenario involving three participants.Firstly,a filtering protocol is constructed based on the techniques of three-party additive replication secret sharing and pseudo-random permutation.Given specific filtering rules and the vectors to be processed,this protocol can filter the vectors according to the rules,sieving out the required vector elements.Therefore,filtering rules can be formulated based on the sparsity characteristics of vectors to sieve out elements at non-zero positions.Subsequently,we just need to perform multi-plication operations on these selected elements to complete the multiplication calculations for sparse vectors.By introducing additive homomorphic encryption technology,a privacy-preserving three-party sparse vector multiplication protocol is efficiently implemented based on the filtering protocol.We have proven the security of the protocol using the ideal/real simulation technology under the semi-honest adversary model,which ensures that apart from the output,no other private information is leaked throughout the entire process.Finally,the privacy-preserving sparse vector multiplication protocol is applied to the logistic regression model to verify its feasibility.The experiments and efficiency analysis show that,compared with the privacy-preserving sparse matrix multiplication protocol CAESAR,the proposed protocol in this paper reduces the main computing overhead from O(n)ciphertext operations to O(m)times,where n is the dimension of the vector,and m is the number of non-zero elements in the vector.In mini-batch logistic regression model training,our protocol demonstrates an efficiency improvement of 10%-30%compared to the general secure multi-party computation framework ABY3.
作者
周丹钰
阎允雪
张建栋
蒋瀚
徐秋亮
ZHOU Dan-Yu;YAN Yun-Xue;ZHANG Jian-Dong;JIANG Han;XU Qiu-Liang(School of Software,Shandong University,Jinan 250101;Key Laboratory of Softuware Engineering of Shandong Provuince(Shandong University),Jinan 250101)
出处
《计算机学报》
EI
CAS
CSCD
北大核心
2024年第5期1179-1193,共15页
Chinese Journal of Computers
基金
国家自然科学基金项目(62172258)
山东省软件工程重点实验室科技创新基地专项(11480004042015)
山东省科技型中小企业创新能力提升工程项目(2022TSGC1086)资助.
关键词
安全多方计算
隐私保护机器学习
秘密分享
稀疏向量乘法
隐私计算
secure multi-party computation
privacy-preserving machine learning
secret sharing
sparse vector multiplication
privacy computing