Abstract
Objective Model functionality stealing attacks are one of the core problems in artificial intelligence security: the goal is to train a clone model whose performance approaches that of a target model using only limited information about it, thereby stealing the target model's functionality. A classic line of work against this problem is based on generative models: a generator produces images that serve as query data, and the clone model is learned by constraining its predictions to be consistent with the target model's predictions on the same queries. However, the data produced by such generators are often images unrecognizable to the human eye and carry no semantic information, so the target model's outputs lack effective guidance. To address this problem, we propose a new model stealing attack that effectively steals the functionality of image classifiers. Method With the help of real image data, the method uses a generative adversarial network (GAN) to make the generator's outputs approach real images, strengthening the physical meaning of the target model's outputs. To further improve the clone model, we propose a new loss function for network optimization based on the idea of contrastive learning. Result Experiments on two public datasets, CIFAR-10 (Canadian Institute for Advanced Research-10) and SVHN (Street View House Numbers), show that our method achieves good functionality stealing performance. On CIFAR-10, it improves stealing accuracy by 5% over a current state-of-the-art method. Under the same query budget, it also achieves better stealing results, effectively reducing the cost of querying the target model. Conclusion Starting from the perspective of data realism, the proposed model stealing attack effectively improves functionality stealing against image classifiers and reduces the cost of querying the target model to a certain extent.
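Because the target model is a black box, its gradients cannot be back-propagated into the generator; attacks of this family instead approximate them by zeroth-order (finite-difference) estimation, as the English abstract below describes. A minimal sketch of such an estimator in plain Python, with a toy scalar objective standing in for the query loss (the function names and the toy objective are illustrative assumptions, not the paper's code):

```python
import math
import random

def zeroth_order_grad(f, x, eps=1e-3, num_dirs=2000):
    """Estimate the gradient of a scalar black-box function f at point x
    (a list of floats) by averaging finite differences along random
    Gaussian directions: E[u * (f(x + eps*u) - f(x)) / eps] ~ grad f(x)."""
    d = len(x)
    grad = [0.0] * d
    fx = f(x)  # one query at the base point, reused for every direction
    for _ in range(num_dirs):
        u = [random.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + eps * ui for xi, ui in zip(x, u)]
        diff = (f(x_plus) - fx) / eps
        for i in range(d):
            grad[i] += diff * u[i] / num_dirs
    return grad

# Toy check: for f(x) = x0^2 + x1^2, the true gradient at (1, 2) is (2, 4).
random.seed(0)
est = zeroth_order_grad(lambda x: x[0] ** 2 + x[1] ** 2, [1.0, 2.0])
```

Each direction costs one extra query to f, which is why, as the abstract notes, this estimation places a heavy burden on the query budget and yields noisier updates than true back-propagated gradients.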
Objective The model stealing attack is a sub-field of artificial intelligence (AI) security. Such attacks aim to steal private information about a target model, including its structure, parameters, and functionality. Our research focuses on model functionality stealing attacks: we target a deep-learning-based multi-class classifier and train a clone model to replicate the functionality of the black-box target classifier. Most current functionality stealing attacks are query-based; they replicate the black-box target classifier by analyzing the query data and the responses of the target model. Among them, attacks based on generative models are popular and have achieved promising results. However, two main challenges remain. First, target image classifiers are usually trained on real images. Because existing methods do not use real data to supervise the training of the generative model, the generated images degenerate into noise-like images rather than realistic ones. In other words, the query images carry little semantic information, so the predictions of the target model provide little effective guidance for training the clone model. Such images restrict the training of the clone model. Second, training the generative model requires issuing many queries to the target classifier, which places a severe burden on the query budget. Since the target model is a black box, the generator must be updated with approximate gradients obtained via zeroth-order gradient estimation, so it cannot obtain accurate gradient information.
Method We utilize a generative adversarial network (GAN) and contrastive learning to steal the functionality of the target classifier. The key idea is to extract prior information about real images from public datasets with a GAN, so that the predictions of the target classifier provide effective guidance for training the clone model. To make the generated images more realistic, public datasets are introduced to supervise the training of the generator. To strengthen the generative model, we adopt the deep convolutional GAN (DCGAN) as the backbone, in which both the generator and the discriminator are composed of convolutional layers with non-linear activation functions. To update the generator, we estimate the gradient of the target model via zeroth-order gradient estimation. Simultaneously, we leverage the public dataset to guide the training of the GAN so that the generator acquires information about real images. In other words, the public dataset acts as a regularization term that constrains the solution space of the generator. In this way, the generator produces approximately realistic images, so the predictions of the target model carry more physical meaning for guiding the training of the clone model. To reduce the query budget, we pre-train the GAN on public datasets so that it captures prior information about real images before the clone model is trained. Compared with previous approaches that train a randomly initialized generator, this lets the generator learn better for the needs of clone-model training. To extend the objective function for training the clone model, we introduce contrastive learning into the area of model stealing attacks. Traditional functionality stealing methods train the clone model only by maximizing the similarity between the two models' predictions on the same image. Here, we use a contrastive learning scheme that also considers the diversity of the two models' predictions on different images: a positive pair consists of the predictions of the two models on the same image, while a negative pair consists of their predictions on two different images. We use cosine similarity to measure the similarity of two predictions, and apply the InfoNCE loss to simultaneously maximize the similarity of positive pairs and the diversity of negative pairs.
Result To demonstrate the performance of our method, we carry out model functionality stealing attacks on two black-box target classifiers, trained on CIFAR-10 (Canadian Institute for Advanced Research-10) and SVHN (Street View House Numbers), respectively. Each target model is based on ResNet-34, and both clone models are based on ResNet-18. The public datasets we use do not overlap with the training datasets of the target classifiers. Evaluated on the CIFAR-10 and SVHN test sets, the trained clone models achieve accuracies of 92.3% and 91.8%, corresponding to normalized clone accuracies of 0.97× and 0.98×, respectively. In particular, our result achieves a 5% improvement on the CIFAR-10 target model in terms of normalized clone accuracy over data-free model extraction (DFME). Our method also achieves promising results in reducing the query budget: to reach 85% clone accuracy on the CIFAR-10 test set, DFME requires 8.6M queries, whereas our method needs only 5.8M, i.e., 2.8M fewer. To reach 90% accuracy, our method needs 9.4M queries, less than half of the 20M required by DFME. These results demonstrate that our method improves the performance of generative-model-based functionality stealing attacks and helps reduce the query budget.
Conclusion We propose a novel model functionality stealing attack that trains the clone model under the guidance of prior information from real images and a contrastive learning objective. The experimental results show the potential of our optimized model and that the query budget can be reduced effectively.
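The contrastive objective described in the Method section can be written down concretely: cosine similarities between clone and target predictions feed an InfoNCE loss, where the two models' predictions on the same image form the positive pair and their predictions on different images form negatives. A self-contained pure-Python sketch (the batch layout, temperature value, and function names are our assumptions, not the paper's code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two prediction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(clone_preds, target_preds, temperature=0.1):
    """InfoNCE loss over a batch of predictions.

    For each image i, (clone_preds[i], target_preds[i]) is the positive
    pair; (clone_preds[i], target_preds[j]) for j != i are the negatives.
    Minimizing this simultaneously pulls positives together (similarity)
    and pushes negatives apart (diversity)."""
    n = len(clone_preds)
    loss = 0.0
    for i in range(n):
        logits = [cosine(clone_preds[i], target_preds[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all pairs for image i.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += -(logits[i] - log_sum)  # cross-entropy with positive at i
    return loss / n
```

When the clone's per-image predictions match the target's, the loss approaches zero; when predictions on different images are confused with each other, it grows, which is exactly the extra signal this objective adds over plain per-image similarity matching.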
Authors
李延铭
李长升
余佳奇
袁野
王国仁
Li Yanming; Li Changsheng; Yu Jiaqi; Yuan Ye; Wang Guoren (School of Computer Science, Beijing Institute of Technology, Beijing 100081, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journals (北大核心)
2022, Issue 9, pp. 2721-2732 (12 pages)
Journal of Image and Graphics
Keywords
model functionality stealing
generative model
contrastive learning
adversarial attack
artificial intelligence security