Sparse Adversarial Examples Attacking on Video Captioning Model


Abstract: In multimodal deep learning, many studies have shown that image captioning models are vulnerable to adversarial examples, yet the robustness of video captioning models has received little attention. There are two main reasons. First, unlike an image captioning model, a video captioning model takes a stream of images as input rather than a single image, so perturbing every frame of a video is computationally expensive. Second, unlike a video recognition model, a video captioning model outputs a full semantic description rather than a single word. To address these problems and study the robustness of video captioning models, this paper proposes a sparse adversarial example attack on video captioning models. First, drawing on the principle of saliency analysis in image recognition, a method is proposed to evaluate how much each frame of a video contributes to the model output, and key frames are selected for perturbation on that basis. Second, an optimization objective function based on the L2 norm is designed for video captioning models. Experiments on the MSR-VTT dataset show that the proposed method achieves a 96.4% success rate in targeted attacks and reduces the number of queries by more than 45% compared with randomly selecting video frames. These results verify the effectiveness of the proposed method and reveal the vulnerability of video captioning models.
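The abstract names two ingredients: a per-frame contribution score used to pick key frames, and an L2-norm-based objective. It does not give the formulas, so the following is only a minimal sketch under stated assumptions and not the authors' implementation. It assumes an occlusion-style score (blank one frame, measure how much a model score changes) as a stand-in for the saliency-based evaluation, and a generic "squared L2 norm plus weighted caption loss" objective. All names (`frame_contributions`, `select_key_frames`, `sparse_l2_objective`, `score_fn`, `c`) are hypothetical, and `toy_score` is a toy replacement for a real captioning model.

```python
import numpy as np

def frame_contributions(video, score_fn):
    """Assumed occlusion-style score: change in score_fn when one frame is blanked."""
    base = score_fn(video)
    drops = np.empty(video.shape[0])
    for t in range(video.shape[0]):
        masked = video.copy()
        masked[t] = 0.0  # blank out frame t only
        drops[t] = abs(base - score_fn(masked))  # one model query per frame
    return drops

def select_key_frames(contributions, k):
    """Indices of the k highest-scoring frames, returned in temporal order."""
    top = np.argsort(contributions)[::-1][:k]
    return np.sort(top)

def sparse_l2_objective(delta, key_frames, caption_loss, c=1.0):
    """Squared L2 norm of the perturbation on key frames plus c * caption loss.

    The weighting constant c and the externally supplied caption_loss are
    assumptions; the abstract only states that the objective is L2-norm based.
    """
    mask = np.zeros(delta.shape[0], dtype=bool)
    mask[key_frames] = True
    sparse_delta = delta * mask[:, None, None, None]  # zero out non-key frames
    return float(np.sum(sparse_delta ** 2) + c * caption_loss)

# Toy usage: 8 frames of 4x4 RGB; the toy score depends only on frames 2 and 5.
rng = np.random.default_rng(0)
video = rng.random((8, 4, 4, 3))
weights = np.array([0, 0, 5, 0, 0, 3, 0, 0], dtype=float)

def toy_score(v):
    # Hypothetical stand-in for the model's score of a target caption.
    return float(np.sum(weights * v.mean(axis=(1, 2, 3))))

contrib = frame_contributions(video, toy_score)
key_frames = select_key_frames(contrib, k=2)
objective = sparse_l2_objective(np.ones_like(video), key_frames, caption_loss=0.5, c=2.0)
```

In this toy setup only the frames the score actually depends on are selected, so the perturbation and its L2 cost are confined to those frames. Restricting the perturbation to a few frames is the idea behind the sparsity described in the abstract.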

Authors: QIU Jiangxing; TANG Xueming; WANG Tianmei; WANG Chen; CUI Yongquan; LUO Ting (Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China)

Affiliation: Hubei Key Laboratory of Distributed System Security

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 12, pp. 330-336 (7 pages)

Keywords: multimodal model; video captioning model; adversarial example attack; image saliency; keyframe selection

Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]

