Deepfake video detection with feature interaction amongst key frames (多关键帧特征交互的人脸篡改视频检测)

Cited by: 9

Abstract

Objective: Deepfake is an emerging technique that uses deep learning to manipulate images and videos, and the manipulation of face videos in particular poses a serious threat to society and to individuals. Detection methods that exploit temporal or multi-frame information are still at an early stage of research, and existing work often overlooks how the way frames are extracted from a video affects detection and its efficiency. For face-swap manipulated videos, this paper proposes an efficient detection framework that performs per-frame feature extraction and inter-frame interaction across multiple key frames.

Method: A certain number of key frames are extracted directly from the video stream, which avoids inter-frame decoding. A convolutional neural network maps the single-frame face images of each sample into a unified feature space. Multiple layers of self-attention-based encoding units, together with linear and non-linear transformations, let each frame feature aggregate information from the other frames as it learns and updates, and extract the anomalous information that manipulated frames exhibit in feature space. An additional indicator aggregates global information and makes the final detection decision.

Result: The proposed framework reaches a detection accuracy of at least 96.79% on each of the three face-swap datasets of FaceForensics++, and a recognition accuracy of 99.61% on Celeb-DF. Comparative experiments on detection time also confirm that using key frames as samples improves detection efficiency and that the proposed framework is efficient.

Conclusion: The proposed detection framework for face-swap manipulated videos reduces the computational cost and time of video-level detection by extracting key frames, maps the face image of each frame into feature space with a convolutional neural network, and uses a self-attention-based inter-frame interaction mechanism so that frame features can attend to one another and learn discriminative information. This makes the detection result more accurate and the overall detection process more efficient.

Extended abstract (English)

Objective: With the development of deep learning, manipulating images and videos is becoming easier and the results harder to distinguish from genuine content. Deepfake is a face manipulation technique that poses a great threat to social security and individual rights. Researchers have proposed various detection models and frameworks, which can be divided by their input into three categories: frame level, clip level and video level. Frame-level models focus on single frames and ignore temporal information, which can lead to low confidence in video detection. Clip-level models use a sequence of frames simultaneously, but the sequence is much shorter than the full video, so a clip cannot represent a video well. Moreover, video clips are fragmented, which may harm video-level detection, and the consecutive frames in a short clip differ little from one another, so they carry redundant information that may reduce detection performance. Video-level methods take frames sampled at large intervals as input and capture more of the key features that represent the whole video. However, existing methods ignore the impact of the sample extraction procedure and the expensive computation of decoding the video stream. To solve this problem and provide a more efficient detection method for face-swap manipulated videos, a detection framework based on the interaction of key frame features is presented.

Method: The proposed detection framework consists of two parts: key frame extraction together with face region extraction, and the detection model. First, a number of key frames are extracted from the video stream and checked. Extracting key frames avoids inter-frame decoding and reduces computation time. Next, multitask cascaded convolutional neural networks (MTCNN) are applied to locate the face region in the extracted frames, and face images are cropped from them with a margin of 80. MTCNN is then applied again to these crops to obtain compact face images. The input face images are mapped into a high-dimensional embedding space by Inception-ResNet-V1. This convolutional neural network is initialized with parameters pre-trained on a face recognition task and updated end to end. Finally, the key frame features are fed into an interaction learning module that contains several self-attention-based encoders. In this module, each key frame feature can learn from every other key frame feature and update itself. Distinctive abnormal features of manipulated images are extracted through linear and non-linear transformations. A global classification vector is prepended to the key frame features, is updated along with them, and makes the final decision.

Result: The detection framework is evaluated on five mainstream datasets: Deepfakes, FaceSwap, FaceShifter, DeepFakeDetection and Celeb-DF. Deepfakes, FaceSwap and FaceShifter are from FaceForensics++. With a small number of key frames, the framework achieves accuracies of 97.50%, 97.14%, 96.79%, 97.09% and 98.64%, respectively. Original 3D convolution models and LSTM-based models are compared with the proposed detection model on Celeb-DF with 16 key frames as input. A lightweight 3D model (L3D) designed for deepfake detection is tested as well. Because the sample size is smaller than in existing work, R3D, C3D, I3D and L3D show poor detection performance, while the LSTM-based model achieves an accuracy of 98.06%. The proposed model performs considerably better (99.61%). When the input is changed to consecutive frames, the proposed model still performs well (98.64%). The time cost of detection is also evaluated: the framework detects a video in 3.17 s on average, which is less than most of the compared models and less than using consecutive frames as input. The experimental results show that both the key frame extraction strategy and the proposed framework are efficient. A realistic scenario is also considered, in which the number of key frames available in a video is examined. Using slightly more frames than in training can yield higher accuracy, because the detection model has learned the relations among frames well and generalizes, whereas fewer frames provide insufficient information and lead to worse performance. In general, the proposed model, trained with 16 key frames, achieves good and stable detection performance.

Conclusion: An efficient detection framework for face-swap manipulated videos is presented. It takes advantage of key frame extraction, which skips inter-frame decoding and saves time in the preprocessing step. Face region images are cropped from the valid key frames, and Inception-ResNet-V1 maps them to a standardized embedding space, followed by several layers of self-attention-based encoders and linear or non-linear transformations. More meaningful and discriminative information is captured when the frame features can learn from one another. The experiments on the Celeb-DF dataset show that the proposed model outperforms other sequential models and 3D convolutional neural networks. The time cost is reduced and the efficiency of the proposed framework is improved.
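The inter-frame interaction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`self_attention`, `interact`), the toy 2-D features, and the use of identity query/key/value projections without feed-forward layers or normalization are all assumptions made to keep the example dependency-free. It only shows how a global classification vector placed in front of the key frame features is updated together with them by self-attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a trained model would use learned projections.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        # Each output is a weighted sum of all tokens, so every frame
        # aggregates information from every other frame.
        mixed = [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(dim)]
        outputs.append(mixed)
    return outputs


def interact(frame_features, cls_token, num_layers=2):
    """Prepend a classification vector and apply attention layers with residuals.

    Returns the updated classification vector and the updated frame features.
    """
    tokens = [list(cls_token)] + [list(f) for f in frame_features]
    for _ in range(num_layers):
        attended = self_attention(tokens)
        tokens = [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]
    return tokens[0], tokens[1:]


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings" for three key frames.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    cls_out, frames_out = interact(frames, cls_token=[0.0, 0.0], num_layers=1)
    print(cls_out, frames_out)
```

In a full model the updated classification vector would be passed to a small classifier to produce the real/fake decision; here it is simply returned.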

Authors: Zhu Kaiman (祝恺蔓); Xu Wenbo (徐文博); Lu Wei (卢伟); Zhao Xianfeng (赵险峰)

Affiliations: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100195, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100195, China

Source: Journal of Image and Graphics (《中国图象图形学报》), 2022, No. 1, pp. 188-202 (15 pages). Indexed in CSCD and the Peking University Core Journals list (北大核心).

Funding: National Natural Science Foundation of China (U2001202, 62072480).

Keywords: Deepfake detection; face-swap manipulation videos; key frames; hierarchical structure; multi-frame interaction; self-attention mechanism

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

