
Local similarity anomaly for general face forgery detection (局部相似度异常的强泛化性伪造人脸检测)
Abstract

Objective: Face forgery technology is developing rapidly and poses a serious threat to information security, so forged-face detection algorithms with strong generalization ability are urgently needed to withstand the wide variety of forgery models. Existing studies find that forgery algorithms generally include a step that blends the face with the background, which means that almost any forgery method leaves traces at the face boundary. Based on this finding, this paper shifts the learning objective of the model from specific forgery-trace features to the more universal local similarity features of face images and proposes a deep forged-face detection algorithm based on local similarity anomalies.

Method: A local similarity predictor (LSP) module is proposed, in which a group of local similarity predictors separately compute the local anomalies of the intermediate-layer feature maps of the RGB image. In addition, to capture real/fake cues in the frequency domain, a learnable spatial rich model convolutional pyramid (SRMCP) is proposed to extract multi-scale high-frequency noise features.

Result: Extensive experiments were conducted on multiple datasets. In terms of generalization, the proposed model with ResNet18 as its backbone surpasses the compared methods in cross-dataset detection accuracy on the four subsets of FF++ by 0.77%, 5.59%, 6.11%, and 4.28%, respectively. In terms of robustness to image compression, it surpasses the compared methods by 2.48%, 4.83%, and 10.10% under three different compression settings.

Conclusion: The proposed method substantially improves the detection performance of lightweight convolutional neural networks and achieves better generalization and robustness than the vast majority of existing work.

Objective: In recent years, DeepFake technology has made great progress, and the highly realistic forged face images it creates pose a serious threat not only to personal privacy and security but also to the international political situation. Detection methods with good generalization ability are therefore needed. In their early stages of development, forged faces had low fidelity and obvious defects, so traditional digital forensic algorithms and deep learning models could detect them well. As DeepFake has developed, however, forged faces have become increasingly realistic and now challenge detection algorithms. Researchers have focused on the essential differences between real and forged faces to improve the detection performance of their algorithms. The DeepFake process can be decomposed into the following steps: (1) detect and crop the face in the target image; (2) forge the face using a forgery algorithm; (3) paste the forged face back into the original image and use image fusion to eliminate boundary defects and improve the visual effect. Step 3 often leaves easily detectable local forgery traces, which are important cues for distinguishing real faces from fake ones. Many researchers have attempted to build models that learn such traces to improve accuracy or to localize tampering. However, both the local traces and the image fusion methods differ widely across forgery techniques, so detectors trained on one technique generalize poorly to others. Although the local traces caused by Step 3 are universal, directly learning them as features for classifying real and forged faces therefore contributes little to generalizability.

Method: This paper proposes a DeepFake detection method based on local similarity anomalies to achieve high generalizability. Instead of directly learning local forgery traces to distinguish real faces from fake ones, the method changes the learning objective to the similarity of local features. Specifically, the face region of a forged image has source features that differ from those of the background region. Each of the two regions is internally uniform in its source features, but the fusion boundary between face and background contains conflicting source features and thus has a low level of local similarity. These local similarity anomalies are independent of both the specific forgery algorithm and the fusion algorithm, and they can be regarded as heterogeneous features that are highly consistent with the essential difference between real and fake faces. To capture these traces, this paper proposes the local similarity predictor module. The local deep features of a face image are decomposed into horizontal and vertical groups, and the similarity between each local feature and its neighbors is calculated. The learning objective is thereby converted from recognizing specific forgery traces to predicting the similarity of source features within the image, which captures the essential differences between real and fake faces in a general way. In addition, previous studies find that frequency-domain features contain important clues for distinguishing real from fake faces. The proposed method draws on domain knowledge from steganalysis and constructs a learnable convolutional pyramid module based on the spatial rich model (SRM), which compensates for the limited ability of the RGB space to express real/fake features and improves in-domain detection performance. This module, the spatial rich model convolutional pyramid (SRMCP), inherits the high-frequency noise extraction of the SRM kernels, can be continuously updated during training, and is extended to a pyramid architecture with different receptive fields to capture high-frequency noise features at different scales.

Result: The overall results on FF++ are compared under three compression factors. The proposed method, which uses ResNet18 as its backbone, achieves high detection accuracy on both raw and compressed data. It not only significantly outperforms classical digital forensic algorithms but also surpasses some recently proposed algorithms for deep forgery detection. Specifically, the proposed method achieves accuracies of 99.72%, 98.34%, and 90.73% on RAW, C23, and C40, respectively, and its average accuracy is 2.31% and 13.33% (20.26% on the C40 dataset) higher than those of Xception and MesoNet, respectively. The proposed method also outperforms a metric learning method published at CVPR 2021 that combines the frequency and spatial domains, with accuracies that are 0.29%, 1.63%, and 1.22% higher on RAW, C23, and C40, respectively. Overall, the proposed method leads in accuracy. The experimental results indicate that the local similarity module effectively captures the inherent features of forged faces, substantially improving detection accuracy and reaching high accuracy even with a simple ResNet18 backbone. The average cross-domain areas under the curve (AUCs) of the proposed method reach 91.40%, 96.03%, 99.08%, and 96.05% on the four subsets of FF++, which are 15.41%, 16.47%, 21.11%, and 14.7% higher than those of Xception, respectively. In addition, the average accuracies of the proposed method are 0.77%, 5.59%, 6.11%, and 4.28% higher, respectively, than those of state-of-the-art methods. The cross-domain results on Celeb-DF show that the proposed method outperforms existing methods while using only ResNet18. Although recently introduced methods have made significant progress in cross-domain detection, with average accuracy exceeding 70%, the cross-domain accuracies of the proposed method are 1.11%, 3.73%, and 5.17% higher than those of state-of-the-art methods.

Conclusion: The proposed method greatly improves the detection performance of lightweight convolutional neural networks and achieves better generalization and robustness than other recently proposed methods. In future work, the local similarity learning module will be further optimized so that it can predict local anomalies for different types of forged faces, further improving its generalizability to unknown forgeries.
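The abstract describes local similarity as the similarity between each local deep feature and its horizontal and vertical neighbors, with low values expected along the face/background fusion boundary. The following is a minimal sketch of that idea only, not the authors' LSP module: it assumes cosine similarity as the measure, uses a hand-made toy feature map in place of real network features, and the names `cosine` and `local_similarity` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (nu * nv)

def local_similarity(fmap):
    """Return (horizontal, vertical) similarity maps for an H x W x C feature map.

    horizontal[i][j] compares position (i, j) with its right neighbor (i, j + 1);
    vertical[i][j] compares position (i, j) with the neighbor below (i + 1, j).
    """
    h, w = len(fmap), len(fmap[0])
    horizontal = [[cosine(fmap[i][j], fmap[i][j + 1]) for j in range(w - 1)]
                  for i in range(h)]
    vertical = [[cosine(fmap[i][j], fmap[i + 1][j]) for j in range(w)]
                for i in range(h - 1)]
    return horizontal, vertical

# Toy 3 x 4 map: a "face" strip with one source feature inside a "background"
# with another. Both feature vectors are invented for illustration.
face, background = [1.0, 0.0], [0.0, 1.0]
toy = [[background, face, face, background] for _ in range(3)]
hor, ver = local_similarity(toy)
print(hor[0])  # similarity drops only where face meets background
print(ver[0])  # columns are uniform, so vertical similarity stays high
```

In this toy case the horizontal map is low exactly at the two columns where the pasted strip meets the background, while the interior of each region stays fully similar. According to the abstract, the actual module learns to predict such similarity from intermediate features during training, which this sketch does not attempt.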
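The SRMCP is described as starting from SRM kernels and extending them to a pyramid with different receptive fields. A rough sketch of the fixed starting point is given below, under several assumptions that do not come from the abstract: it uses one commonly cited 3x3 SRM high-pass kernel, realizes the pyramid by applying that kernel at dilations 1, 2, and 3, and replicates edge pixels at the border. The learnable update during training is omitted, and `high_pass` and `srm_pyramid` are hypothetical names.

```python
# One widely used 3x3 SRM residual kernel (to be divided by 4). Its weights
# sum to zero, so smooth image content is suppressed and noise is kept.
SRM_3X3 = [[-1, 2, -1],
           [2, -4, 2],
           [-1, 2, -1]]

def high_pass(img, kernel, scale, dilation=1):
    """Filter a 2D image (list of rows) with a dilated kernel.

    Border pixels are replicated (indices are clamped), an assumption made
    here so that a constant image yields an all-zero residual everywhere.
    """
    h, w = len(img), len(img[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    y = min(max(i + di * dilation, 0), h - 1)
                    x = min(max(j + dj * dilation, 0), w - 1)
                    acc += kernel[di + k][dj + k] * img[y][x]
            out[i][j] = acc / scale
    return out

def srm_pyramid(img, dilations=(1, 2, 3)):
    """Apply the same high-pass kernel at several dilations (assumed pyramid)."""
    return [high_pass(img, SRM_3X3, 4.0, d) for d in dilations]

# A flat image carries no high-frequency content; a single bright pixel does.
flat = [[5.0] * 7 for _ in range(7)]
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0

flat_levels = srm_pyramid(flat)
impulse_levels = srm_pyramid(impulse)
print(impulse_levels[0][3][2:5])  # dilation 1: response next to the pixel
print(impulse_levels[1][3][1:6])  # dilation 2: response two pixels away
```

The flat image produces zero residual at every level, while the impulse produces the same kernel pattern spread over a wider neighborhood as the dilation grows, which is one simple way to obtain different receptive fields. In a trainable version the kernel weights would be initialized this way and then updated, as the abstract states.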
Authors Dai Yunshu (戴昀书); Fei Jianwei (费建伟); Xia Zhihua (夏志华); Liu Jianan (刘家男); Weng Jian (翁健) (School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen 518107, China; School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China; School of Cyberspace Security, Jinan University, Guangzhou 510632, China)
Source Journal of Image and Graphics (《中国图象图形学报》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 11, pp. 3453-3470 (18 pages)
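Funding National Key Research and Development Program of China (2022YFB3103100, 2020YFB1005600); National Natural Science Foundation of China (62122032, 62172233, 62102189, U1936118, 61931004); Jiangsu Province Graduate Research and Innovation Project (KYCX22_1207).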
Keywords deep face forgery detection; spatially rich model (SRM); convolutional pyramid; local similarity learning; multi-task learning