Pre-training-driven Multimodal Boundary-aware Vision Transformer
Abstract: Convolutional neural networks (CNNs) have achieved repeated performance breakthroughs in image forgery detection. In realistic scenarios where the tampering technique is unknown, however, existing methods still cannot effectively capture the long-range dependencies of the input image to alleviate recognition bias, and detection accuracy suffers as a result. In addition, because labeling is difficult, image forgery detection usually lacks accurate pixel-level annotations. To address these problems, this study proposes a pre-training-driven multimodal boundary-aware vision Transformer. First, to capture subtle forgery traces that are invisible in the RGB domain, the method introduces the frequency-domain modality of the image and combines it with the RGB spatial domain as a multimodal embedding. Second, the encoder of the backbone network is pre-trained on ImageNet to alleviate the shortage of training samples. Third, a Transformer module is integrated at the tail of this encoder so that the model captures both low-level spatial details and global context, which improves its overall representation ability. Finally, to alleviate the localization difficulty caused by blurred boundaries of forged regions, the study builds a boundary-aware module. This module uses the noise distribution obtained by a Scharr convolutional layer to focus on noise information rather than semantic content, and uses a boundary residual block to sharpen boundary information, which improves the boundary segmentation performance of the model. Extensive experiments show that the proposed method outperforms existing image forgery detection methods in recognition accuracy, and that it generalizes well and is robust across different forgery techniques.
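The abstract names a Scharr convolutional layer as the source of the noise and edge response in the boundary-aware module. As a rough illustration only, the sketch below applies the standard fixed 3x3 Scharr kernels to a single-channel array with NumPy. It is not the authors' module: the helper names `filter2d_valid` and `scharr_magnitude` are hypothetical, and the boundary residual block and any learned weights in the paper are not shown.

```python
import numpy as np

# Standard 3x3 Scharr kernels for horizontal and vertical derivatives.
SCHARR_X = np.array([[-3.0, 0.0, 3.0],
                     [-10.0, 0.0, 10.0],
                     [-3.0, 0.0, 3.0]])
SCHARR_Y = SCHARR_X.T


def filter2d_valid(image, kernel):
    """Cross-correlate a 2-D array with a small kernel (no padding).

    Hypothetical helper for this sketch, not part of the paper.
    """
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    # windows has shape (H - kh + 1, W - kw + 1, kh, kw).
    return np.einsum("ijkl,kl->ij", windows, kernel)


def scharr_magnitude(image):
    """Gradient magnitude from the two Scharr responses."""
    gx = filter2d_valid(image, SCHARR_X)
    gy = filter2d_valid(image, SCHARR_Y)
    return np.hypot(gx, gy)


if __name__ == "__main__":
    # A vertical step edge: left half 0, right half 1.
    img = np.zeros((5, 6))
    img[:, 3:] = 1.0
    print(scharr_magnitude(img))
```

The response is zero on flat regions and nonzero only where the window straddles the step, which is the property that makes a Scharr-type filter useful for emphasizing boundaries over smooth semantic content.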
Authors: SHI Ze-Nan (石泽男); CHEN Hai-Peng (陈海鹏); ZHANG Dong (张冬); SHEN Xuan-Jing (申铉京). Affiliations: College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University), Changchun 130012, China; Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China.
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the PKU Core Journals list; 2023, Issue 5, pp. 2051-2067 (17 pages).
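Funding: National Key Research and Development Program of China (2018YFB0804202, 2018YFB0804203); National Natural Science Foundation of China (U19A2057, 61876070); Jilin University 2021 "Interdisciplinary Integration and Innovation" Young Scholars Free Exploration Project (JLUXKJC2021QZ01).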
Keywords: model pre-training; multimodal; vision Transformer; boundary awareness; image forgery detection