Abstract
Fake news detection has long been an important area of natural language processing, aimed at reducing the negative impact of misinformation on society. Most existing multimodal fake news detection methods use pre-trained models as feature extractors; however, these methods have the following shortcomings: (1) pre-trained model parameters are typically frozen during training, even though the pre-trained models themselves are not flawless; (2) CNN (convolutional neural network)-based image feature extractors are typically more structurally complex than Transformer-based text feature extractors, and because image features are usually extracted and stored in advance, the shortcomings of these models are easily overlooked. Therefore, this study proposes an end-to-end trained multimodal Transformer model. It unifies the feature extraction process across modalities by using a vision Transformer instead of a CNN to extract image features, and fuses image and text features through a co-attention module. Comparative experiments on three public datasets show that the proposed model outperforms the other baseline models.
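The co-attention fusion described above lets each modality attend to the other: text tokens query image patches, and image patches query text tokens. The paper itself does not publish its implementation here, so the following is only a minimal, hypothetical NumPy sketch of bidirectional cross-attention without the learned projection matrices and multi-head splitting a full model would use; all function and variable names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, d):
    # Queries from one modality attend over the other modality's features.
    # (Real models would first apply learned Q/K/V projections.)
    scores = queries @ context.T / np.sqrt(d)      # (n_q, n_ctx)
    return softmax(scores, axis=-1) @ context      # (n_q, d)

def co_attention(text_feats, image_feats):
    # Bidirectional cross-fusion: text attends to image, image attends to text.
    d = text_feats.shape[-1]
    text_fused = cross_attention(text_feats, image_feats, d)
    image_fused = cross_attention(image_feats, text_feats, d)
    return text_fused, image_fused

# Toy example: 4 text tokens and 9 image patches, hidden size 8.
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))
patches = rng.standard_normal((9, 8))
text_fused, image_fused = co_attention(text, patches)
print(text_fused.shape, image_fused.shape)  # (4, 8) (9, 8)
```

Each fused representation keeps its own sequence length but is now a convex combination of the other modality's features, which is the basic mechanism a co-attention module stacks and parameterizes.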
Authors
Wang Zhenyu; Zhu Xuefang (School of Information Management, Nanjing University, Nanjing 210023)
Source
《情报学报》
CSSCI
CSCD
Peking University Core Journals (北大核心)
2023, Issue 12, pp. 1477-1486 (10 pages)
Journal of the China Society for Scientific and Technical Information
Funding
National Social Science Fund of China project "Research on the Construction of a Smart Knowledge Service System for China in the 5G Environment" (22BTQ017).