Abstract
[Objective] This paper designs an SC-Attention fusion mechanism to address the low prediction accuracy and the difficulty of fusing multimodal features in existing multimodal sarcasm detection models. [Methods] First, the CLIP and RoBERTa models are used to extract features from the image, image-attribute, and text modalities. Then, the SC-Attention mechanism, built by combining SENet's attention mechanism with the Co-Attention mechanism, fuses the multimodal features, using the original modality features as guidance to allocate feature weights reasonably. Finally, the fused features are fed into fully connected layers for sarcasm detection. [Results] The accuracy and F1 score of the proposed model reach 93.71% and 91.68%, which are 10.27 and 11.50 percentage points higher than those of the baseline model. [Limitations] The generalizability of the model still needs to be verified on more datasets. [Conclusions] The SC-Attention mechanism reduces information redundancy and feature loss, and effectively improves the accuracy of multimodal sarcasm detection.
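The abstract describes the fusion pipeline only at a high level. Below is a minimal PyTorch sketch of the kind of architecture it outlines: an SENet-style squeeze-and-excitation gate that re-weights the three modality features, followed by co-attention guided by the text features and a fully connected classifier. This is not the authors' implementation; the shared 768-dimensional feature space, the gate sizes, the number of attention heads, and the exact way the gate and co-attention are composed are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class SCAttentionFusion(nn.Module):
    """Illustrative sketch (not the paper's code): an SENet-style
    squeeze-and-excitation gate re-weights the three modalities, then
    co-attention, guided by the text features, fuses them."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Squeeze-and-excitation over the 3-way modality axis (sizes assumed).
        self.se = nn.Sequential(
            nn.Linear(3, 8), nn.ReLU(),
            nn.Linear(8, 3), nn.Sigmoid(),
        )
        # Co-attention: text queries attend to image + attribute keys/values.
        self.co_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, 2)  # sarcastic vs. non-sarcastic

    def forward(self, text, image, attr):
        # text: (B, Lt, D), image: (B, Li, D), attr: (B, La, D)
        # Squeeze: pool each modality to one scalar descriptor -> (B, 3).
        squeezed = torch.stack(
            [text.mean(dim=(1, 2)), image.mean(dim=(1, 2)), attr.mean(dim=(1, 2))],
            dim=1,
        )
        gate = self.se(squeezed)  # (B, 3): per-modality excitation weights
        # Excite: re-scale every token of each modality by its weight.
        text_w = text * gate[:, 0].view(-1, 1, 1)
        image_w = image * gate[:, 1].view(-1, 1, 1)
        attr_w = attr * gate[:, 2].view(-1, 1, 1)
        # Co-attention guided by the (re-weighted) original text features.
        visual = torch.cat([image_w, attr_w], dim=1)      # (B, Li+La, D)
        fused, _ = self.co_attn(text_w, visual, visual)   # (B, Lt, D)
        return self.classifier(fused.mean(dim=1))         # (B, 2) logits


# Usage with dummy features standing in for CLIP (image, attributes) and
# RoBERTa (text) outputs, assumed to be projected to a shared 768-dim space:
model = SCAttentionFusion(dim=768)
logits = model(torch.randn(4, 50, 768),   # text tokens
               torch.randn(4, 49, 768),   # image patches
               torch.randn(4, 5, 768))    # image attributes
```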
Authors
Chen Yuanyuan; Ma Jing (College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSSCI
CSCD
PKU Core Journal
2022, Issue 9, pp. 40-51 (12 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 72174086) and the Prospective Development Strategy Research Fund of the Fundamental Research Funds for the Central Universities (Grant No. NW2020001).