Abstract
With the development of deep learning, multimodal sentiment analysis has become an active research topic. However, most multimodal sentiment analysis models fall into one of two patterns. Some extract feature vectors from each modality and simply combine them by weighted summation, so the data cannot be accurately mapped into a unified multimodal vector space. Others rely on image captioning models to convert images into text, which extracts too many visual semantics that carry no sentiment information and introduces redundancy. Both degrade model performance. To address these problems, a multimodal sentiment analysis model based on visual semantics and prompt learning, VSPL, is proposed. The model converts each image into a short, precise set of visual semantic words that carry sentiment information, which alleviates information redundancy. Following the prompt learning approach, it then combines these words with a prompt template designed in advance for the sentiment classification task to form a new text, thereby achieving modality fusion. This avoids the inaccurate feature space mapping caused by weighted summation and uses prompt learning to draw out the latent capability of the pre-trained language model. Comparative experiments on multimodal sentiment analysis tasks show that VSPL outperforms advanced baseline models on three public datasets. Ablation experiments, feature visualization, and case analysis further verify the effectiveness of VSPL.
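The fusion step described in the abstract, joining the original text, a few visual semantic words, and a pre-designed prompt template into one new text, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the template wording, the names `build_prompt` and `pick_label`, the `[MASK]` placeholder, and the label words are all assumptions, since the abstract does not give them.

```python
from typing import Dict, Sequence

# Hypothetical template; the paper's actual prompt wording is not stated in the abstract.
TEMPLATE = "Text: {text} Image shows: {visual}. Overall the sentiment is {mask}."


def build_prompt(text: str, visual_words: Sequence[str],
                 mask_token: str = "[MASK]", max_words: int = 5) -> str:
    """Combine the post text and visual semantic words into one prompt string."""
    unique_words = []
    for word in visual_words:
        word = word.strip().lower()
        # Drop empty entries and duplicates while preserving order.
        if word and word not in unique_words:
            unique_words.append(word)
    # Keep the visual part short, in the spirit of "precise and concise" words.
    visual = ", ".join(unique_words[:max_words]) or "nothing"
    return TEMPLATE.format(text=text.strip(), visual=visual, mask=mask_token)


def pick_label(mask_scores: Dict[str, float], verbalizer: Dict[str, str]) -> str:
    """Map masked-token scores to a sentiment label via an assumed verbalizer.

    mask_scores: candidate word -> score, as a masked language model might return.
    verbalizer:  sentiment label -> label word (an assumption for illustration).
    """
    return max(
        verbalizer,
        key=lambda label: mask_scores.get(verbalizer[label], float("-inf")),
    )


prompt = build_prompt("Great day!", ["Smile", "sunset", "smile"])
label = pick_label(
    {"good": 0.7, "bad": 0.1, "fine": 0.2},
    {"positive": "good", "negative": "bad", "neutral": "fine"},
)
```

In a sketch like this the pre-trained language model only ever sees text, so no weighted sum over separate modality vectors is needed. The scores passed to `pick_label` would come from whatever masked language model fills the `[MASK]` position.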
Authors
MO Shuyuan (莫书渊); MENG Zuqiang (蒙祖强)
School of Computer and Electronic Information, Guangxi University, Nanning 530004, China
Source
Computer Science (《计算机科学》), 2024, No. 9, pp. 250-257 (8 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (62266004).
Keywords
Multimodal
Visual semantics
Prompt learning
Sentiment analysis
Pre-trained language model