Journal Articles
91 articles found
A Robust Framework for Multimodal Sentiment Analysis with Noisy Labels Generated from Distributed Data Annotation
1
Authors: Kai Jiang, Bin Cao, Jing Fan. Computer Modeling in Engineering & Sciences, SCIE, EI, 2024, No. 6, pp. 2965-2984 (20 pages)
Multimodal sentiment analysis utilizes multimodal data such as text, facial expressions and voice to detect people’s attitudes. With the advent of distributed data collection and annotation, we can easily obtain and share such multimodal data. However, due to professional discrepancies among annotators and lax quality control, noisy labels might be introduced. Recent research suggests that deep neural networks (DNNs) will overfit noisy labels, leading to the poor performance of the DNNs. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis to resist noisy labels and correlate distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training. Besides, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
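The abstract above mentions a two-layer fusion net that deeply fuses text, facial-expression and voice features before label correction and training, but does not specify its structure. The PyTorch sketch below illustrates one plausible reading only: a first layer projecting each modality into a shared space and a second layer fusing the concatenated projections. All module names and dimensions are illustrative assumptions, not the MRML implementation.

```python
import torch
import torch.nn as nn

class TwoLayerFusionNet(nn.Module):
    """Illustrative two-stage fusion of text / audio / visual features."""
    def __init__(self, dims=(768, 74, 35), hidden=128, num_classes=3):
        super().__init__()
        # Layer 1: project each modality into a shared hidden space.
        self.proj = nn.ModuleList([nn.Sequential(nn.Linear(d, hidden), nn.ReLU())
                                   for d in dims])
        # Layer 2: fuse the concatenated projections into one vector.
        self.fuse = nn.Sequential(nn.Linear(hidden * len(dims), hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, text, audio, visual):
        parts = [p(x) for p, x in zip(self.proj, (text, audio, visual))]
        fused = self.fuse(torch.cat(parts, dim=-1))   # shared multimodal feature
        return self.classifier(fused), fused          # logits + feature for label correction

# Toy usage with random stand-ins for pre-extracted utterance features.
model = TwoLayerFusionNet()
logits, feat = model(torch.randn(4, 768), torch.randn(4, 74), torch.randn(4, 35))
print(logits.shape, feat.shape)  # torch.Size([4, 3]) torch.Size([4, 128])
```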
Keywords: distributed data collection; multimodal sentiment analysis; meta learning; learning with noisy labels
Multimodal sentiment analysis for social media contents during public emergencies
2
Authors: Tao Fan, Hao Wang, Peng Wu, Chen Ling, Milad Taleby Ahvanooey. Journal of Data and Information Science, CSCD, 2023, No. 3, pp. 61-87 (27 pages)
Purpose: Nowadays, public opinions during public emergencies involve not only textual contents but also contain images. However, the existing works mainly focus on textual contents and they do not provide a satisfactory accuracy of sentiment analysis, lacking the combination of multimodal contents. In this paper, we propose to combine texts and images generated in the social media to perform sentiment analysis. Design/methodology/approach: We propose a Deep Multimodal Fusion Model (DMFM), which combines textual and visual sentiment analysis. We first train a word2vec model on a large-scale public emergency corpus to obtain semantic-rich word vectors as the input of textual sentiment analysis. BiLSTM is employed to generate encoded textual embeddings. To fully excavate visual information from images, a modified pretrained VGG16-based sentiment analysis network is used with the best-performed fine-tuning strategy. A multimodal fusion method is implemented to fuse textual and visual embeddings completely, producing predicted labels. Findings: We performed extensive experiments on Weibo and Twitter public emergency datasets to evaluate the performance of our proposed model. Experimental results demonstrate that the DMFM provides higher accuracy compared with baseline models. The introduction of images can boost the performance of sentiment analysis during public emergencies. Research limitations: In the future, we will test our model on a wider dataset. We will also consider a better way to learn the multimodal fusion information. Practical implications: We build an efficient multimodal sentiment analysis model for the social media contents during public emergencies. Originality/value: We consider the images posted by online users during public emergencies on social platforms. The proposed method can present a novel scope for sentiment analysis during public emergencies and provide the decision support for the government when formulating policies in public emergencies.
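The DMFM pipeline described above encodes text with word2vec vectors fed to a BiLSTM and images with a fine-tuned VGG16, then fuses the two embeddings. A minimal sketch of that pattern is given below, with random tensors standing in for the word2vec input and a plain linear projection standing in for the VGG16 feature extractor; every dimension and layer choice is an assumption rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class TextVisualFusion(nn.Module):
    """BiLSTM text encoder + projected visual feature, fused by concatenation."""
    def __init__(self, word_dim=300, vis_dim=4096, hidden=128, num_classes=2):
        super().__init__()
        self.bilstm = nn.LSTM(word_dim, hidden, batch_first=True, bidirectional=True)
        self.vis_proj = nn.Linear(vis_dim, 2 * hidden)   # stand-in for a VGG16 feature projection
        self.classifier = nn.Linear(4 * hidden, num_classes)

    def forward(self, word_vectors, visual_feature):
        # word_vectors: (batch, seq_len, word_dim); use the final BiLSTM states as the text embedding.
        _, (h, _) = self.bilstm(word_vectors)
        text_emb = torch.cat([h[0], h[1]], dim=-1)        # forward + backward final states
        vis_emb = torch.relu(self.vis_proj(visual_feature))
        return self.classifier(torch.cat([text_emb, vis_emb], dim=-1))

model = TextVisualFusion()
logits = model(torch.randn(8, 30, 300), torch.randn(8, 4096))
print(logits.shape)  # torch.Size([8, 2])
```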
Keywords: public emergency; multimodal sentiment analysis; social platform; textual sentiment analysis; visual sentiment analysis
Leveraging Vision-Language Pre-Trained Model and Contrastive Learning for Enhanced Multimodal Sentiment Analysis
3
Authors: Jieyu An, Wan Mohd Nazmee Wan Zainon, Binfen Ding. Intelligent Automation & Soft Computing, SCIE, 2023, No. 8, pp. 1673-1689 (17 pages)
Multimodal sentiment analysis is an essential area of research in artificial intelligence that combines multiple modes, such as text and image, to accurately assess sentiment. However, conventional approaches that rely on unimodal pre-trained models for feature extraction from each modality often overlook the intrinsic connections of semantic information between modalities. This limitation is attributed to their training on unimodal data, and necessitates the use of complex fusion mechanisms for sentiment analysis. In this study, we present a novel approach that combines a vision-language pre-trained model with a proposed multimodal contrastive learning method. Our approach harnesses the power of transfer learning by utilizing a vision-language pre-trained model to extract both visual and textual representations in a unified framework. We employ a Transformer architecture to integrate these representations, thereby enabling the capture of rich semantic information in image-text pairs. To further enhance the representation learning of these pairs, we introduce our proposed multimodal contrastive learning method, which leads to improved performance in sentiment analysis tasks. Our approach is evaluated through extensive experiments on two publicly accessible datasets, where we demonstrate its effectiveness. We achieve a significant improvement in sentiment analysis accuracy, indicating the superiority of our approach over existing techniques. These results highlight the potential of multimodal sentiment analysis and underscore the importance of considering the intrinsic semantic connections between modalities for accurate sentiment assessment.
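The approach above attaches a multimodal contrastive learning objective to image-text pairs from a vision-language pre-trained backbone. The abstract does not give the loss, so the snippet below sketches a standard symmetric InfoNCE contrastive loss, one common way such an objective is implemented; the temperature and embedding size are placeholders.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: matched image-text pairs attract, mismatched pairs repel."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature             # (batch, batch) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i2t + loss_t2i) / 2

loss = image_text_contrastive_loss(torch.randn(16, 512), torch.randn(16, 512))
print(loss.item())
```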
Keywords: multimodal sentiment analysis; vision-language pre-trained model; contrastive learning; sentiment classification
Improving Targeted Multimodal Sentiment Classification with Semantic Description of Images
4
Authors: Jieyu An, Wan Mohd Nazmee Wan Zainon, Zhang Hao. Computers, Materials & Continua, SCIE, EI, 2023, No. 6, pp. 5801-5815 (15 pages)
Targeted multimodal sentiment classification (TMSC) aims to identify the sentiment polarity of a target mentioned in a multimodal post. The majority of current studies on this task focus on mapping the image and the text to a high-dimensional space in order to obtain and fuse implicit representations, ignoring the rich semantic information contained in the images and not taking into account the contribution of the visual modality in the multimodal fusion representation, which can potentially influence the results of TMSC tasks. This paper proposes a general model for Improving Targeted Multimodal Sentiment Classification with Semantic Description of Images (ITMSC) as a way to tackle these issues and improve the accuracy of multimodal sentiment analysis. Specifically, the ITMSC model can automatically adjust the contribution of images in the fusion representation through the exploitation of semantic descriptions of images and text similarity relations. Further, we propose a target-based attention module to capture the target-text relevance, an image-based attention module to capture the image-text relevance, and a target-image matching module based on the former two modules to properly align the target with the image so that fine-grained semantic information can be extracted. Our experimental results demonstrate that our model achieves comparable performance with several state-of-the-art approaches on two multimodal sentiment datasets. Our findings indicate that incorporating semantic descriptions of images can enhance our understanding of multimodal content and lead to improved sentiment analysis performance.
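ITMSC adjusts the contribution of the image via similarity between the image's semantic description and the text. As a rough, assumption-laden sketch of that idea, the snippet below gates the visual feature by the cosine similarity between a caption embedding and a text embedding before fusion; the gating rule, names, and dimensions are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def gated_multimodal_feature(text_emb, caption_emb, visual_emb):
    """Weight the visual feature by how well the image caption matches the text."""
    sim = F.cosine_similarity(text_emb, caption_emb, dim=-1)   # (batch,)
    gate = sim.clamp(min=0.0).unsqueeze(-1)                    # ignore negatively related images
    fused = torch.cat([text_emb, gate * visual_emb], dim=-1)
    return fused, gate

fused, gate = gated_multimodal_feature(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 256))
print(fused.shape, gate.shape)  # torch.Size([4, 512]) torch.Size([4, 1])
```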
Keywords: targeted sentiment analysis; multimodal sentiment classification; visual sentiment; textual sentiment; social media
End-to-end aspect category sentiment analysis based on type graph convolutional networks
5
Authors: 邵清, ZHANG Wenshuang, WANG Shaojun. High Technology Letters, EI, CAS, 2023, No. 3, pp. 325-334 (10 pages)
In existing aspect category sentiment analysis research, aspects are mostly given in advance for sentiment extraction, and this pipeline approach is prone to error accumulation; moreover, existing graph convolutional networks for aspect category sentiment analysis do not fully utilize the dependency type information between words and thus cannot enhance feature extraction. This paper proposes an end-to-end aspect category sentiment analysis (ETESA) model based on type graph convolutional networks. The model uses the bidirectional encoder representation from transformers (BERT) pretraining model to obtain aspect categories and word vectors containing contextual dynamic semantic information, which can solve the problem of polysemy; when using a graph convolutional network (GCN) for feature extraction, the fusion of word vectors and the initialization tensor of dependency types yields importance values for different dependency types and enhances the text feature representation; by transforming aspect category and sentiment pair extraction into multiple single-label classification problems, aspect categories and sentiments can be extracted simultaneously in an end-to-end way, avoiding error accumulation. Experiments on three public datasets show that the ETESA model achieves higher Precision, Recall and F1 values, proving the effectiveness of the model.
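ETESA fuses word vectors with learned dependency-type information inside a graph convolution. The layer is not spelled out in the abstract, so the sketch below shows one plausible form: each edge's dependency type contributes a learned importance value that scales message passing over the dependency graph. Edge handling, dimensions, and initialization are assumptions.

```python
import torch
import torch.nn as nn

class TypeGCNLayer(nn.Module):
    """Graph convolution whose edge weights depend on dependency-type embeddings."""
    def __init__(self, in_dim, out_dim, num_dep_types):
        super().__init__()
        self.type_score = nn.Embedding(num_dep_types, 1)  # one importance value per dependency type
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj, dep_type):
        # x: (batch, n, in_dim); adj: (batch, n, n) 0/1; dep_type: (batch, n, n) type ids
        type_w = self.type_score(dep_type).squeeze(-1)        # (batch, n, n)
        weighted_adj = adj * torch.sigmoid(type_w)            # typed edge weights
        msg = torch.bmm(weighted_adj, self.linear(x))         # aggregate neighbours
        deg = weighted_adj.sum(-1, keepdim=True).clamp(min=1e-6)
        return torch.relu(msg / deg)

layer = TypeGCNLayer(768, 256, num_dep_types=40)
out = layer(torch.randn(2, 10, 768), torch.ones(2, 10, 10), torch.randint(0, 40, (2, 10, 10)))
print(out.shape)  # torch.Size([2, 10, 256])
```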
Keywords: aspect-based sentiment analysis (ABSA); bidirectional encoder representation from transformers (BERT); type graph convolutional network (TGCN); aspect category and sentiment pair extraction
Fusing Syntactic Structure Information and Lexical Semantic Information for End-to-End Aspect-Based Sentiment Analysis (Cited by 2)
6
Authors: Yong Bie, Yan Yang, Yiling Zhang. Tsinghua Science and Technology, SCIE, EI, CAS, CSCD, 2023, No. 2, pp. 230-243 (14 pages)
The aspect-based sentiment analysis (ABSA) consists of two subtasks: aspect term extraction and aspect sentiment prediction. Most methods conduct the ABSA task by handling the subtasks in a pipeline manner, whereby problems in performance and real application emerge. In this study, we propose an end-to-end ABSA model, namely, SSi-LSi, which fuses the syntactic structure information and the lexical semantic information, to address the limitation that existing end-to-end methods do not fully exploit the text information. Through two network branches, the model extracts syntactic structure information and lexical semantic information, which integrates the part of speech, sememes, and context, respectively. Then, on the basis of an attention mechanism, the model further realizes the fusion of the syntactic structure information and the lexical semantic information to obtain higher quality ABSA results, in which way the text information is fully used. Subsequent experiments demonstrate that the SSi-LSi model has certain advantages in using different text information.
Keywords: deep learning; natural language processing; aspect-based sentiment analysis; graph convolutional
Modal Interactive Feature Encoder for Multimodal Sentiment Analysis
7
Authors: Xiaowei Zhao, Jie Zhou, Xiujuan Xu. 《国际计算机前沿大会会议论文集》, EI, 2023, No. 2, pp. 285-303 (19 pages)
Multimodal sentiment analysis refers to analyzing emotions in information carriers containing multiple modalities. To better analyze the features within and between modalities and solve the problem of incomplete multimodal feature fusion, this paper proposes a multimodal sentiment analysis model MIF (Modal Interactive Feature Encoder for Multimodal Sentiment Analysis). First, the global features of three modalities are obtained through unimodal feature extraction networks. Second, the inter-modal interactive feature encoder and the intra-modal interactive feature encoder extract similarity features between modalities and intra-modal special features separately. Finally, unimodal special features and the interaction information between modalities are decoded to get the fusion features and predict sentimental polarity results. We conduct extensive experiments on three public multimodal datasets, including one in Chinese and two in English. The results show that the performance of our approach is significantly improved compared with benchmark models.
Keywords: multimodal sentiment analysis; modal interactive feature encoder
Multi-Model Fusion Framework Using Deep Learning for Visual-Textual Sentiment Classification
8
Authors: Israa K. Salman Al-Tameemi, Mohammad-Reza Feizi-Derakhshi, Saeed Pashazadeh, Mohammad Asadpour. Computers, Materials & Continua, SCIE, EI, 2023, No. 8, pp. 2145-2177 (33 pages)
Multimodal Sentiment Analysis (SA) is gaining popularity due to its broad application potential. The existing studies have focused on the SA of single modalities, such as texts or photos, posing challenges in effectively handling social media data with multiple modalities. Moreover, most multimodal research has concentrated on merely combining the two modalities rather than exploring their complex correlations, leading to unsatisfactory sentiment classification results. Motivated by this, we propose a new visual-textual sentiment classification model named Multi-Model Fusion (MMF), which uses a mixed fusion framework for SA to effectively capture the essential information and the intrinsic relationship between the visual and textual content. The proposed model comprises three deep neural networks. Two different neural networks are proposed to extract the most emotionally relevant aspects of image and text data. Thus, more discriminative features are gathered for accurate sentiment classification. Then, a multichannel joint fusion model with a self-attention technique is proposed to exploit the intrinsic correlation between visual and textual characteristics and obtain emotionally rich information for joint sentiment classification. Finally, the results of the three classifiers are integrated using a decision fusion scheme to improve the robustness and generalizability of the proposed model. An interpretable visual-textual sentiment classification model is further developed using the Local Interpretable Model-agnostic Explanation model (LIME) to ensure the model’s explainability and resilience. The proposed MMF model has been tested on four real-world sentiment datasets, achieving 99.78% accuracy on Binary_Getty (BG), 99.12% on Binary_iStock (BIS), 95.70% on Twitter, and 79.06% on the Multi-View Sentiment Analysis (MVSA) dataset. These results demonstrate the superior performance of our MMF model compared to single-model approaches and current state-of-the-art techniques based on model evaluation criteria.
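MMF combines an image classifier, a text classifier, and a joint fusion classifier through a decision fusion scheme. Without the paper's exact scheme, the sketch below shows one simple possibility: a learnable weighted average of the three classifiers' class probabilities. The number of classifiers, classes, and the weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecisionFusion(nn.Module):
    """Late fusion: combine per-classifier probabilities with learnable weights."""
    def __init__(self, num_classifiers=3):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_classifiers))

    def forward(self, logits_list):
        probs = torch.stack([F.softmax(l, dim=-1) for l in logits_list], dim=0)  # (k, batch, classes)
        w = F.softmax(self.weights, dim=0).view(-1, 1, 1)
        return (w * probs).sum(dim=0)   # fused class probabilities

fuser = DecisionFusion()
fused = fuser([torch.randn(4, 2), torch.randn(4, 2), torch.randn(4, 2)])
print(fused.shape, fused.sum(-1))  # torch.Size([4, 2]); each row sums to ~1
```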
Keywords: sentiment analysis; multimodal classification; deep learning; joint fusion; decision fusion; interpretability
A Multi-View Interactive Learning Network for Multimodal Aspect-Level Sentiment Analysis (Cited by 1)
9
Authors: 王旭阳, 庞文倩, 赵丽婕. 《计算机工程与应用》 (Computer Engineering and Applications), CSCD, PKU Core, 2024, No. 7, pp. 92-100 (9 pages)
Previous multimodal aspect-level sentiment analysis methods use only the generic text and image representations produced by pre-trained models, are insensitive to the correlation between aspects and opinion words, and cannot dynamically capture how image information contributes to individual word representations, so they fail to fully identify the correlation between modalities and aspects. To address these problems, a multi-view interactive learning network is proposed. Sentence features are extracted separately from a contextual view and a syntactic view so that the global features of the text are fully exploited during multimodal interaction; the relationships among text, image, and aspect are modeled so that the model achieves multimodal interaction; the interactive representations of the different modalities are fused, and the contribution of visual information to each word in the text is obtained dynamically, fully extracting the correlation between modalities and aspects. Finally, sentiment classification results are obtained through a fully connected layer and a Softmax layer. Experiments on two datasets show that the model effectively improves multimodal aspect-level sentiment classification.
Keywords: multimodal aspect-level sentiment analysis; pre-trained model; multi-view learning; multimodal interaction; dynamic fusion
A Multimodal Sentiment Analysis Method Based on a Cross-Modal Cross-Attention Network (Cited by 1)
10
Authors: 王旭阳, 王常瑞, 张金峰, 邢梦怡. 《广西师范大学学报(自然科学版)》 (Journal of Guangxi Normal University (Natural Science Edition)), CAS, PKU Core, 2024, No. 2, pp. 84-93 (10 pages)
Mining intra-modal and inter-modal information helps improve the performance of multimodal sentiment analysis, so this paper proposes a multimodal sentiment analysis method based on a cross-modal cross-attention network. First, a VGG-16 network maps the multimodal data into a global feature space, while a Swin Transformer network maps it into a local feature space. Second, intra-modal self-attention features and inter-modal cross-attention features are constructed. A cross-modal cross-attention fusion module is then designed to deeply fuse intra-modal and inter-modal features and improve the reliability of the multimodal feature representation. Finally, the prediction is obtained through Softmax. Tested on the two open-source datasets CMU-MOSI and CMU-MOSEI, the model achieves 45.9% and 54.1% accuracy on the seven-class task, improvements of 0.66% and 2.46% over the current MCGMF model, a significant overall performance gain.
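The method above builds intra-modal self-attention and inter-modal cross-attention features before fusing them. A compact sketch of one such cross-attention block, using PyTorch's nn.MultiheadAttention with text queries attending to visual keys and values (and vice versa), is given below; the layer sizes and the concatenation-based fusion are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class CrossModalAttentionBlock(nn.Module):
    """Text attends to vision and vision attends to text; results are concatenated."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, text, vision):
        # text: (batch, Lt, dim), vision: (batch, Lv, dim)
        t_attends_v, _ = self.t2v(query=text, key=vision, value=vision)
        v_attends_t, _ = self.v2t(query=vision, key=text, value=text)
        fused = torch.cat([t_attends_v.mean(1), v_attends_t.mean(1)], dim=-1)
        return self.fuse(fused)   # (batch, dim) fused representation

block = CrossModalAttentionBlock()
out = block(torch.randn(2, 20, 256), torch.randn(2, 49, 256))
print(out.shape)  # torch.Size([2, 256])
```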
Keywords: sentiment analysis; multimodality; cross-modal cross-attention; self-attention; local and global features
Multimodal Sentiment Analysis for Video Data
11
Authors: 武星, 殷浩宇, 姚骏峰, 李卫民, 钱权. 《计算机工程》 (Computer Engineering), CAS, CSCD, PKU Core, 2024, No. 6, pp. 218-227 (10 pages)
Multimodal sentiment analysis aims to extract and integrate semantic information from text, image, and audio data to identify the emotional state of speakers in online videos. Although multimodal fusion schemes have achieved some success in this field, existing methods still fall short in handling distribution differences between modalities and in fusing relational knowledge. To this end, a multimodal sentiment analysis method is proposed. A multimodal prompt gate (MPG) module is designed that converts non-verbal information into prompts fused with the textual context, using textual information to filter noise in the non-verbal signals and obtain prompts rich in semantic information, thereby strengthening the integration of information across modalities. In addition, an instance-to-label contrastive learning framework is proposed that distinguishes different labels in the latent space at the semantic level to further refine the model output. Experimental results on three large-scale sentiment analysis datasets show that the method improves binary classification accuracy by about 0.7% over the second-best model and three-class accuracy by more than 2.5%, reaching 0.671. The method provides a reference for applying multimodal sentiment analysis to user profiling, video understanding, AI interviewing, and other fields.
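The multimodal prompt gate (MPG) described above converts non-verbal signals into prompts conditioned on the textual context. Under assumed shapes, the snippet below gates audio/visual features with a text-derived gate and prepends the result to the text sequence as an extra prompt token; this is only one loose interpretation of the idea, not the published module.

```python
import torch
import torch.nn as nn

class MultimodalPromptGate(nn.Module):
    """Turn gated non-verbal features into a prompt token prepended to the text sequence."""
    def __init__(self, text_dim=768, nonverbal_dim=128):
        super().__init__()
        self.gate = nn.Linear(text_dim + nonverbal_dim, nonverbal_dim)
        self.to_prompt = nn.Linear(nonverbal_dim, text_dim)

    def forward(self, text_seq, nonverbal):
        # text_seq: (batch, L, text_dim); nonverbal: (batch, nonverbal_dim)
        text_ctx = text_seq.mean(dim=1)                              # pooled textual context
        g = torch.sigmoid(self.gate(torch.cat([text_ctx, nonverbal], dim=-1)))
        prompt = self.to_prompt(g * nonverbal).unsqueeze(1)          # (batch, 1, text_dim)
        return torch.cat([prompt, text_seq], dim=1)                  # prompt-augmented sequence

mpg = MultimodalPromptGate()
out = mpg(torch.randn(2, 16, 768), torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 17, 768])
```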
Keywords: multimodal sentiment analysis; semantic information; multimodal fusion; contextual representation; contrastive learning
A Multimodal Sentiment Analysis Model Based on CLIP and Cross-Attention
12
Authors: 陈燕, 赖宇斌, 肖澳, 廖宇翔, 陈宁江. 《郑州大学学报(工学版)》 (Journal of Zhengzhou University (Engineering Science)), CAS, PKU Core, 2024, No. 2, pp. 42-50 (9 pages)
To address the problems of limited labeled data, insufficient inter-modal fusion, and information redundancy in multimodal sentiment analysis, a multimodal sentiment analysis (MSA) model, CLIP-CA-MSA, based on Contrastive Language-Image Pre-training (CLIP) and cross-attention (CA) is proposed. First, the model uses the CLIP-pre-trained BERT model and the PIFT model to extract video feature vectors and text features. Second, a cross-attention mechanism lets the image and text feature vectors interact, strengthening the transfer of information between modalities. Finally, an uncertainty loss is applied after feature fusion to compute the final sentiment classification output. Experimental results show that the model improves accuracy by 5 to 14 percentage points and F1 by 3 to 12 percentage points over other multimodal models, validating its superiority, and ablation experiments verify the effectiveness of each module. The model effectively exploits the complementarity and correlation of multimodal data while using the uncertainty loss to improve robustness and generalization.
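CLIP-CA-MSA applies an uncertainty loss after fusing the cross-attended features. The abstract does not define that loss; a common formulation consistent with the wording is homoscedastic uncertainty weighting of several task losses, sketched below. Treat the formulation and the two-loss setup as assumptions about what the uncertainty loss might look like.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combine several losses with learned log-variances (homoscedastic uncertainty)."""
    def __init__(self, num_losses=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_losses))

    def forward(self, losses):
        total = 0.0
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])   # higher uncertainty -> smaller weight
            total = total + precision * loss + self.log_vars[i]
        return total

criterion = UncertaintyWeightedLoss()
total = criterion([torch.tensor(0.8), torch.tensor(1.3)])
print(total.item())
```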
Keywords: sentiment analysis; multimodal learning; cross-attention; CLIP model; Transformer; feature fusion
A Multitask Multiview Neural Network for End-to-End Aspect-Based Sentiment Analysis (Cited by 5)
13
Authors: Yong Bie, Yan Yang. Big Data Mining and Analytics, EI, 2021, No. 3, pp. 195-207 (13 pages)
The aspect-based sentiment analysis (ABSA) consists of two subtasks: aspect term extraction and aspect sentiment prediction. Existing methods deal with both subtasks one by one in a pipeline manner, which causes problems in performance and real application. This study investigates the end-to-end ABSA and proposes a novel multitask multiview network (MTMVN) architecture. Specifically, the architecture takes the unified ABSA as the main task with the two subtasks as auxiliary tasks. Meanwhile, the representation obtained from the branch network of the main task is regarded as the global view, whereas the representations of the two subtasks are considered two local views with different emphases. Through multitask learning, the main task can be facilitated by additional accurate aspect boundary information and sentiment polarity information. By enhancing the correlations between the views under the idea of multiview learning, the representation of the global view can be optimized to improve the overall performance of the model. The experimental results on three benchmark datasets show that the proposed method exceeds the existing pipeline methods and end-to-end methods, proving the superiority of our MTMVN architecture.
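MTMVN trains the unified ABSA task jointly with the two subtasks as auxiliary tasks. The sketch below shows the generic multitask pattern that description implies: a shared encoder with a main head and two auxiliary heads whose losses are combined with fixed weights. The head sizes, the GRU encoder, and the loss weights are placeholders, not the published architecture.

```python
import torch
import torch.nn as nn

class MultitaskABSA(nn.Module):
    """Shared encoder with a main unified-ABSA head and two auxiliary heads."""
    def __init__(self, in_dim=768, hidden=256, n_unified=7, n_boundary=3, n_polarity=3):
        super().__init__()
        self.shared = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.head_unified = nn.Linear(2 * hidden, n_unified)    # main task: joint tag per token
        self.head_boundary = nn.Linear(2 * hidden, n_boundary)  # auxiliary: aspect term extraction
        self.head_polarity = nn.Linear(2 * hidden, n_polarity)  # auxiliary: sentiment prediction

    def forward(self, token_emb):
        h, _ = self.shared(token_emb)   # (batch, L, 2*hidden)
        return self.head_unified(h), self.head_boundary(h), self.head_polarity(h)

def multitask_loss(losses, weights=(1.0, 0.5, 0.5)):
    return sum(w * l for w, l in zip(weights, losses))

model = MultitaskABSA()
u, b, p = model(torch.randn(2, 12, 768))
print(u.shape, b.shape, p.shape)
```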
Keywords: deep learning; multitask learning; multiview learning; natural language processing; aspect-based sentiment analysis
Multimodal Sentiment Analysis Using the Information Bottleneck (Cited by 2)
14
Authors: 程子晨, 李彦, 葛江炜, 纠梦菲, 张敬伟. 《计算机工程与应用》 (Computer Engineering and Applications), CSCD, PKU Core, 2024, No. 2, pp. 137-146 (10 pages)
In multimodal sentiment analysis, previous research has mainly focused on how to interactively fuse information from different modalities. However, multimodal representation vectors produced by various complex fusion strategies inevitably carry a large amount of noise irrelevant to the downstream task, which leads to a high risk of overfitting and hinders high-quality predictions. To solve this problem, based on information bottleneck theory, a mutual information estimation module containing two mutual information estimators is designed. It optimizes the lower bound of the mutual information between the multimodal representation vector and the true labels while minimizing the mutual information between the multimodal representation vector and the input data, so as to obtain a compact multimodal representation with good predictive power. Comparative experiments on the MOSI, MOSEI, and CH-SIMS datasets show that the proposed method is effective.
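The information-bottleneck objective above maximizes a bound on the mutual information between the multimodal representation and the label while minimizing the mutual information between the representation and the input. One widely used concrete surrogate is the variational IB loss, cross-entropy plus a KL term to a standard Gaussian, sketched below; the paper's actual pair of mutual information estimators may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalIBHead(nn.Module):
    """Stochastic bottleneck z ~ N(mu, sigma^2) trained with CE + beta * KL."""
    def __init__(self, in_dim=512, z_dim=64, num_classes=3, beta=1e-3):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)
        self.classifier = nn.Linear(z_dim, num_classes)
        self.beta = beta

    def forward(self, fused, labels):
        mu, logvar = self.mu(fused), self.logvar(fused)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        ce = F.cross_entropy(self.classifier(z), labels)          # bound related to I(z; y)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()  # bound on I(z; x)
        return ce + self.beta * kl

head = VariationalIBHead()
loss = head(torch.randn(8, 512), torch.randint(0, 3, (8,)))
print(loss.item())
```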
Keywords: multimodal sentiment analysis; information bottleneck theory; mutual information estimation
Research on a Construction Framework for Multimodal Knowledge Graphs of Social Network Public Opinion (Cited by 4)
15
Author: 何巍. 《情报杂志》 (Journal of Intelligence), PKU Core, 2024, No. 1, pp. 160-166 (7 pages)
[Purpose] The development of information technology has enriched the ways social media users communicate, and studying the construction of multimodal knowledge graphs for social network public opinion has important practical significance for online public opinion governance. [Method] Based on the semantic complementarity of multimodal data, several ways of fusing heterogeneous data are discussed, including entity-attribute association, textual description of images (videos), image (video) attributes, and image (video) association. On this basis, a construction framework for multimodal knowledge graphs of social network public opinion is proposed, and problems and challenges in multimodal semantic understanding, multimodal entity alignment, and multimodal knowledge representation are analyzed. [Conclusion] A construction framework for multimodal knowledge graphs of social network public opinion based on multimodal knowledge fusion is proposed, providing a useful reference for governing social network public opinion as interaction methods grow ever richer.
Keywords: social media; multimodality; multimodal knowledge graph; multimodal data; online public opinion; public opinion governance; sentiment analysis
Multimodal Sentiment Analysis with Autoencoder-Based Dynamic Dominant-Modality Fusion
16
Authors: 杨溪, 郭军军, 严海宁, 谭凯文, 相艳, 余正涛. 《计算机工程与应用》 (Computer Engineering and Applications), CSCD, PKU Core, 2024, No. 6, pp. 180-187 (8 pages)
In multimodal sentiment analysis, the modality that dominates the sentiment decision often changes dynamically. Traditional methods usually take text as the only dominant modality, ignoring that the dominant modality changes over time because of the differences between modalities. To dynamically select the dominant modality at each time step, a multimodal sentiment analysis method based on autoencoder dynamic dominant fusion is proposed. The method first encodes each unimodal signal and obtains a multimodal fusion feature, then uses an autoencoder to project them into a shared space; in this space, the correlation between each unimodal feature and the fused feature is measured, and the modality with the highest correlation at each time step is dynamically selected as the dominant modality for that step; finally, the dominant modality guides multimodal information fusion to obtain a robust multimodal representation. Extensive experiments on the multimodal sentiment analysis benchmark CMU-MOSI demonstrate the effectiveness of the proposed method, which outperforms most existing state-of-the-art multimodal sentiment analysis methods.
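The method above measures, in a shared autoencoder space, how strongly each unimodal feature correlates with the fused feature and selects the most correlated modality as the dominant one at each step. The sketch below reduces the shared-space encoder to a single linear layer and uses cosine similarity for the correlation; both simplifications, and all shapes, are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DominantModalitySelector(nn.Module):
    """Pick, per sample, the modality whose shared-space code best matches the fused code."""
    def __init__(self, dim=128, shared=64):
        super().__init__()
        self.encode = nn.Linear(dim, shared)   # stand-in for the autoencoder's encoder

    def forward(self, unimodal, fused):
        # unimodal: (batch, num_modalities, dim); fused: (batch, dim)
        codes = self.encode(unimodal)                         # (batch, m, shared)
        fused_code = self.encode(fused).unsqueeze(1)          # (batch, 1, shared)
        sim = F.cosine_similarity(codes, fused_code, dim=-1)  # (batch, m)
        dominant = sim.argmax(dim=-1)                         # index of the dominant modality
        return dominant, sim

selector = DominantModalitySelector()
idx, sim = selector(torch.randn(4, 3, 128), torch.randn(4, 128))
print(idx, sim.shape)
```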
Keywords: multimodal sentiment analysis; dynamic complementarity; dominant modality; autoencoder
A Multimodal Sentiment Analysis Network Fusing Self-Supervision and Multi-Layer Cross-Attention
17
Authors: 薛凯鹏, 徐涛, 廖春节. 《计算机应用》 (Journal of Computer Applications), CSCD, PKU Core, 2024, No. 8, pp. 2387-2392 (6 pages)
To address incomplete intra-modal information, weak inter-modal interaction, and training difficulty in multimodal sentiment analysis, a vision-language pre-training (VLP) model is applied to the field and a multimodal sentiment analysis network fusing self-supervision and multi-layer cross-attention (MSSM) is proposed. The visual encoder module is strengthened through self-supervised learning, and multi-layer cross-attention is added to better model textual and visual features, making intra-modal information richer and more complete while allowing fuller inter-modal information exchange. In addition, FlashAttention, a fast, memory-efficient exact attention, is used to address the high complexity of attention computation in the Transformer. Experimental results show that, compared with the current mainstream Contrastive Language-Image Pre-training (CLIP) model, MSSM improves accuracy by 3.6 percentage points on the processed MVSA-S dataset and by 2.2 percentage points on MVSA-M, verifying that the proposed network can effectively improve the completeness of multimodal information fusion while reducing computational cost.
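MSSM replaces standard attention computation with FlashAttention to reduce the Transformer's attention cost. PyTorch exposes a fused, memory-efficient kernel through torch.nn.functional.scaled_dot_product_attention, which can dispatch to a FlashAttention implementation on supported GPUs; the snippet below shows drop-in usage on assumed text-query and visual key/value tensors. Whether the paper uses this exact API is not stated, so treat it as an illustration.

```python
import torch
import torch.nn.functional as F

# Assumed shapes: (batch, heads, seq_len, head_dim) for text queries and visual keys/values.
q = torch.randn(2, 8, 32, 64)   # text queries
k = torch.randn(2, 8, 49, 64)   # visual keys
v = torch.randn(2, 8, 49, 64)   # visual values

# Fused scaled-dot-product attention; on CUDA with suitable dtypes this can use a
# FlashAttention kernel instead of materialising the full attention matrix.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 32, 64])
```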
Keywords: multimodality; sentiment analysis; self-supervision; attention mechanism; vision-language pre-trained model
Multimodal Sentiment Analysis Based on Non-Text Modality Reinforcement and Gated Fusion
18
Authors: 魏金龙, 邵新慧. 《计算机应用研究》 (Application Research of Computers), CSCD, PKU Core, 2024, No. 1, pp. 39-44 (6 pages)
To address the gap in information density between modalities and the possible loss of sentiment information during fusion, a multimodal sentiment analysis model based on non-text modality reinforcement and gated fusion is proposed. The model designs an audio-visual reinforcement module to enhance the information of the audio and visual modalities, narrowing their information gap with the text modality. Then, through cross-modal attention and gated fusion, the model fully learns both the multimodal sentiment information and the original sentiment information, strengthening its expressive power. Experimental results on the aligned and unaligned CMU-MOSEI datasets show that the proposed model is effective and outperforms several existing models.
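The model above merges the cross-modal sentiment information with the original features through a gate. The snippet below sketches a generic gated fusion of that kind: a sigmoid gate computed from both inputs decides how much of each passes into the output. The single-gate design and the dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Blend the original feature and the cross-modal feature with a learned gate."""
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, original, cross_modal):
        g = torch.sigmoid(self.gate(torch.cat([original, cross_modal], dim=-1)))
        return g * original + (1.0 - g) * cross_modal

fusion = GatedFusion()
out = fusion(torch.randn(4, 256), torch.randn(4, 256))
print(out.shape)  # torch.Size([4, 256])
```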
Keywords: multimodal sentiment analysis; multimodal fusion; modality reinforcement; gating mechanism
Multimodal Sentiment Analysis Based on Cross-Modal Joint Encoding
19
Authors: 孙斌, 江涛, 贾莉, 崔伊明. 《计算机工程与应用》 (Computer Engineering and Applications), CSCD, PKU Core, 2024, No. 18, pp. 208-216 (9 pages)
How to improve the effectiveness of fused multimodal features is one of the hot topics in multimodal sentiment analysis. Most previous studies obtain fused feature representations by designing complex fusion strategies; these methods often ignore the complex correlations between modalities and suffer from reduced fusion-feature effectiveness caused by inconsistent modal information, which degrades model performance. To address these problems, a multimodal sentiment analysis model based on cross-modal joint encoding is proposed. For feature extraction, the pre-trained BERT model and the Facet toolkit are used to extract textual and visual features respectively, and one-dimensional convolutions produce unimodal feature representations of the same dimension. For feature fusion, a cross-modal attention module yields a joint feature of the two modalities, which is used to re-weight each unimodal feature; the two re-weighted features are concatenated to obtain the multimodal fusion feature, which is finally fed into a fully connected layer for sentiment recognition. Extensive experiments on the public CMU-MOSI dataset show that the model outperforms most existing state-of-the-art multimodal sentiment analysis methods and effectively improves sentiment analysis performance.
Keywords: multimodal sentiment analysis; joint encoding; cross-modal attention; multimodal fusion
A Multi-Interaction-Aware Network for Sentiment Analysis of Unaligned Multimodal Language Sequences (Cited by 1)
20
Authors: 罗俊豪, 朱焱. 《计算机应用》 (Journal of Computer Applications), CSCD, PKU Core, 2024, No. 1, pp. 79-85 (7 pages)
To address the lack of interpretability of the word-alignment procedures commonly used in sentiment analysis of aligned multimodal language sequences, a multi-interaction-aware network (MultiDAN) for sentiment analysis of unaligned multimodal language sequences is proposed. The core of MultiDAN is multi-layer, multi-angle extraction of interaction information. First, a recurrent neural network (RNN) and an attention mechanism capture intra-modal interaction information; then a graph attention network (GAT) extracts intra- and inter-modal, long- and short-term interaction information in one pass; finally, a special graph readout method extracts intra- and inter-modal interaction information of the graph nodes again to obtain a single representation of the multimodal language sequence, and a multilayer perceptron (MLP) classifier produces the sentiment score of the sequence. Experimental results on the two widely used public datasets CMU-MOSI and CMU-MOSEI show that MultiDAN fully extracts interaction information: on the unaligned versions of the two datasets, its F1 score is 0.49 and 0.72 percentage points higher, respectively, than that of the best baseline, the Modal-Temporal Attention Graph (MTAG), with high stability. MultiDAN improves sentiment analysis of multimodal language sequences, and graph neural networks (GNNs) effectively extract intra- and inter-modal interaction information.
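MultiDAN applies a graph attention network over multimodal sequence elements and then a graph readout to obtain one sequence representation scored by an MLP. The sketch below shows a single-head graph attention layer followed by a mean readout and a linear scorer, purely to illustrate the GAT-plus-readout pattern; the graph construction, head counts, and the paper's special readout are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGATLayer(nn.Module):
    """Single-head graph attention: nodes aggregate neighbours weighted by learned attention."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (n, in_dim); adj: (n, n) with 1 where an edge exists
        h = self.w(x)                                             # (n, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pairs).squeeze(-1), 0.2)       # (n, n) attention scores
        e = e.masked_fill(adj == 0, float('-inf'))
        alpha = F.softmax(e, dim=-1)
        return F.elu(alpha @ h)

gat = SimpleGATLayer(64, 64)
nodes = torch.randn(10, 64)                                  # multimodal sequence elements as nodes
adj = ((torch.rand(10, 10) > 0.5).float() + torch.eye(10)).clamp(max=1)  # random graph + self-loops
out = gat(nodes, adj)
graph_repr = out.mean(dim=0)                                 # mean readout -> sequence representation
score = nn.Linear(64, 1)(graph_repr)                         # linear scorer stand-in for the MLP
print(graph_repr.shape, score.shape)
```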
Keywords: sentiment analysis; multimodal language sequence; multimodal fusion; graph neural network; attention mechanism