Recently, multimodal sentiment analysis has attracted increasing attention with the growing availability of complementary data streams, and it has great potential to surpass unimodal sentiment analysis. One challenge of multimodal sentiment analysis is how to design an efficient multimodal feature fusion strategy. Unfortunately, most existing work considers only feature-level fusion or decision-level fusion, and few studies focus on hybrid strategies that combine the two. To improve the performance of multimodal sentiment analysis, we present a novel multimodal sentiment analysis model using BiGRU and an attention-based hybrid fusion strategy (BAHFS). Firstly, we apply BiGRU to learn the unimodal features of text, audio and video. Then we fuse the unimodal features into bimodal features using the bimodal attention fusion module. Next, BAHFS feeds the unimodal and bimodal features into the trimodal attention fusion module and the trimodal concatenation fusion module simultaneously to obtain two sets of trimodal features. Finally, BAHFS classifies each set of trimodal features separately and obtains the final analysis result through decision-level fusion. Extensive experiments on the CMU-MOSI and CMU-MOSEI datasets verify the superiority of BAHFS.
Funding: This work was funded by the National Natural Science Foundation of China (Grant Nos. 61872126 and 62273290) and supported by the Key Project of the Natural Science Foundation of Shandong Province (Grant No. ZR2020KF019).
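For readers who want a concrete picture of the pipeline, the following PyTorch sketch mirrors the steps described in the abstract: BiGRU encoders for the three modalities, bimodal attention fusion, parallel trimodal attention fusion and concatenation fusion, and decision-level fusion of two classifiers. All layer sizes, the attention scoring function, the logit-averaging rule, and names such as BiGRUEncoder and AttentionFusion are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of a BAHFS-style pipeline (assumed details, not the paper's code).
import torch
import torch.nn as nn


class BiGRUEncoder(nn.Module):
    """Encode one modality's sequence into a fixed-size feature vector."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.gru = nn.GRU(in_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, x):                       # x: (batch, seq_len, in_dim)
        out, _ = self.gru(x)                    # (batch, seq_len, 2*hid_dim)
        return out.mean(dim=1)                  # temporal average pooling


class AttentionFusion(nn.Module):
    """Fuse a list of feature vectors with learned attention weights."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                   # feats: list of (batch, dim)
        stacked = torch.stack(feats, dim=1)     # (batch, n, dim)
        weights = torch.softmax(self.score(stacked), dim=1)
        return (weights * stacked).sum(dim=1)   # (batch, dim)


class BAHFS(nn.Module):
    def __init__(self, dims=(300, 74, 35), hid=64, n_classes=2):
        super().__init__()
        feat = 2 * hid
        self.encoders = nn.ModuleList(BiGRUEncoder(d, hid) for d in dims)
        self.bi_fusion = nn.ModuleList(AttentionFusion(feat) for _ in range(3))
        self.tri_attn = AttentionFusion(feat)
        self.cls_attn = nn.Linear(feat, n_classes)
        self.cls_cat = nn.Linear(6 * feat, n_classes)

    def forward(self, text, audio, video):
        # 1) unimodal features from the BiGRU encoders
        t, a, v = (enc(x) for enc, x in zip(self.encoders, (text, audio, video)))
        # 2) bimodal attention fusion: text-audio, text-video, audio-video
        ta = self.bi_fusion[0]([t, a])
        tv = self.bi_fusion[1]([t, v])
        av = self.bi_fusion[2]([a, v])
        # 3a) trimodal attention fusion over unimodal + bimodal features
        tri_attn = self.tri_attn([t, a, v, ta, tv, av])
        # 3b) trimodal concatenation fusion of the same features
        tri_cat = torch.cat([t, a, v, ta, tv, av], dim=-1)
        # 4) two classifiers combined by decision-level fusion (here: averaged logits)
        return 0.5 * (self.cls_attn(tri_attn) + self.cls_cat(tri_cat))


if __name__ == "__main__":
    model = BAHFS()
    text = torch.randn(8, 50, 300)    # e.g., word-embedding features
    audio = torch.randn(8, 50, 74)    # e.g., acoustic features
    video = torch.randn(8, 50, 35)    # e.g., visual features
    print(model(text, audio, video).shape)      # torch.Size([8, 2])
```

The input dimensions above roughly follow common CMU-MOSI feature extractors but are placeholders; the decision-level fusion is shown as a simple average of the two classifiers' logits, which is one of several plausible choices.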