
Development of multimodal sentiment recognition and understanding (多模态情感识别与理解发展现状及趋势)
Abstract: Affective computing is an important branch of artificial intelligence, with wide applications in interaction, education, security, finance, and many other fields. Emotion recognition that relies on a single modality, such as speech or video alone, does not match the way humans perceive emotion, and its accuracy drops rapidly under interference. To fully exploit the complementarity of data from different modalities, emotion recognition based on multimodal fusion is attracting growing attention from researchers. This paper reviews the state of research in multimodal affective computing along three dimensions: an overview of multimodal emotion recognition, multimodal emotion recognition and understanding, and the detection of and intervention in affective disorders such as depression. We argue that scalable emotion feature design and recognition methods based on transfer learning from large models will be the future directions of development, and that the field will play an increasingly prominent role in addressing affective disorders such as depression and anxiety.

Affective computing is an important branch in the field of artificial intelligence (AI). It aims to build a computational system that can automatically perceive, recognize, understand, and provide feedback on human emotions. It involves the intersection of multiple disciplines such as computer science, neuroscience, psychology, and social science. Deep emotional understanding and interaction can enable computers to better understand and respond to human emotional needs. It can also provide personalized interactions and feedback based on emotional states, which enhances the human-computer interaction experience. It has various applications in areas such as intelligent assistants, virtual reality, and smart healthcare. Relying solely on single-modal information, such as a speech signal or video, does not align with the way humans perceive emotions, and the accuracy of recognition decreases rapidly in the presence of interference. Multimodal emotion understanding and interaction technologies aim to fully model multidimensional information from audio, video, and physiological signals to achieve more accurate emotion understanding. This technology is a fundamental and important prerequisite for achieving natural, human-like, and personalized human-computer interaction. It holds significant value for ushering in the era of intelligence and digitalization. Because it can fully exploit the complementary nature of different modalities, multimodal fusion for sentiment recognition is receiving increasing attention from researchers.

This study introduces the current research status of multimodal sentiment computation from three dimensions: an overview of multimodal sentiment recognition, multimodal sentiment understanding, and detection and assessment of emotional disorders such as depression. The overview of emotion recognition is elaborated from the aspects of academic definition, mainstream datasets, and international competitions. In recent years, large language models (LLMs) have demonstrated excellent modeling capabilities and achieved great success in the field of natural language processing with their outstanding language understanding and reasoning abilities. LLMs have garnered widespread attention because of their ability to handle various complex tasks by understanding prompts with few-shot or zero-shot learning. Through methods such as self-supervised learning or contrastive learning, LLMs can learn more expressive multimodal representations, which can capture the correlations between different modalities and emotional information. Multimodal sentiment recognition and understanding are discussed in terms of emotion feature extraction, multimodal fusion, and the representations and models involved in sentiment recognition against the background of pre-trained large models.

With the rapid development of society, people are facing increasing pressure, which can lead to feelings of depression, anxiety, and other negative emotions. Those who are in a prolonged state of depression and anxiety are more likely to develop mental illnesses. Depression is a common and serious condition, with symptoms including low mood, poor sleep quality, loss of appetite, fatigue, and difficulty concentrating. Depression not only harms individuals and families but also causes significant economic losses to society. The discussion of emotional disorder detection starts from specific applications and selects depression, the most common emotional disorder, as its focus. We analyze its latest developments and trends from the perspectives of assessment and intervention. In addition, this study provides a detailed comparison of the research status of affective computation domestically, and prospects for future development trends are offered.

We believe that scalable emotion feature design and transfer learning methods based on large-scale models will be the future directions of development. The main challenge in multimodal emotion recognition lies in data scarcity, which means that the data available to build and explore complex models are insufficient. This insufficiency makes it difficult to create robust models based on deep neural network methods. These issues can be addressed by constructing large-scale multimodal emotion databases and exploring transfer learning methods based on large models. By transferring knowledge learned from unsupervised tasks or other tasks to emotion recognition tasks, the problem of limited data resources can be alleviated. The use of explicit discrete and dimensional labels to represent ambiguous emotional states has limitations due to the inherent fuzziness of emotions. Enhancing the interpretability of prediction results to improve the reliability of recognition results is also an important research direction for the future.

The role of multimodal emotion computing in addressing emotional disorders such as depression and anxiety is increasingly prominent. Future research can be conducted in the following three areas. First, research on and construction of multimodal emotion disorder datasets can provide a solid foundation for the automatic recognition of emotional disorders. However, this field still needs to address challenges such as data privacy and ethics. In addition, considerations such as designing targeted interview questions, ensuring patient safety during data collection, and sample augmentation through algorithms are still worth exploring. Second, more effective algorithms should be developed. Emotional disorders fall within the psychological domain, but they can also affect the physiological features of patients, such as voice and body movements. This psychological-physiological correlation is worthy of comprehensive exploration. Therefore, improving the accuracy of algorithms for multimodal emotion disorder recognition is a pressing research issue. Finally, intelligent psychological intervention systems should be designed and implemented. The following issues can be further studied: effectively simulating the counseling process of a psychologist, promptly receiving user emotional feedback, and generating empathetic conversations.
Authors: Tao Jianhua (陶建华); Fan Cunhang (范存航); Lian Zheng (连政); Lyu Zhao (吕钊); Shen Ying (沈莹); Liang Shan (梁山) (Department of Automation, Tsinghua University, Beijing 100084, China; Anhui Province Key Laboratory of Multimodal Cognitive Computation, Anhui University, Hefei 230601, China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Software Engineering, Tongji University, Shanghai 457001, China; School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China)
Source: Journal of Image and Graphics (《中国图象图形学报》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 6, pp. 1607-1627 (21 pages)
Funding: National Natural Science Foundation of China (62201572, 62201002, 62101553, 62306316).
Keywords: sentiment recognition; multimodal fusion; human-computer interaction; depression detection; emotion disorder intervention; cognitive behavior therapy
