
Research on Emotion Recognition Method of Music Multimodal Data
Abstract: Music emotion recognition has broad application prospects in fields such as intelligent music recommendation and music visualization. To address the limited effectiveness and poor interpretability of emotion recognition based only on low-level audio features, this work proceeds in three steps. First, ERMSLM (emotion recognition model based on skip-gram and LSTM using MIDI data), a model that learns the semantic information of notes from MIDI (musical instrument digital interface) data, is constructed; its features are the concatenation of three parts: melodic features extracted with a skip-gram model and an LSTM (long short-term memory) network, tonal features extracted by a pre-trained MLP (multilayer perceptron), and manually constructed features. Second, ERMBT (emotion recognition model based on BERT using text data), a text-based model that fuses lyrics and social tags, is constructed; its lyric features combine emotional features extracted with BERT (bidirectional encoder representations from transformers), emotion-lexicon features built from the ANEW (Affective Norms for English Words) list, and TF-IDF (term frequency-inverse document frequency) features of the lyrics. Finally, two multimodal fusion models, feature-level fusion and decision-level fusion, are constructed over the MIDI and text data. Experimental results show that ERMSLM and ERMBT achieve accuracies of 56.93% and 72.62%, respectively, and that the decision-level multimodal fusion model performs best.
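The decision-level fusion described in the abstract can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: it assumes each unimodal model (the MIDI-based ERMSLM and the text-based ERMBT) outputs a class-probability vector, and fuses them with a weighted average. The weights, class count, and probability values below are hypothetical.

```python
import numpy as np

def decision_level_fusion(p_midi, p_text, w_midi=0.4, w_text=0.6):
    """Fuse two models' class-probability vectors by weighted averaging.

    The weights are illustrative; in practice they could be tuned on a
    validation set to reflect each modality's reliability.
    """
    p_midi = np.asarray(p_midi, dtype=float)
    p_text = np.asarray(p_text, dtype=float)
    fused = w_midi * p_midi + w_text * p_text
    return fused / fused.sum()  # renormalize in case weights do not sum to 1

# Example with 4 emotion classes; the probabilities are made up.
p_midi = [0.10, 0.50, 0.30, 0.10]   # hypothetical ERMSLM output
p_text = [0.05, 0.70, 0.15, 0.10]   # hypothetical ERMBT output
fused = decision_level_fusion(p_midi, p_text)
predicted_class = int(np.argmax(fused))
```

Decision-level fusion like this keeps the two unimodal models independent, which matches the abstract's finding that combining their decisions outperforms either model alone.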
Authors: HAN Dong-hong, KONG Yan-ru, ZHAN Yi-meng, LIU Yuan (School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China; NARI Group Corporation, State Grid Electric Power Research Institute, Nanjing 211000, China)
Source: Journal of Northeastern University (Natural Science), 2024, No. 6, pp. 776-785, 792 (11 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61672144); National Key Research and Development Program of China (2019YFB1405302).
Keywords: music emotion recognition; deep learning; multimodal; LSTM
