Abstract: Music emotion recognition has broad application prospects in fields such as intelligent music recommendation and music visualization. To address the limited performance and poor interpretability of emotion recognition that relies only on low-level audio features, this work first builds ERMSLM (emotion recognition model based on skip-gram and LSTM using MIDI data), a model on Musical Instrument Digital Interface (MIDI) data that can learn the semantic information of notes; its features are the concatenation of three parts: melody features extracted by a skip-gram model and a long short-term memory (LSTM) network, tonality features extracted by a pretrained multi-layer perceptron (MLP), and handcrafted features. Second, it builds ERMBT (emotion recognition model based on BERT using text data), a text-based model that fuses lyrics and social tags, where the lyric features consist of emotion features extracted by BERT (bidirectional encoder representations from transformers), emotion-dictionary features built from the Affective Norms for English Words (ANEW) list, and term frequency-inverse document frequency (TF-IDF) features of the lyrics. Finally, feature-level and decision-level multimodal fusion models are built around the MIDI and text data. Experimental results show that ERMSLM and ERMBT achieve accuracies of 56.93% and 72.62%, respectively, and that the decision-level multimodal fusion model performs better.
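To make the feature-concatenation design concrete, below is a minimal sketch of an ERMSLM-style model. It is not the authors' released code: the vocabulary size, feature dimensions, number of emotion classes, and the fact that the note embeddings and tonality MLP are trained end-to-end here (rather than pretrained with skip-gram and a separate MLP, as the abstract describes) are all assumptions for illustration.

```python
# Sketch of an ERMSLM-style classifier (assumed dimensions, not the authors' code):
# note-token embeddings -> LSTM melody encoder, an MLP for tonality features,
# plus handcrafted features, concatenated before a linear classifier.
import torch
import torch.nn as nn

class ERMSLMSketch(nn.Module):
    def __init__(self, vocab_size=128, embed_dim=64, lstm_hidden=128,
                 tonal_in=24, tonal_hidden=32, handcrafted_dim=10, num_classes=4):
        super().__init__()
        # In the paper the note embeddings come from a pretrained skip-gram model;
        # here they are simply learned end-to-end for illustration.
        self.note_embed = nn.Embedding(vocab_size, embed_dim)
        self.melody_lstm = nn.LSTM(embed_dim, lstm_hidden, batch_first=True)
        # The paper uses a pretrained MLP for tonality; this one is freshly initialized.
        self.tonal_mlp = nn.Sequential(nn.Linear(tonal_in, tonal_hidden), nn.ReLU())
        self.classifier = nn.Linear(lstm_hidden + tonal_hidden + handcrafted_dim,
                                    num_classes)

    def forward(self, note_ids, tonal_feats, handcrafted_feats):
        emb = self.note_embed(note_ids)          # (B, T, embed_dim)
        _, (h_n, _) = self.melody_lstm(emb)      # final hidden state of the LSTM
        melody_vec = h_n[-1]                     # (B, lstm_hidden)
        tonal_vec = self.tonal_mlp(tonal_feats)  # (B, tonal_hidden)
        fused = torch.cat([melody_vec, tonal_vec, handcrafted_feats], dim=1)
        return self.classifier(fused)            # (B, num_classes) logits

# Example with random tensors: a batch of 2 note sequences of length 50.
model = ERMSLMSketch()
logits = model(torch.randint(0, 128, (2, 50)), torch.randn(2, 24), torch.randn(2, 10))
print(logits.shape)  # torch.Size([2, 4])
```

Under the usual definitions, feature-level fusion would concatenate the MIDI and text feature vectors before a single classifier, while decision-level fusion would combine the class probabilities produced separately by ERMSLM and ERMBT, for example by weighted averaging.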
Funding: Project supported by the Philosophy and Social Science Planning Fund Project of Zhejiang Province, China (No. 20NDQN297YB) and the National Natural Science Foundation of China (No. 61702454).
Abstract: Music can trigger human emotion, and this is a psychophysiological process; therefore, psychophysiological characteristics could be a way to understand individual music emotional experience. In this study, we explore a new method of personal music emotion recognition based on human physiological characteristics. First, we build up a database of features based on emotions related to music and a database based on physiological signals derived from music listening, including EDA, PPG, SKT, RSP, and PD variation information. Then linear regression, ridge regression, support vector machines with three different kernels, decision trees, k-nearest neighbors, multi-layer perceptron, and Nu support vector regression (NuSVR) are used to recognize music emotions via a data synthesis of music features and human physiological features. NuSVR outperforms the other methods: the correlation coefficient values are 0.7347 for arousal and 0.7902 for valence, while the mean squared errors are 0.02323 for arousal and 0.01485 for valence. Finally, we compare the different data sets and find that the data set with all the features (music features and all physiological features) has the best performance in modeling; the correlation coefficient values are 0.6499 for arousal and 0.7735 for valence, while the mean squared errors are 0.02932 for arousal and 0.01576 for valence. We provide an effective way to recognize personal music emotional experience, and the study can be applied to personalized music recommendation.
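The regression setup described above can be sketched as follows. This uses synthetic data and assumed feature counts purely for illustration; the hyperparameters (nu, C, kernel) and the train/test protocol are not taken from the paper.

```python
# Sketch of the described pipeline: concatenate music features with physiological
# features (EDA, PPG, SKT, RSP, PD statistics), fit NuSVR separately for arousal
# and valence, and report Pearson correlation and MSE. Synthetic data, assumed sizes.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_samples = 200
music_feats = rng.normal(size=(n_samples, 20))    # placeholder music descriptors
physio_feats = rng.normal(size=(n_samples, 15))   # placeholder physiological statistics
X = np.hstack([music_feats, physio_feats])
targets = {"arousal": rng.uniform(-1, 1, n_samples),
           "valence": rng.uniform(-1, 1, n_samples)}

for name, y in targets.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    scaler = StandardScaler().fit(X_tr)
    model = NuSVR(nu=0.5, C=1.0, kernel="rbf").fit(scaler.transform(X_tr), y_tr)
    pred = model.predict(scaler.transform(X_te))
    r = np.corrcoef(y_te, pred)[0, 1]             # Pearson correlation coefficient
    mse = mean_squared_error(y_te, pred)
    print(f"{name}: r={r:.4f}, MSE={mse:.5f}")
```

On real data, the same loop can be repeated over different feature subsets (music only, physiological only, all features) to reproduce the kind of data-set comparison reported in the abstract.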