Abstract
A key difficulty in music emotion recognition is the lack of sufficient labeled data, or the availability of only class-imbalanced labeled data, for training an emotion recognition model: accurately annotating emotion categories is costly and time-consuming, and it requires annotators with a solid background in music. At the same time, the emotion of a piece of music is shaped by many factors; singing style, musical genre, arrangement, and lyrics all affect how that emotion is conveyed. This paper proposes a multimodal method that combines knowledge distillation with music-style transfer learning, and its effectiveness is validated on 20000 songs. Experimental results show that, compared with single-audio, single-lyrics, and combined audio-and-lyrics multimodal methods, the proposed method achieves a clear improvement in emotion recognition accuracy as well as better generalization.
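For reference, the following is a minimal sketch of a generic knowledge-distillation objective of the kind named in the abstract, written in PyTorch. The temperature T, the weight alpha, and the idea of distilling soft emotion predictions from a teacher into a student classifier are illustrative assumptions, not the authors' exact formulation.

# Minimal sketch of a knowledge-distillation loss for an emotion classifier
# (illustrative only; T, alpha and the teacher/student roles are assumptions).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # Soft-target term: KL divergence between temperature-softened
    # teacher and student distributions, scaled by T^2.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the emotion labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Example usage with random tensors (4 emotion classes, batch of 8).
if __name__ == "__main__":
    s = torch.randn(8, 4)
    t = torch.randn(8, 4)
    y = torch.randint(0, 4, (8,))
    print(distillation_loss(s, t, y).item())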
Authors
ZHAO Jian; LIU Huaping; LIANG Xiaojing; GAO Yuejie (Laboratory of Audio and Video, Hangzhou Netease Cloud Music Technology Co. Ltd., Shanghai 200080, China)
Source
CAS
CSCD
Peking University Core Journals
2021, No. 3, pp. 309-314, 322 (7 pages)
Journal of Fudan University (Natural Science)
Keywords
knowledge distillation
transfer learning
multimodal music emotion
deep learning