Abstract
In recent years, generative adversarial networks (GANs) have performed well in image style transfer, but their performance on music has been mediocre. In particular, existing music style transfer methods work poorly on music that contains vocals. To address these problems, the CQT (constant-Q transform) feature and the Mel spectrogram feature of the music are first extracted, and CycleGAN is then used to transfer the style of the joint CQT and Mel spectrogram feature. A WaveNet vocoder then decodes the transferred spectrogram back into audio, which achieves style transfer for music with vocals. The proposed model is evaluated on the public FMA dataset, where the average style transfer rate for music that meets the requirements reaches 94.07%. Compared with other algorithms, the proposed method achieves both a higher style transfer rate and better audio quality.
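The abstract names CycleGAN but does not spell out its training objective. As a rough illustration only, the sketch below shows the cycle-consistency term that CycleGAN is generally built around, applied to a two-channel feature made of a CQT spectrogram and a Mel spectrogram. Everything here is an assumption for illustration: the channel layout, the requirement that both spectrograms share one shape, and the names `stack_features`, `l1_distance`, `cycle_consistency_loss`, and `scale` are hypothetical, and the toy "generators" are plain scaling functions rather than the networks used in the paper.

```python
from typing import Callable, List

Spectrogram = List[List[float]]    # indexed as [frequency_bin][frame]
JointFeature = List[Spectrogram]   # assumed layout: channel 0 = CQT, channel 1 = Mel
Generator = Callable[[JointFeature], JointFeature]


def stack_features(cqt: Spectrogram, mel: Spectrogram) -> JointFeature:
    """Stack two spectrograms as channels (assumes they share one shape)."""
    if len(cqt) != len(mel) or any(len(a) != len(b) for a, b in zip(cqt, mel)):
        raise ValueError("CQT and Mel spectrograms must have the same shape")
    return [cqt, mel]


def l1_distance(a: JointFeature, b: JointFeature) -> float:
    """Mean absolute difference over every channel, bin, and frame."""
    total, count = 0.0, 0
    for ch_a, ch_b in zip(a, b):
        for row_a, row_b in zip(ch_a, ch_b):
            for u, v in zip(row_a, row_b):
                total += abs(u - v)
                count += 1
    return total / count


def cycle_consistency_loss(x: JointFeature, y: JointFeature,
                           g_xy: Generator, g_yx: Generator) -> float:
    """L1 cycle loss: X -> Y -> X should return x, and Y -> X -> Y should return y."""
    forward = l1_distance(g_yx(g_xy(x)), x)
    backward = l1_distance(g_xy(g_yx(y)), y)
    return forward + backward


def scale(k: float) -> Generator:
    """Toy stand-in for a generator network: multiplies every value by k."""
    return lambda f: [[[v * k for v in row] for row in ch] for ch in f]


if __name__ == "__main__":
    x = stack_features([[1.0, 1.0]], [[1.0, 1.0]])   # style-X clip, 1 bin x 2 frames
    y = stack_features([[2.0, 2.0]], [[2.0, 2.0]])   # style-Y clip
    # Generators that invert each other exactly give zero cycle loss.
    print(cycle_consistency_loss(x, y, scale(2.0), scale(0.5)))
    # A mismatched pair is penalized.
    print(cycle_consistency_loss(x, y, scale(2.0), scale(0.25)))
```

In a real system the cycle term would be combined with adversarial losses from two discriminators, and the spectrograms would come from an audio library rather than hand-written lists; the sketch only shows why unpaired clips from two styles are enough to train the mapping.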
Authors
叶洪良
朱皖宁
洪蕾
YE Hong-liang; ZHU Wan-ning; HONG Lei (School of Software Engineering, Jinling Institute of Technology, Nanjing 211100, China)
Source
《计算机科学》
CSCD
PKU Core (Peking University Core Journals)
2021, Issue S01, pp. 326-330, 363 (6 pages)
Computer Science
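Funding
Jinling Institute of Technology High-Level Talent Research Start-up Fund (jit-b-201624)
Jiangsu Province College Students' Innovation Training Program (202013573045Y)
Philosophy and Social Science Foundation of Jiangsu Higher Education Institutions (2019SJA0485)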
Keywords
Generative adversarial networks
Style transfer
Music processing
Representation learning