
Singing voice separation algorithm based on high-resolution network and self-attention mechanism (cited by: 2)
Abstract: To address the low separation accuracy of existing singing voice separation algorithms, a singing voice separation algorithm based on a high-resolution network and a self-attention mechanism is proposed. The algorithm builds a deep neural network on a frequency-domain model, using the high-resolution network as the backbone to ensure separation accuracy, and integrates a self-attention mechanism into the network to capture the repeated melodies in a song. The separation pipeline works as follows: first, the short-time Fourier transform (STFT) converts the music signal to the time-frequency domain, yielding the magnitude spectrogram; second, the constructed neural network separates the song's magnitude spectrogram into magnitude spectrograms for the vocals and the accompaniment; finally, combining these with the phase spectrogram of the original song, the inverse STFT produces the time-domain vocal and accompaniment signals. Experimental results show that, on the MUSDB18 dataset, the signal-to-distortion ratio (SDR) of the separated vocals and accompaniment reaches 7.68 dB and 12.85 dB respectively, improvements of 21.52% and 1.26% over the baseline model. The algorithm strengthens the feature-representation capability of the neural network and effectively improves singing voice separation.
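The three-step pipeline in the abstract (STFT, magnitude-spectrogram separation, phase-preserving inverse STFT) can be sketched as below. The trained HRNet-plus-self-attention model is replaced by a hypothetical `estimate_masks` placeholder; its soft-mask interface is an assumption for illustration, not the paper's exact formulation:

```python
import numpy as np
from scipy.signal import stft, istft

def estimate_masks(mag):
    """Placeholder for the neural separator: returns soft masks for
    vocals and accompaniment that sum to 1 at every time-frequency bin.
    A real model would predict these from the magnitude spectrogram."""
    vocal_mask = 0.5 * np.ones_like(mag)
    return vocal_mask, 1.0 - vocal_mask

def separate(mixture, fs=22050, nperseg=1024):
    # 1) STFT: time-frequency transform of the mixture signal
    _, _, Z = stft(mixture, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)

    # 2) The network separates the magnitude spectrogram into
    #    vocal and accompaniment magnitudes (here via soft masks)
    vocal_mask, acc_mask = estimate_masks(mag)

    # 3) Recombine each estimated magnitude with the ORIGINAL mixture
    #    phase, then invert with the inverse STFT
    _, vocals = istft(vocal_mask * mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    _, accomp = istft(acc_mask * mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return vocals, accomp
```

Because the placeholder masks sum to one at every bin, the two outputs add back up to the input mixture, which makes the phase-reuse step easy to verify; a trained network would instead produce masks that concentrate vocal energy in one output.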
Authors: NI Xin; REN Jia (Faculty of Mechanical Engineering & Automation, Zhejiang Sci-Tech University, Hangzhou 310018, China)
Source: Journal of Zhejiang Sci-Tech University (Natural Sciences), 2022, No. 3, pp. 405-412 (8 pages)
Funding: Zhejiang Provincial Public Welfare Technology Research Project (LGG20F030007).
Keywords: singing voice separation; high-resolution network; self-attention mechanism; deep neural network; frequency-domain model