
Research on Vocal Music Classification Based on Feature Fusion (Cited by: 7)

Vocal Music Classification Based on Multi-category Feature Fusion
Abstract: [Objective] To address vocal music classification in music information retrieval, this paper fuses the statistical features and image features of audio to find a better-performing classification model. [Methods] We extract statistical features from the audio and image features from its Mel spectrograms. Machine learning methods are applied to the statistical features, and a multi-layer convolutional neural network architecture is designed for the image features, which turns vocal music classification into an image classification problem. Finally, we propose a deep learning method that fuses the statistical and image features. [Results] On the vocal music classification task, the deep learning method based on image features improves the F1 score by about 6 percentage points over the machine learning methods. The feature-fusion deep learning model reaches an F1 score above 69%, 3.4 percentage points higher than the deep learning model based on image features alone. [Limitations] The experimental dataset is small, so the advantages of deep learning could not be fully exploited. [Conclusions] The sampling parameters used to generate the Mel spectrograms strongly affect the results of the deep model, and the proposed feature fusion method effectively improves vocal music classification performance.
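The abstract makes two technical claims: the Mel spectrogram sampling parameters strongly affect the deep model, and fusing statistical features with image features beats the image-only model. The NumPy sketch below illustrates both ideas under stated assumptions and is not the authors' implementation. The HTK mel formula, the default values of `n_fft`, `hop_length` and `n_mels`, the four clip-level statistics, and late fusion by concatenation are all assumptions. The function names (`log_mel_spectrogram`, `statistical_features`, `fuse`) are hypothetical, and the mean-pooled log-mel vector merely stands in for the output of the paper's CNN.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft // 2 + 1 FFT bins onto n_mels bands."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i : i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr, n_fft=2048, hop_length=512, n_mels=128):
    """Return an (n_mels, n_frames) log-mel 'image' (no center padding)."""
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))
    n_frames = 1 + (len(y) - n_fft) // hop_length
    window = np.hanning(n_fft)
    frames = np.stack([y[k * hop_length : k * hop_length + n_fft] * window
                       for k in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T      # (n_mels, n_frames)
    return 10.0 * np.log10(mel + 1e-10)                   # power in dB


def statistical_features(y):
    # Hypothetical clip-level statistics; the paper's actual feature set is not listed here.
    zcr = np.mean(np.abs(np.diff(np.sign(y))) > 0)  # zero-crossing rate
    rms = np.sqrt(np.mean(y ** 2))                  # root-mean-square energy
    return np.array([y.mean(), y.std(), rms, zcr])


def fuse(image_embedding, stats, stats_mean, stats_std):
    # Late fusion by concatenation (an assumed scheme): z-score the statistics
    # with training-set moments, then append them to the CNN embedding.
    z = (stats - stats_mean) / (stats_std + 1e-8)
    return np.concatenate([image_embedding, z])


if __name__ == "__main__":
    sr = 22050
    t = np.arange(2 * sr) / sr
    y = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 2 s synthetic 440 Hz tone
    for n_fft, hop, n_mels in [(2048, 512, 128), (1024, 256, 64)]:
        img = log_mel_spectrogram(y, sr, n_fft, hop, n_mels)
        print(f"n_fft={n_fft}, hop={hop}, n_mels={n_mels} -> image {img.shape}")
    # Stand-in for a CNN embedding (hypothetical): mean log-mel energy per band.
    emb = log_mel_spectrogram(y, sr).mean(axis=1)
    fused = fuse(emb, statistical_features(y), np.zeros(4), np.ones(4))
    print("fused vector length:", fused.size)  # 128 + 4 = 132
```

For a 2 s clip at 22 050 Hz, the two parameter settings yield images of shape (128, 83) and (64, 169). This shows one concrete way the sampling parameters change what a CNN receives as input. In a real pipeline, `stats_mean` and `stats_std` would be estimated on the training split only.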
Authors: Meng Zhen (孟镇), Wang Hao (王昊), Yu Wei (虞为), Deng Sanhong (邓三鸿), Zhang Baolong (张宝隆). Affiliations: School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China.
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in CSSCI, CSCD and the Peking University Core Journals list; 2021, No. 5, pp. 59-70 (12 pages).
Funding: This work is one of the research outputs of a Major Project of the National Social Science Fund of China (Grant No. 17ZDA291).
Keywords: Vocal Music Classification; Convolutional Neural Network (CNN); Feature Fusion; Music Information Retrieval; Mel Spectrogram