Abstract
To address the problems that traditional musical instrument identification relies on low-level acoustic features of music and that its performance depends on feature selection, the auditory spectrum, which is close to human auditory perception and has low redundancy, is used as the input of a 5-layer deep convolutional network that abstracts, layer by layer, high-level time-frequency representations of timbre for instrument identification. To effectively capture the time-frequency information in the auditory spectrum, the rectangular convolution kernel of the first layer is improved to multi-scale kernels along the frequency and time axes. Simulation results on the IOWA instrument database show that the network achieves a recognition accuracy of 96.95%, outperforming the network with a single convolution kernel; under the same network structure, the accuracy obtained with the auditory spectrum is 9.11% and 3.54% higher than that obtained with the Mel-Frequency Cepstral Coefficient (MFCC) and the spectrogram, respectively, and the misclassification rates for percussion instruments and instruments of the same family are both low.
Aiming at the problem that traditional musical instrument identification depends on feature selection and low-level acoustic features, a 5-layer Convolutional Neural Network (CNN) that extracts high-level time-frequency information of timbre layer by layer is proposed, whose input is the auditory spectrum, which contains harmonic information and is close to human perception. The single rectangular convolution kernel of the first layer is improved to multi-scale kernels along the time and frequency axes to effectively extract time-frequency information from the auditory spectrum. Experimental results on the IOWA database show that the improved multi-scale convolution kernel achieves 96.95% recognition accuracy, which is better than using a single convolution kernel. Under the same network structure, the recognition accuracy obtained with the auditory spectrum is 9.11% and 3.54% higher than that obtained with the Mel-Frequency Cepstral Coefficient (MFCC) and the spectrogram, respectively, and the misclassification rates of percussion instruments and kindred instruments are 2% and 3.1%, both lower than those of MFCC and the spectrogram.
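To make the described first-layer design concrete, the following is a minimal PyTorch sketch of a multi-scale first convolutional layer applied to an auditory-spectrum input. The class name, number of branches, channel count, and kernel sizes are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class MultiScaleFirstConv(nn.Module):
    """Sketch of a multi-scale first layer: parallel kernels elongated along the
    frequency and time axes of the auditory spectrum, concatenated channel-wise.
    Kernel sizes below are assumptions for illustration only."""
    def __init__(self, out_channels_per_branch=16):
        super().__init__()
        # Frequency-axis kernels (tall, narrow) emphasize harmonic structure.
        self.freq_small = nn.Conv2d(1, out_channels_per_branch, kernel_size=(9, 3), padding=(4, 1))
        self.freq_large = nn.Conv2d(1, out_channels_per_branch, kernel_size=(17, 3), padding=(8, 1))
        # Time-axis kernels (short, wide) emphasize onsets and temporal envelope.
        self.time_small = nn.Conv2d(1, out_channels_per_branch, kernel_size=(3, 9), padding=(1, 4))
        self.time_large = nn.Conv2d(1, out_channels_per_branch, kernel_size=(3, 17), padding=(1, 8))

    def forward(self, x):
        # x: (batch, 1, freq_bins, time_frames) auditory spectrum
        branches = [self.freq_small(x), self.freq_large(x),
                    self.time_small(x), self.time_large(x)]
        # Stack the parallel feature maps along the channel dimension.
        return torch.cat(branches, dim=1)

# Usage: a batch of 8 auditory spectra with 128 frequency bins and 64 frames
x = torch.randn(8, 1, 128, 64)
y = MultiScaleFirstConv()(x)   # -> (8, 64, 128, 64)

The deeper layers of the 5-layer network described in the abstract would then operate on this multi-branch feature map; their structure is not reproduced here.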
Authors
WANG Fei; YU Fengqin (School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214100, China)
Source
《计算机工程》
CAS
CSCD
Peking University Core Journals (北大核心)
2019, No. 1, pp. 199-205 (7 pages)
Computer Engineering
Funding
National Natural Science Foundation of China (61703185)
Keywords
auditory spectrum
Convolutional Neural Network(CNN)
convolution kernel
time-frequency feature
musical instrument identification