Abstract
To improve the recognition and classification of environmental sounds, this paper proposes an environmental sound classification method based on a multilevel residual network (Mul-EnvResNet). After time stretching and pitch shifting are applied to the sound events, Mel-frequency cepstral coefficients (MFCCs) and their deltas are extracted as feature parameters and fed into Mul-EnvResNet for classification. Experiments on the ESC-50 dataset compare Mul-EnvResNet with an end-to-end convolutional neural network (EnvNet), an attention-based convolutional recurrent neural network (ACRNN), and an unsupervised filterbank learning model based on a convolutional restricted Boltzmann machine (ConvRBM). The results show that Mul-EnvResNet achieves the best classification accuracy, 89.32%, which is 18.32, 3.22, and 2.82 percentage points higher than these three models, respectively, and it also shows a clear advantage over other sound classification methods.
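The abstract mentions MFCC deltas without saying how they are computed. As an illustration only, the following minimal sketch uses the widely used regression formula, d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2), with edge frames repeated at the boundaries. The function names `delta` and `stack_with_deltas`, the window `width`, and the toy input are assumptions made for this example; they are not taken from the paper, which may use a different delta definition or window.

```python
def delta(frames, width=2):
    """Regression-based delta of a feature sequence (illustrative sketch).

    frames: list of T frames, each a list of D coefficients (e.g. MFCCs).
    width:  number of neighbouring frames on each side (assumed value).
    Edge frames are repeated so the output also has T frames.
    """
    if not frames:
        return []
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(num_coeffs):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, num_frames - 1)][d]  # clamp at the end
                prv = frames[max(t - n, 0)][d]               # clamp at the start
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_deltas(frames, width=2):
    """Append each frame's delta coefficients to its static coefficients."""
    deltas = delta(frames, width)
    return [static + dyn for static, dyn in zip(frames, deltas)]


if __name__ == "__main__":
    # Toy one-coefficient "MFCC" track that rises linearly over five frames.
    toy = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    for frame in stack_with_deltas(toy):
        print(frame)
```

For a track that rises linearly, the delta of an interior frame equals the slope, while the clamped edge frames give smaller values. In practice the static coefficients would come from an MFCC extractor applied to the time-stretched and pitch-shifted audio; that step is omitted here.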
Authors
ZENG Jinfang (曾金芳)
LI Youming (李友明)
YANG Huixian (杨恢先)
ZHANG Yu (张钰)
HU Yaxin (胡雅欣)
School of Physics and Optoelectronics, Xiangtan University, Xiangtan 411105, China
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
CSCD
PKU Core Journal (北大核心)
2021, Issue 5, pp. 960-968 (9 pages)
Funding
National Natural Science Foundation of China (62071411)
Natural Science Foundation of Hunan Province (2018JJ3486)
Keywords
environmental sound classification
multilevel residual network
time stretch
pitch shift