
Urban Sound Classification Using Convolutional Recurrent Neural Networks with Attention Model

Cited by: 4
Abstract: Environment sound recognition (ESR) plays an important role in context awareness and assistive technologies. Convolutional neural networks (CNN) and recurrent neural networks (RNN) are two of the most representative feature extraction methods, and both have achieved remarkable results in speech and music signal processing. Each has a weakness, however: a CNN cannot effectively extract temporal features, and an RNN is at a clear disadvantage when extracting spatial features. To extract and use both temporal and spatial features effectively, a new model (CNN + BiLSTM + attention mechanism) is proposed. A time-distributed CNN extracts urban environmental sound features from Mel spectrograms, a bidirectional long short-term memory network (BiLSTM) then obtains temporal information from the CNN output, and finally an attention mechanism is applied to the BiLSTM output sequence so that the model focuses on the features most relevant to the urban environmental sound before making the classification decision. The attention mechanism both improves classification accuracy and makes the model more interpretable. Experimental results show that the model achieves an average classification accuracy of 80.2% on the UrbanSound8K dataset, which is better than previously reported results on the same dataset.
Authors: YANG Lei (杨磊); ZHAO Hong-dong (赵红东) (School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300300, China)
Source: Science Technology and Engineering (《科学技术与工程》), PKU Core Journal, 2020, No. 33, pp. 13757-13761 (5 pages)
Funding: Fund of the Key Laboratory of Optoelectronic Information Control and Security Technology (614210701041705).
Keywords: convolutional neural network; bi-directional long short-term memory network; attention mechanism
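
The final stage described in the abstract, attention over the BiLSTM output sequence, can be illustrated with a minimal NumPy sketch. The abstract does not give the scoring function, so the additive (tanh) form used here is an assumption, and the names `attention_pool`, `W`, `b`, `v` and the sizes `T`, `D`, `A` are hypothetical. Random values stand in for the BiLSTM outputs.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def attention_pool(H, W, b, v):
    """Additive attention pooling over time (assumed form, not the paper's).

    H: (T, D) sequence of per-frame vectors, e.g. BiLSTM outputs
    W: (D, A), b: (A,), v: (A,) hypothetical attention parameters
    Returns the context vector (D,) and the attention weights (T,).
    """
    scores = np.tanh(H @ W + b) @ v   # one scalar score per time step, (T,)
    weights = softmax(scores)         # non-negative, sums to 1 over time
    context = weights @ H             # weighted sum of the time steps, (D,)
    return context, weights


# Illustrative sizes: 8 time steps, 16 features per step, 4 attention units.
rng = np.random.default_rng(0)
T, D, A = 8, 16, 4
H = rng.standard_normal((T, D))       # placeholder for BiLSTM outputs
W = rng.standard_normal((D, A)) * 0.1
b = np.zeros(A)
v = rng.standard_normal(A)

context, weights = attention_pool(H, W, b, v)
```

The weights show which time steps contribute most to the pooled vector, which is the sense in which attention can aid interpretability. A classifier layer would take `context` as input.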
