Abstract
To address the low accuracy of traditional single-input models in environmental sound classification, a parallel feature fusion neural network based on time-domain and frequency-domain features is proposed. In this network, the original audio is first processed with data augmentation. The augmented raw audio data and the Mel spectrum feature data are then fed into a raw waveform network and a Mel spectrum network, respectively, and the resulting time-domain and frequency-domain features are fused. Finally, the fused features are passed to a SoftMax classifier for classification. Experiments on the UrbanSound8K dataset show a final classification accuracy of 96.03%, outperforming other models.
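The last two stages described in the abstract (feature fusion, then SoftMax classification) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not state the fusion rule, so concatenation is assumed here, and the feature widths, the random weights, and the names `fuse` and `classify` are hypothetical. The random vectors merely stand in for the outputs of the raw waveform and Mel spectrum branches.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(time_features, mel_features):
    """Fuse the two branch outputs by concatenation (an assumed fusion rule)."""
    return list(time_features) + list(mel_features)


def classify(fused, weights, biases):
    """Apply one linear layer, then softmax; returns class probabilities."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# UrbanSound8K has 10 classes; the feature widths below are made up.
NUM_CLASSES, TIME_DIM, MEL_DIM = 10, 4, 6

rng = random.Random(0)
# Stand-ins for the outputs of the two branches (hypothetical values).
time_features = [rng.uniform(-1, 1) for _ in range(TIME_DIM)]
mel_features = [rng.uniform(-1, 1) for _ in range(MEL_DIM)]
# Untrained random classifier parameters, for illustration only.
weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(TIME_DIM + MEL_DIM)]
    for _ in range(NUM_CLASSES)
]
biases = [0.0] * NUM_CLASSES

fused = fuse(time_features, mel_features)
probs = classify(fused, weights, biases)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
```

In a real model the two feature vectors would come from trained networks, and the classifier weights would be learned jointly with them. The sketch only shows how the two branches meet before the classifier.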
Authors
覃镜涛
高瑜翔
QIN Jingtao; GAO Yuxiang (College of Communication Engineering, Chengdu University of Information Technology, Chengdu 610225, China; Key Laboratory of Meteorological Information and Signal Processing in Universities of Sichuan Province, Chengdu 610225, China)
Source
《传感器与微系统》
CSCD
Peking University Core Journals (北大核心)
2024, Issue 7, pp. 106-109, 113 (5 pages in total)
Transducer and Microsystem Technologies
Funding
University Innovation Team Project of the Sichuan Provincial Department of Education (15TD0022).
Keywords
parallel neural network
feature fusion
environmental sound classification