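Abstract
To address the insufficient sound-classification accuracy of existing models, a D-S fusion model based on a multi-feature double two-stream network is proposed. First, four combined features are proposed to represent sound more comprehensively and effectively. Second, a double two-stream network structure is proposed to train the model better. The first and second streams feed multi-resolution, multi-channel features into a second-order dense convolutional network (2-DenseNet), where the 2-DenseNet is divided into two dense blocks. The third and fourth streams feed concatenated single-resolution, single-channel features into a four-layer CNN. D-S evidence theory is then used to fuse the softmax-layer outputs, yielding the D-S-Net model. Experiments on the UrbanSound8K dataset show that, after data augmentation, the model reaches an accuracy of 96.36%, 25.34% higher than the baseline. Robustness in noisy environments is also verified: the recognition rate is 90.34% at a signal-to-noise ratio of 20 dB, and performance at low SNR is considerably improved.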
To address the insufficient accuracy of sound classification, this paper proposes a Dempster-Shafer (D-S) fusion model based on a multi-feature double two-stream network. First, it proposes four combined features that represent sound more comprehensively and effectively. Second, it proposes a double two-stream network architecture to train the model better. In the first and second streams, multi-resolution, multi-channel features are fed into a second-order dense convolutional network (2-DenseNet), which is divided into two dense blocks. In the third and fourth streams, concatenated single-resolution, single-channel features are fed into a four-layer CNN. The softmax outputs are then fused using D-S evidence theory to obtain the D-S-Net model. Experimental results on the UrbanSound8K dataset show that, after data augmentation, the accuracy of the model is 96.34%, which is 25.34% higher than the baseline. The model's robustness in noisy environments is also verified: it achieves a recognition rate of 90.34% at a signal-to-noise ratio (SNR) of 20 dB, and its performance at low SNR is greatly improved.
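The fusion step described in the abstract can be illustrated with a minimal sketch of Dempster's rule of combination. It assumes each stream's softmax vector is used directly as a mass function whose focal elements are the individual classes; the abstract does not state how the paper assigns masses, so this is an illustration under that assumption, not the authors' implementation. The function names and the example probabilities are hypothetical.

```python
from functools import reduce


def dempster_combine(m1, m2):
    """Combine two mass functions whose focal elements are all singletons.

    Assumption: masses are softmax outputs, one value per class. In this
    special case Dempster's rule reduces to a normalized element-wise product.
    """
    if len(m1) != len(m2):
        raise ValueError("mass vectors must cover the same classes")
    agreement = [a * b for a, b in zip(m1, m2)]
    total = sum(agreement)  # equals 1 - K, where K is the conflict mass
    if total == 0.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return [x / total for x in agreement]


def fuse_streams(stream_outputs):
    """Fuse the softmax outputs of several streams, one pair at a time."""
    return reduce(dempster_combine, stream_outputs)


# Hypothetical softmax outputs of four streams over three classes.
streams = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
]
fused = fuse_streams(streams)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In this example three of the four streams favor class 0, and the fused mass concentrates on that class more strongly than any single stream does, which is the behavior that motivates evidence fusion over simple averaging.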
Authors
Wu Jiasai (吴佳赛)
Gao Zhenbin (高振斌)
School of Electronic Information Engineering, Hebei University of Technology, Tianjin 300401, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal (北大核心)
2022, Issue 3, pp. 693-698, 703 (7 pages)
Keywords
sound classification
feature fusion
dense convolutional network
D-S fusion
double two-stream network