Abstract
Differences between local regions make environmental sounds difficult to classify accurately. To address this, an environmental sound classification method based on a compact bilinear attention network is proposed. First, multi-dimensional time-frequency features are introduced to characterize environmental sounds more fully. Second, online random erasing data augmentation is introduced to increase sample diversity and to avoid the overfitting that a small dataset would otherwise cause in the trained model. Finally, with the compact bilinear network framework left unchanged, the densely connected network DenseNet-169 is adopted as the feature extraction module, and a channel-spatial location attention module is introduced to focus on differences between local regions of the environmental sound features. Experimental results show that the accuracy of the proposed method on both the ESC-10 and ESC-50 datasets exceeds the accuracy of human ear recognition.
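The online random erasing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_erase`, the probability `p`, the area and aspect-ratio ranges, and the choice to fill the erased patch with uniform noise are all illustrative assumptions, and the feature map is represented as a plain list of rows so that the example has no dependencies.

```python
import math
import random


def random_erase(spec, p=0.5, area_range=(0.02, 0.4),
                 aspect_range=(0.3, 3.3), max_tries=10, rng=None):
    """Return a copy of `spec` with one random rectangle overwritten by noise.

    `spec` is a 2-D time-frequency feature map given as a list of rows.
    All default values are illustrative assumptions, not values from the paper.
    """
    rng = rng or random.Random()
    out = [list(row) for row in spec]          # never modify the input
    if rng.random() >= p:                      # skip erasing with probability 1 - p
        return out

    n_rows, n_cols = len(out), len(out[0])
    lo = min(min(row) for row in out)
    hi = max(max(row) for row in out)

    for _ in range(max_tries):
        # Sample the patch area as a fraction of the whole map.
        area = rng.uniform(*area_range) * n_rows * n_cols
        # Sample the aspect ratio log-uniformly so tall and wide
        # patches are equally likely.
        aspect = math.exp(rng.uniform(math.log(aspect_range[0]),
                                      math.log(aspect_range[1])))
        h = int(round(math.sqrt(area * aspect)))
        w = int(round(math.sqrt(area / aspect)))
        if 0 < h <= n_rows and 0 < w <= n_cols:
            top = rng.randint(0, n_rows - h)
            left = rng.randint(0, n_cols - w)
            for r in range(top, top + h):
                for c in range(left, left + w):
                    out[r][c] = rng.uniform(lo, hi)   # noise within the data range
            return out
    return out                                  # no patch fitted; return unchanged copy


# Usage: apply a fresh random erasure each time a sample is drawn for training.
features = [[float(r * 60 + c) for c in range(60)] for r in range(40)]
augmented = random_erase(features, p=1.0, rng=random.Random(0))
```

Because the erasure is sampled anew each time a sample is loaded, the model sees a different occluded variant in every epoch, which is what makes the augmentation "online" rather than a fixed enlargement of the dataset.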
Authors
董绍江
夏蒸富
蔡巍巍
DONG Shaojiang; XIA Zhengfu; CAI Weiwei (School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400074, China; Continental Automotive Research and Development Chongqing Company Limited, Chongqing 400074, China)
Source
《北京邮电大学学报》
EI
CSCD
PKU Core (Peking University Core Journals)
2023, No. 6, pp. 102-107 (6 pages)
Journal of Beijing University of Posts and Telecommunications
Keywords
compact bilinear network
attention module
environmental sound classification
random erasing data augmentation
multi-dimensional time-frequency features