Abstract
This paper proposes a convolutional recurrent neural network that jointly estimates the classes and directions of sound events. It uses the dataset provided by DCASE2020, in which sound events either remain stationary or move slowly around a tetrahedral array. The method extracts the Mel spectra of the four channels and the generalized cross-correlation spectra between channel pairs as the input feature map, and outputs the class and direction of arrival of the sound events in each frame. The network achieves better recognition and direction-estimation performance than two baseline methods.
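The cross-channel part of the input feature described in the abstract can be illustrated with a minimal sketch. It assumes the PHAT weighting (the abstract says only "generalized cross-correlation"), and the function names, frame length, lag range, and synthetic delays are all hypothetical choices made for illustration, not the authors' settings. The Mel spectra are not shown here.

```python
import itertools
import numpy as np

def gcc_phat(x, y, max_lag):
    """GCC of two equal-length frames with PHAT weighting (an assumption).

    Returns the correlation at lags -max_lag..max_lag; index max_lag is lag 0.
    """
    n = 2 * len(x)                       # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(x, n) * np.conj(np.fft.rfft(y, n))
    cross /= np.abs(cross) + 1e-12       # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n)
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))

def pairwise_gcc(frame, max_lag):
    """frame: (channels, samples) -> (channel pairs, 2 * max_lag + 1)."""
    pairs = itertools.combinations(range(frame.shape[0]), 2)
    return np.stack([gcc_phat(frame[i], frame[j], max_lag) for i, j in pairs])

# Synthetic 4-channel frame: each channel is the same noise delayed by d samples.
rng = np.random.default_rng(0)
base = rng.standard_normal(1100)
delays = [0, 2, 5, 7]                    # hypothetical per-channel delays
frame = np.stack([base[10 - d: 10 - d + 1024] for d in delays])

max_lag = 16
feats = pairwise_gcc(frame, max_lag)     # 4 channels give 6 channel pairs
peak_lags = feats.argmax(axis=1) - max_lag
expected_lags = np.array([delays[i] - delays[j]
                          for i, j in itertools.combinations(range(4), 2)])
```

With four channels there are six channel pairs, so each frame contributes six correlation vectors. In this sketch the peak of each vector sits at the relative delay between the two channels, which is the cue a network can use to infer direction.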
Authors
HUANG Shan (黄山)
WANG Zhifeng (王志峰)
WANG Xianyun (王显云)
LI Dapeng (李大朋)
The Third Research Institute of China Electronics Technology Group Corporation, Beijing 100015, China
Source
Audio Engineering (《电声技术》)
2021, No. 6, pp. 25-29 (5 pages)
Keywords
deep neural network
sound event recognition
sound event orientation