Abstract: Sound source localization is an important research objective in sound source signal processing. Traditional methods are susceptible to noise and reverberation. Motivated by the success of deep learning in many fields, this paper explores the use of deep learning to solve the sound source localization problem. It analyzes the localization performance of a convolutional neural network (CNN) operating on microphone signals and, through simulation experiments, investigates how the number of convolutional layers and the number of convolutional kernels affect localization performance under identical room and sound source conditions. The experiments show that, when generalized cross-correlation with phase transform (GCC-PHAT) features of the sound signals are used as the CNN input, localization accuracy is the highest among the compared methods under typical room conditions, with signal-to-noise ratios of 10 dB to 40 dB and reverberation times of 200 ms to 600 ms. Furthermore, a network with 6 convolutional layers and 4 convolutional kernels in the first convolutional layer achieves a good balance between localization accuracy and computational efficiency.
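The abstract names GCC-PHAT as the input feature but does not say how it is computed. The sketch below shows one common way to compute it for a single microphone pair: take the cross-power spectrum, discard its magnitude so that only phase remains, and transform back to the lag domain. It is an illustration under stated assumptions, not the paper's implementation. The function names `dft`, `gcc_phat`, and `estimate_delay`, the `max_lag` parameter, and the test signal are all hypothetical, and a naive DFT is used only to keep the example free of dependencies; an FFT library would normally be used instead.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref, max_lag):
    """Return (lags, cc): GCC-PHAT of sig relative to ref for lags in [-max_lag, max_lag]."""
    # Zero-pad so that circular correlation equals linear correlation.
    n = len(sig) + len(ref)
    spec_sig = dft(list(sig) + [0.0] * (n - len(sig)))
    spec_ref = dft(list(ref) + [0.0] * (n - len(ref)))
    # Cross-power spectrum of the two channels.
    cross = [s * r.conjugate() for s, r in zip(spec_sig, spec_ref)]
    # PHAT weighting: divide by the magnitude so only phase remains.
    eps = 1e-12  # assumed small constant to avoid division by zero
    whitened = [c / (abs(c) + eps) for c in cross]
    cc_full = [v.real for v in dft(whitened, inverse=True)]
    # Negative lags wrap around to the end of the circular result.
    lags = list(range(-max_lag, max_lag + 1))
    cc = [cc_full[lag % n] for lag in lags]
    return lags, cc


def estimate_delay(sig, ref, max_lag):
    """Lag (in samples) at which the GCC-PHAT of sig relative to ref peaks."""
    lags, cc = gcc_phat(sig, ref, max_lag)
    best = max(range(len(cc)), key=cc.__getitem__)
    return lags[best]


if __name__ == "__main__":
    random.seed(0)
    ref = [random.gauss(0.0, 1.0) for _ in range(64)]  # synthetic noise source
    sig = [0.0] * 5 + ref  # same signal arriving 5 samples later
    print(estimate_delay(sig, ref, max_lag=10))
```

In a setup like the one the abstract describes, the `cc` vector for each microphone pair (rather than only its peak position) would presumably serve as the network input, but the exact feature layout is not stated in the abstract.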