Abstract
To address the complex interdependence among multiple localization cues, which makes them difficult to extract accurately, a deep learning-based binaural sound source localization algorithm that takes complete binaural sound signals as input is proposed. First, a deep fully connected back propagation neural network (D-BPNN) and a convolutional neural network (CNN) are each used to implement the deep learning framework. Then, the models are trained on binaural sound signals with uniform azimuthal spacings of 15°, 30° and 45° in the horizontal plane. Finally, the front-back confusion rate, localization accuracy and training duration are used to evaluate the effectiveness of the algorithm. The model prediction results show that the front-back confusion rate of the CNN model is much lower than that of the D-BPNN model. The localization accuracy of the D-BPNN model can reach more than 87%, while that of the CNN model is about 98%. Under the same experimental conditions, the CNN model takes longer to train than the D-BPNN model, and this difference becomes increasingly pronounced as the azimuthal spacing in the horizontal plane decreases.
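The two accuracy-related indicators named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that azimuth is measured in degrees with 0° at the front and 180° at the back, that a front-back confusion is a prediction landing exactly on the mirror image of the target about the interaural (90°-270°) axis, and that accuracy is the fraction of exact class matches. The function names `azimuth_classes`, `mirror_azimuth` and `localization_metrics` are hypothetical.

```python
def azimuth_classes(spacing_deg):
    """Return the horizontal-plane azimuths (degrees) for a uniform spacing."""
    if spacing_deg <= 0 or 360 % spacing_deg != 0:
        raise ValueError("spacing must be a positive divisor of 360")
    return list(range(0, 360, spacing_deg))


def mirror_azimuth(azimuth_deg):
    """Mirror an azimuth about the interaural (90-270 degree) axis.

    Assumed convention: 0 = front, 90 = one side, 180 = back.
    """
    return (180 - azimuth_deg) % 360


def localization_metrics(true_deg, pred_deg):
    """Return (accuracy, front_back_confusion_rate) for paired azimuth lists.

    Accuracy counts exact matches; a confusion is a wrong prediction that
    equals the front-back mirror of the target (an assumed definition).
    """
    if len(true_deg) != len(pred_deg) or not true_deg:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = 0
    confused = 0
    for t, p in zip(true_deg, pred_deg):
        t, p = t % 360, p % 360
        if p == t:
            correct += 1
        elif p == mirror_azimuth(t):
            confused += 1
    n = len(true_deg)
    return correct / n, confused / n


if __name__ == "__main__":
    # Number of target classes for each spacing mentioned in the abstract.
    for spacing in (15, 30, 45):
        print(spacing, len(azimuth_classes(spacing)))
    # Toy labels only; these are not results from the paper.
    print(localization_metrics([0, 30, 90, 210], [0, 150, 90, 180]))
```

Under these assumptions the 15°, 30° and 45° spacings correspond to 24, 12 and 8 azimuth classes respectively, and targets at 90° and 270° cannot be front-back confused because they lie on the mirror axis.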
Authors
宋昊
刘雪洁
俞胜锋
钟小丽
SONG Hao; LIU Xuejie; YU Shengfeng; ZHONG Xiaoli (School of Management, Guangdong University of Technology, Guangzhou 510000, Guangdong, China; School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, Guangdong, China; School of Physics and Optoelectronics, South China University of Technology, Guangzhou 510640, Guangdong, China)
Source
《声学技术》
CSCD
Peking University Core Journal
2022, No. 4, pp. 602-607 (6 pages)
Technical Acoustics
Funding
Natural Science Foundation of Guangdong Province (2021A1515011871, 2021A1515012630).
Keywords
binaural sound source localization
deep learning
convolutional neural network (CNN)