Research on Time Delay Estimation Based Sound Source Localization Algorithm Enhanced by SincNet
Abstract: To address the low localization accuracy of the generalized cross-correlation time delay estimation method with phase transform weighting (GCC-PHAT) under low signal-to-noise ratio (SNR) and high reverberation, an indoor sound source localization algorithm is proposed in which a SincNet neural network and GCC-PHAT work together. With the LibriSpeech speech dataset as the sound source input, a SincNet backbone network is built using sinc functions as filters to extract speech features of the sound source. These features are fed into a GCC-PHAT module for correlation analysis and feature dimensionality reduction. A multilayer perceptron (MLP) then extracts higher-level features and outputs the delay error classification result. Experimental results show that, compared with advanced sound source localization algorithms such as SCOT/PHAT joint weighting, a convolutional neural network (CNN), and a deep fully connected back-propagation neural network (D-BPNN), the proposed algorithm is more robust to reverberation, and its localization accuracy is significantly higher than that of GCC-PHAT across different SNRs and reverberation intensities. The features extracted by SincNet effectively improve the robustness of time delay estimation.
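For orientation, the classical GCC-PHAT baseline that the abstract refers to can be sketched as follows. This is a minimal NumPy illustration of the standard technique, not the authors' SincNet pipeline: it assumes two single-channel signals sampled at the same rate, and the function name `gcc_phat` and its parameters (`max_tau`, `eps`) are hypothetical choices made for this example.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None, eps=1e-12):
    """Estimate the delay of `sig` relative to `ref` (in seconds) with GCC-PHAT.

    Illustrative sketch only; names and defaults are assumptions.
    Returns (tau, cc) where cc is the PHAT-weighted cross-correlation
    re-centred so that its middle sample is zero lag.
    """
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)

    # Zero-pad to the combined length so circular correlation acts as linear.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: discard magnitude, keep phase.
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + eps
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(round(fs * max_tau)), max_shift)

    # Reorder to [-max_shift, ..., 0, ..., +max_shift].
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(4096)
    # Simulate a second microphone that receives the signal 25 samples later.
    sig = np.concatenate((np.zeros(25), ref))
    tau, _ = gcc_phat(sig, ref, fs)
    print("estimated delay in samples:", round(tau * fs))
```

A positive return value means `sig` lags `ref`. In the pipeline described in the abstract, the correlation stage operates on SincNet features and is followed by an MLP classifier; this sketch covers only the raw-signal baseline.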
Authors: LU Chi-hua; XUE Qi-fan; LIU Zhi-en; ZHU Ya-wei; PENG Wen-jie; LI Fang (School of Automotive Engineering, Wuhan University of Technology, Wuhan 430070, China; Hubei Provincial Key Laboratory of Modern Auto Parts Technology, Wuhan University of Technology, Wuhan 430070, China)
Source: Journal of Wuhan University of Technology (《武汉理工大学学报》), CAS, 2023, No. 10, pp. 127-134 (8 pages)
Funding: National Natural Science Foundation of China (52175111); Hubei Provincial Key Research and Development Program (2021BAA177).
Keywords: sound source localization; SincNet neural network; GCC-PHAT; time delay estimation; multilayer perceptron (MLP)