Funding: This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61571106, the Jiangsu Natural Science Foundation under Grant No. BK20170757, and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 17KJD510002.
Abstract: Microphone array-based sound source localization (SSL) is a challenging task in adverse acoustic scenarios. To address this, this paper presents a novel SSL algorithm based on a deep neural network (DNN) that takes the steered response power-phase transform (SRP-PHAT) spatial spectrum as its input feature. The SRP-PHAT spatial power spectrum is adopted as the input feature because it contains spatial location information, and a DNN is used to extract that information efficiently because of its strength in learning high-level features. The SRP-PHAT values at all steering positions within a frame are arranged into a vector, which serves as the DNN input. A DNN model that maps the SRP-PHAT spatial spectrum to the azimuth of the sound source is learned from the training signals, and the trained model then estimates the source azimuth from the testing signals. Experimental results demonstrate that the proposed algorithm significantly improves localization performance, whether or not the training and testing conditions match, and that it is more robust to noise and reverberation.
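The feature described in this abstract, one SRP-PHAT value per steering position arranged into a vector, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration rather than the authors' setup: a far-field 2-D model, a four-microphone circular array of 5 cm radius, a 16 kHz sampling rate, a 10-degree azimuth grid, and the hypothetical helper names `arrival_delays`, `srp_phat_vector`, and `simulate_frames`. The DNN itself is not shown; the returned vector is what would be fed to it.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def arrival_delays(mic_xy, azimuth_deg):
    """Far-field arrival delay (s) of each microphone relative to the array origin."""
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    # A microphone displaced toward the source hears the wavefront earlier.
    return [-(x * ux + y * uy) / SPEED_OF_SOUND for x, y in mic_xy]


def dft(frame):
    """Naive DFT of a real frame; returns bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(frame))
            for k in range(n // 2 + 1)]


def srp_phat_vector(frames, mic_xy, fs, azimuths_deg):
    """Return one SRP-PHAT value per candidate azimuth for a single frame."""
    n = len(frames[0])
    spectra = [dft(f) for f in frames]
    vector = []
    for az in azimuths_deg:
        d = arrival_delays(mic_xy, az)
        power = 0.0
        for i in range(len(frames)):
            for j in range(i + 1, len(frames)):
                tau = d[i] - d[j]  # hypothesised delay difference of the pair
                for k in range(n // 2 + 1):
                    cross = spectra[i][k] * spectra[j][k].conjugate()
                    mag = abs(cross)
                    if mag > 1e-9:  # skip bins that carry no signal
                        steer = cmath.exp(2j * math.pi * (k * fs / n) * tau)
                        # PHAT weighting keeps only the phase of the cross-spectrum.
                        power += (cross / mag * steer).real
        vector.append(power)
    return vector


def simulate_frames(mic_xy, fs, n, source_az_deg, bins=range(2, 21)):
    """Synthetic noise-free frames: a sum of bin-centred cosines, delayed per microphone."""
    d = arrival_delays(mic_xy, source_az_deg)
    return [[sum(math.cos(2 * math.pi * k * (t / n - fs / n * di)) for k in bins)
             for t in range(n)] for di in d]


mics = [(0.05, 0.0), (0.0, 0.05), (-0.05, 0.0), (0.0, -0.05)]
grid = list(range(0, 360, 10))
frames = simulate_frames(mics, fs=16000, n=64, source_az_deg=60)
features = srp_phat_vector(frames, mics, 16000, grid)  # candidate DNN input vector
print("azimuth of spectrum peak:", grid[features.index(max(features))])
```

In this noise-free toy case the peak of the vector already sits at the simulated azimuth. The abstract's point is that under noise and reverberation the peak becomes unreliable, which is why the whole vector is passed to a learned mapping instead of taking the maximum.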
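Abstract: Head-mounted microphone arrays are of practical value in soldier-portable counter-sniper acoustic detection and localization systems and in robot sound localization systems. Conventional sound source localization methods assume unobstructed linear or nonlinear microphone arrays. For a head-mounted array, this work designs and studies a localization algorithm for low-frequency sound signals, taking into account the diffraction that low-frequency sound waves undergo around the helmet before reaching the microphones that face away from the source. The algorithm computes time delays from the diffraction paths of the low-frequency waves and performs a delay-compensated search for the source within the steered response power-phase transform (SRP-PHAT) framework. Experiments show that, compared with an ordinary localization algorithm that assumes no obstruction, the diffraction-path-based method for head-mounted microphone arrays clearly improves localization accuracy by also exploiting the data from the microphones facing away from the source. The improvement is greatest when the signal frequency window is chosen below 1 kHz.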
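The idea of computing delays along a diffraction path can be sketched with a simple geometric model. The abstract does not give the helmet geometry, so the model below is an assumption: the helmet is treated as a rigid sphere of radius `radius`, the source is in the far field, and a shadowed microphone is reached by a wave that travels to the tangent point and then creeps along the surface (the classical Woodworth-style approximation). The function names `arrival_delay` and `pair_tdoa` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def arrival_delay(mic_az_deg, src_az_deg, radius, diffraction=True):
    """Arrival delay (s) at a microphone on the sphere, relative to the sphere centre.

    Assumed model: microphones within 90 degrees of the source direction are
    reached directly; shadowed microphones are reached along the surface arc.
    """
    # Angle between the microphone direction and the source direction, 0..180 degrees.
    theta = math.radians(abs((mic_az_deg - src_az_deg + 180.0) % 360.0 - 180.0))
    if theta <= math.pi / 2 or not diffraction:
        # Straight-line (unobstructed) path.
        return -radius * math.cos(theta) / SPEED_OF_SOUND
    # Shadowed side: extra arc length from the tangent point to the microphone.
    return radius * (theta - math.pi / 2) / SPEED_OF_SOUND


def pair_tdoa(mic_a_deg, mic_b_deg, src_az_deg, radius, diffraction=True):
    """Delay difference for a microphone pair, as used to steer SRP-PHAT."""
    return (arrival_delay(mic_a_deg, src_az_deg, radius, diffraction)
            - arrival_delay(mic_b_deg, src_az_deg, radius, diffraction))


# Microphone facing the source (0 degrees) versus one directly behind it (180 degrees).
with_diffraction = pair_tdoa(180.0, 0.0, 0.0, radius=0.1)
free_field = pair_tdoa(180.0, 0.0, 0.0, radius=0.1, diffraction=False)
print(f"diffraction-path TDOA: {with_diffraction * 1e6:.1f} us")
print(f"free-field TDOA:       {free_field * 1e6:.1f} us")
```

Under this assumed model the shadowed microphone is delayed more than a free-field calculation predicts, so steering with free-field delays would misalign its signal. Substituting the diffraction-path delays into the SRP-PHAT search is the kind of compensation the abstract describes.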
Funding: This work is supported by the Nanjing Institute of Technology (NIT) fund for Research Startup Projects of Introduced Talents under Grant No. YKJ202019, the Natural Science Research Project of Higher Education Institutions in Jiangsu Province under Grant No. 21KJB510018, the National Natural Science Foundation of China (NSFC) under Grant No. 62001215, and the NIT fund for Doctoral Research Projects under Grant No. ZKJ2020003.
Abstract: Speech separation is an active research topic that plays an important role in numerous applications, such as speaker recognition, hearing prostheses, and autonomous robots. Many algorithms have been put forward to improve separation performance, but speech separation in reverberant, noisy environments remains a challenging task. To address this, this paper proposes a novel microphone array-based speech separation algorithm that uses a gated recurrent unit (GRU) network. The main aim of the proposed algorithm is to improve separation performance while reducing computational cost. The algorithm extracts the sub-band steered response power-phase transform (SRP-PHAT), weighted by a gammatone filter, as the speech separation feature because of its discriminative and robust spatial position information. Since the GRU network processes time-series data with faster training and fewer trainable parameters, the GRU model is adopted to process the separation features of several sequential frames in the same sub-band and estimate the ideal ratio mask (IRM). The algorithm decomposes the mixture signals into time-frequency (TF) units using a gammatone filter bank in the frequency domain, and the target speech is reconstructed in the frequency domain by masking the mixture signal according to the estimated IRM. Because both the decomposition of the mixture signal and the reconstruction of the target signal are completed in the frequency domain, the total computational cost is reduced. Experimental results demonstrate that the proposed algorithm achieves omnidirectional speech separation in noisy and reverberant environments, provides good performance in terms of speech quality and intelligibility, and has the capacity to generalize to reverberation.
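The masking step in this abstract can be illustrated with a small sketch. It assumes the commonly used IRM definition, (S / (S + N)) raised to a power `beta` with `beta = 0.5`, where S and N are the target and interference energies of a TF unit; the abstract does not state which variant the authors use. The function names and the toy values are hypothetical. In the described system the mask would be estimated by the GRU network, whereas here it is computed from known energies, as one would when building training targets.

```python
def ideal_ratio_mask(target_energy, interference_energy, beta=0.5):
    """IRM per TF unit: (S / (S + N)) ** beta, and 0 where both energies are 0."""
    mask = []
    for s_row, n_row in zip(target_energy, interference_energy):
        mask.append([(s / (s + n)) ** beta if s + n > 0 else 0.0
                     for s, n in zip(s_row, n_row)])
    return mask


def apply_mask(mixture_tf, mask):
    """Scale each complex TF unit of the mixture by its mask value."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture_tf)]


# Toy example: 2 sub-bands x 2 frames (rows are sub-bands, columns are frames).
target_energy = [[9.0, 0.0], [1.0, 4.0]]
interference_energy = [[16.0, 4.0], [0.0, 4.0]]
mixture_tf = [[1 + 1j, 2j], [3 + 0j, 2 - 2j]]

mask = ideal_ratio_mask(target_energy, interference_energy)
estimate_tf = apply_mask(mixture_tf, mask)
print(mask)
print(estimate_tf)
```

Units dominated by the target keep a mask value near 1 and units dominated by interference are pushed toward 0. Because the mask only rescales the mixture, the phase of the mixture is retained in the estimate.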