Funding: Supported by the National Natural Science Foundation of China (61877067), the Foundation of Science and Technology on Near-Surface Detection Laboratory (TCGZ2019A002, TCGZ2021C003, 6142414200511), and the Natural Science Basic Research Program of Shaanxi (2021JZ-19).
Abstract: Acoustic source localization (ASL) and sound event detection (SED) are two widely pursued, independent research fields. In recent years, sound event localization and detection (SELD) has become a very active research topic because it gives a more complete spatial and temporal representation of the sound field. This paper presents a deep learning-based algorithm for localizing and detecting multiple overlapping sound events in three-dimensional space. The log-Mel spectrum and the generalized cross-correlation spectrum are concatenated along the channel dimension as input features. A trained neural network classifies and regresses these features in parallel to obtain the sound recognition and localization results, respectively. A channel attention mechanism is also introduced into the network to selectively enhance features that contain essential information and suppress uninformative ones. Finally, a thorough comparison confirms the efficiency and effectiveness of the proposed SELD algorithm. Field experiments show that the proposed algorithm is robust to reverberation and to the acoustic environment, and achieves higher recognition and localization accuracy than the baseline method.
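The feature stacking and channel weighting described in this abstract can be illustrated roughly as follows. This is only a sketch: the abstract does not say which channel attention variant is used, so a squeeze-and-excitation style gate is assumed here, and the array shapes, the random weights `w1` and `w2`, and the function name `channel_attention` are all hypothetical.

```python
import numpy as np


def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style gate (an assumed variant, not the paper's).

    features: array of shape (channels, time, freq).
    w1, w2:   weights of the two small fully connected layers.
    """
    squeezed = features.mean(axis=(1, 2))           # one summary value per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)         # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid weight in (0, 1)
    return features * gate[:, None, None], gate


rng = np.random.default_rng(0)
log_mel = rng.standard_normal((4, 100, 64))   # assumed: 4 microphones
gcc = rng.standard_normal((6, 100, 64))       # assumed: 6 microphone pairs
stacked = np.concatenate([log_mel, gcc], axis=0)  # join along the channel axis

w1 = 0.1 * rng.standard_normal((5, 10))       # untrained, for illustration only
w2 = 0.1 * rng.standard_normal((10, 5))
weighted, gate = channel_attention(stacked, w1, w2)
```

In a trained network the gate would be learned jointly with the rest of the model, so channels carrying useful cues end up with weights near 1 and the others are attenuated.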
Funding: This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61571106, the Jiangsu Natural Science Foundation under Grant No. BK20170757, and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 17KJD510002.
Abstract: Microphone array-based sound source localization (SSL) is a challenging task in adverse acoustic scenarios. To address this, this paper presents a novel SSL algorithm based on a deep neural network (DNN) that uses the steered response power-phase transform (SRP-PHAT) spatial spectrum as its input feature. The SRP-PHAT spatial power spectrum is adopted as the input feature because it contains spatial location information, and the DNN is used to extract that information because of its strength in extracting high-level features. The SRP-PHAT values at all steering positions within a frame are arranged into a vector, which is the DNN input. A DNN model that maps the SRP-PHAT spatial spectrum to the azimuth of the sound source is learned from the training signals, and the trained model then estimates the source azimuth from the testing signals. Experimental results demonstrate that the proposed algorithm significantly improves localization performance whether or not the training and testing conditions match, and that it is more robust to noise and reverberation.
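A minimal sketch of how such an SRP-PHAT feature vector might be computed for one frame, assuming a far-field source, a planar array, and an azimuth-only steering grid. The function names, the array geometry, the sampling rate, and the synthetic test signal are illustrative assumptions that the abstract does not specify; the DNN itself is not shown.

```python
import numpy as np


def gcc_phat(x, y):
    """PHAT-weighted circular cross-correlation of two equal-length frames."""
    n = len(x)
    cross = np.fft.rfft(x) * np.conj(np.fft.rfft(y))
    cross /= np.abs(cross) + 1e-12      # keep only the phase (PHAT weighting)
    return np.fft.irfft(cross, n)       # index k holds lag k (mod n) of x relative to y


def srp_phat_azimuth(frames, mic_xy, fs, azimuths_deg, c=343.0):
    """Sum the GCC-PHAT of every microphone pair at the lag implied by each azimuth.

    frames: (n_mics, n_samples); mic_xy: (n_mics, 2) positions in metres.
    Returns one SRP-PHAT value per candidate azimuth.
    """
    n_mics, n_samples = frames.shape
    az = np.deg2rad(azimuths_deg)
    directions = np.stack([np.cos(az), np.sin(az)], axis=1)   # unit vectors towards source
    spectrum = np.zeros(len(az))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cc = gcc_phat(frames[i], frames[j])
            # Far field: arrival-time difference t_i - t_j = (p_j - p_i) . u / c
            tau = (mic_xy[j] - mic_xy[i]) @ directions.T / c
            lags = np.rint(tau * fs).astype(int)
            spectrum += cc[lags % n_samples]
    return spectrum


# Synthetic frame: white noise arriving from 90 degrees at a small square array.
fs, c = 16000, 343.0
d = 10 * c / fs                                   # spacing equal to a 10-sample delay
mic_xy = np.array([[0.0, 0.0], [d, 0.0], [0.0, d], [d, d]])
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
frames = np.stack([np.roll(s, k) for k in (0, 0, -10, -10)])
grid = np.arange(0, 360, 5)
feature = srp_phat_azimuth(frames, mic_xy, fs, grid)   # vector that would feed the DNN
```

On this noise-free example the vector simply peaks at the true azimuth; the point of the learned mapping in the abstract is to stay accurate when noise and reverberation blur that peak.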
Funding: Supported by the National Natural Science Foundation of China (No. 61201345) and the Beijing Key Laboratory of Advanced Information Science and Network Technology (No. XDXX1308).
Abstract: The steered response power (SRP) method works well for sound source localization in noisy and reverberant environments. However, its high computational complexity limits its practical application. In this paper, a fast SRP search method for small-aperture microphone arrays is proposed to reduce the computational complexity. Inspired by the SRP spatial spectrum, the proposed method has two steps: first, it roughly estimates the azimuth of the sound source and determines whether the source is in the far field or the near field; then, it performs a fine search whose form depends on that far-field or near-field decision. Experiments in both simulated and real environments compare the localization accuracy and computational complexity of the proposed method with those of the conventional SRP-PHAT algorithm. The results show that the proposed method achieves accuracy comparable to that of the conventional SRP algorithm while reducing computational complexity by 93.62%.
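The general coarse-then-fine idea behind this abstract can be sketched as below for the azimuth dimension only. The near-field/far-field decision and the paper's actual search rules are not reproduced; `score_fn` is a hypothetical stand-in for the SRP value at a given azimuth, and the step sizes are arbitrary.

```python
import numpy as np


def coarse_to_fine(score_fn, coarse_step=10.0, fine_step=1.0):
    """Scan the full circle coarsely, then refine around the best coarse azimuth.

    Returns the best azimuth in degrees and the number of score evaluations.
    """
    coarse = np.arange(0.0, 360.0, coarse_step)
    best = coarse[np.argmax([score_fn(a) for a in coarse])]
    # Fine grid covers one coarse step on each side of the coarse winner.
    fine = np.arange(best - coarse_step, best + coarse_step + fine_step, fine_step) % 360.0
    fine_best = fine[np.argmax([score_fn(a) for a in fine])]
    return fine_best, len(coarse) + len(fine)


# Toy score with a single smooth peak at 137 degrees.
azimuth, n_evals = coarse_to_fine(lambda a: np.cos(np.deg2rad(a - 137.0)))
```

With these step sizes the toy search evaluates 57 candidates instead of the 360 needed for an exhaustive 1-degree scan; this saving is specific to the sketch and is unrelated to the figure reported in the abstract.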
Funding: Supported by the Natural Science Research Project of Higher Education Institutions in Jiangsu Province under Grant No. 21KJB510018 and the National Natural Science Foundation of China (NSFC) under Grant No. 62001215.
Abstract: Microphone array-based sound source localization (SSL) is widely used in applications such as video conferencing, robotic hearing, speech enhancement, and speech recognition. Traditional SSL methods cannot achieve satisfactory performance in adverse noisy and reverberant environments. To improve localization performance, this paper proposes a novel SSL algorithm that uses a convolutional residual network (CRN). Spatial features, including the time differences of arrival (TDOAs) between microphone pairs and the steered response power-phase transform (SRP-PHAT) spatial spectrum, are extracted in each Gammatone sub-band. The spatial features of the different sub-bands within a frame are combined into a feature matrix that serves as the input of the CRN, and the proposed algorithm employs the CRN to fuse these spatial features. Because the CRN adds a residual structure to the convolutional network, it makes training easier and accelerates the convergence of the model. A CRN model is learned from training data covering various reverberation and noise environments to establish the mapping between the input features and the sound azimuth. Simulations show that, compared with methods using a traditional deep neural network, the proposed algorithm achieves better localization performance in the SSL task and generalizes better to untrained noise and reverberation conditions.
Funding: Supported by the National Natural Science Foundation of China (No. 60472058, No. 60975017) and the Jiangsu Provincial Natural Science Foundation (No. BK2008291).
Abstract: This letter proposes a sound source localization method for digital hearing aids that combines wavelet-based multivariate statistics with the Generalized Cross Correlation (GCC) algorithm. The Haar wavelet is used to decompose GCC sequences and extract four wavelet characteristics. Hotelling's T² statistical method is then used to fuse the four wavelet characteristics. The resulting statistic is used to determine the number of sound sources and to obtain the corresponding time delay estimates, which are used to localize the sound source. The experimental results show that the proposed method is more robust in environments with severe noise and reverberation. Meanwhile, the complexity of the algorithm is moderate, which makes it suitable for sound source localization in hearing aids.
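The two building blocks named in this abstract can be sketched generically as follows. The abstract does not say which four wavelet characteristics are extracted or how the statistic is thresholded, so this sketch only shows a one-level Haar decomposition and the T² statistic of a single observation against a reference sample; the function names and the random reference data are placeholders.

```python
import numpy as np


def haar_dwt(x):
    """One-level Haar decomposition of an even-length sequence."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("sequence length must be even")
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass (smooth) part
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass (difference) part
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt; reconstructs the original sequence."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def hotelling_t2(observation, reference):
    """T^2 of one feature vector against a reference sample (rows = observations)."""
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    diff = np.asarray(observation, dtype=float) - mean
    return float(diff @ np.linalg.solve(cov, diff))
```

A large T² for the feature vector at some lag would flag that lag as unusual relative to the reference, which is the kind of evidence the abstract uses to count sources and pick delays.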
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 60971098 and 61201345) and the Beijing Key Laboratory of Advanced Information Science and Network Technology (Grant No. XDXX1308).
Abstract: The steered response power-phase transform (SRP-PHAT) sound source localization algorithm is robust in real environments. However, its high computational complexity limits the practical application of SRP-PHAT. For a microphone array, each location corresponds to a set of time differences of arrival (TDOAs), which this paper collects into a TDOA vector. Since the TDOA vectors of adjacent regions are similar, we present a fast algorithm based on a clustering search to reduce the computational complexity of SRP-PHAT. In the training stage, the K-means or Iterative Self-Organizing Data Analysis Technique (ISODATA) clustering algorithm is used to find the centroid of each cluster of similar TDOA vectors. During localization, the optimal cluster is found by comparing the steered response powers (SRPs) of all centroids, and the SRPs of all candidate locations in the optimal cluster are then compared to localize the sound source. Experiments in both simulated and real environments compare the localization accuracy and computational load of the proposed method with those of the conventional SRP-PHAT algorithm. The results show that the proposed method drastically reduces the computational load while maintaining almost the same localization accuracy and robustness as the conventional SRP-PHAT algorithm. The choice of clustering algorithm in the training stage makes only a trivial difference to localization performance.
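The two-stage search in this abstract can be sketched as follows, using a plain NumPy K-means (ISODATA is not shown). The room size, array geometry, cluster count, and all function names are assumptions for illustration, and `score_fn` is a hypothetical stand-in for the steered response power evaluated at a TDOA vector.

```python
import numpy as np


def tdoa_vectors(candidates, mics, c=343.0):
    """TDOA vector (one entry per microphone pair) for every candidate location."""
    dist = np.linalg.norm(candidates[:, None, :] - mics[None, :, :], axis=2)
    i, j = np.triu_indices(len(mics), k=1)
    return (dist[:, i] - dist[:, j]) / c


def nearest(points, centroids):
    """Index of the closest centroid for every point."""
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(d, axis=1)


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns centroids and the final assignment."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest(points, centroids)
        for c in range(k):
            members = points[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, nearest(points, centroids)


def clustering_search(tdoas, centroids, labels, score_fn):
    """Score the centroids first, then only the members of the best cluster."""
    nonempty = np.unique(labels)
    best_cluster = nonempty[np.argmax([score_fn(centroids[c]) for c in nonempty])]
    members = np.flatnonzero(labels == best_cluster)
    best = members[np.argmax([score_fn(tdoas[m]) for m in members])]
    return int(best), len(nonempty) + len(members)


# Training stage: cluster the TDOA vectors of a 20 x 20 candidate grid.
xs = np.linspace(0.25, 3.75, 20)
candidates = np.array([[x, y] for x in xs for y in xs])
mics = np.array([[1.9, 1.9], [2.1, 1.9], [2.1, 2.1], [1.9, 2.1]])
tv = tdoa_vectors(candidates, mics)
centroids, labels = kmeans(tv, k=16)
```

At run time, `clustering_search(tv, centroids, labels, score_fn)` returns the index of the selected candidate and the number of score evaluations, which is the quantity the clustering is meant to shrink relative to scoring every candidate.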
Abstract: In this paper, the method of approximate expansion is used to analyse a perfect planar surround sound system, yielding a series of new, upgradable systems of different orders. First, the reproduction signals of the perfect system and the characteristics of systems of different orders are analysed. The independent transmission signals and the decoding (reproduction) equations of the systems are given. The compatibility among systems of different orders and the problem of simplifying the output channels are discussed. Signal pickup, recording, and transmission, as well as the possibility of putting the systems into practical use, are studied. A sound image localization experiment is carried out to study image localization in relation to the number of transmission signals and output channels. The experimental results are consistent with the theoretical results. This work lays a foundation for practical use.
Funding: Supported by the National Natural Science Foundation of China (61501374) and the Underwater Information and Control Key Laboratory Foundation (9140C230310150C23102).
Abstract: A new sound source localization method with sound speed compensation is proposed to reduce the influence of wind on the performance of conventional time difference of arrival (TDOA) algorithms. First, the sound speed is described as a set of functions of the unknown source location, which approximate the distribution of the acoustic velocity field in the wind field. These functions are then introduced into the TDOA algorithm to construct a set of nonlinear equations. Finally, the particle swarm optimization algorithm is used to estimate the source location. The simulation results show that the proposed algorithm significantly improves localization accuracy for different wind velocities, source locations, and test area sizes. The experimental results show that the proposed method reduces localization errors to about 40% of the original error in a four-node localization system.
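A rough sketch of why a location-dependent sound speed matters, under two loudly stated assumptions: the wind is uniform and the effective speed along each path is taken as the still-air speed plus the wind component along the propagation direction (a common first-order model, not necessarily the paper's), and a brute-force grid search is swapped in for the particle swarm optimizer. All names, the sensor layout, and the wind value are hypothetical.

```python
import numpy as np


def arrival_times(source, sensors, wind, c=343.0):
    """Travel time to each sensor with an effective speed c + wind . direction."""
    path = sensors - source
    dist = np.linalg.norm(path, axis=1)
    unit = path / dist[:, None]          # propagation direction, source -> sensor
    return dist / (c + unit @ wind)


def tdoa(source, sensors, wind, c=343.0):
    """Arrival-time differences relative to the first sensor."""
    t = arrival_times(source, sensors, wind, c)
    return t[1:] - t[0]


def locate(measured, sensors, wind, grid, c=343.0):
    """Grid point whose modelled TDOAs best match the measurement (least squares)."""
    errors = [np.sum((tdoa(p, sensors, wind, c) - measured) ** 2) for p in grid]
    return grid[int(np.argmin(errors))]


sensors = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0], [0.0, 100.0]])
wind = np.array([10.0, 0.0])                      # assumed uniform wind, m/s
source = np.array([30.0, 65.0])
measured = tdoa(source, sensors, wind)            # synthetic, noise-free measurement

axis = np.arange(5.0, 100.0, 5.0)
grid = np.array([[x, y] for x in axis for y in axis])
estimate = locate(measured, sensors, wind, grid)  # uses the wind-aware model
```

Passing `np.zeros(2)` as the wind in `tdoa` gives the uncompensated model, whose predicted TDOAs no longer match the measurement even at the true source position; that mismatch is the error the compensation is meant to remove.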
Abstract: By considering a higher-order approximation to the interaural phase difference, a more general localization equation is derived for stereo sound images with interchannel phase difference. At very low frequencies or small interchannel phase differences, the equation reduces to the Makita theory. In general, the image position is clearly affected by frequency. It is shown that the variation of image position with frequency is the main reason for the broadening of image width in stereo reproduction with interchannel phase difference, and that an extra interaural sound level difference caused by the interchannel phase difference is the main reason for the degradation of image naturalness. In practice, it is necessary to reduce the interchannel phase difference to less than 60° at least.