Speech separation is an active research topic that plays an important role in numerous applications, such as speaker recognition, hearing prostheses, and autonomous robots. Many algorithms have been put forward to improve separation performance. However, speech separation in reverberant, noisy environments is still a challenging task. To address this, this paper proposes a novel microphone-array speech separation algorithm based on a gated recurrent unit (GRU) network. The main aim of the proposed algorithm is to improve separation performance and reduce computational cost. The algorithm extracts the sub-band steered response power-phase transform (SRP-PHAT), weighted by a gammatone filter, as the speech separation feature because of its discriminative and robust spatial position information. Since the GRU network handles time-series data with faster training and fewer trainable parameters, a GRU model is adopted to process the separation features of several sequential frames in the same sub-band and estimate the ideal ratio mask (IRM). The algorithm decomposes the mixture signals into time-frequency (TF) units using a gammatone filter bank in the frequency domain, and the target speech is reconstructed in the frequency domain by masking the mixture signal according to the estimated IRM. Because both the decomposition of the mixture signal and the reconstruction of the target signal are carried out in the frequency domain, the total computational cost is reduced. Experimental results demonstrate that the proposed algorithm achieves omnidirectional speech separation in noisy and reverberant environments, provides good speech quality and intelligibility, and generalizes across reverberation conditions.
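The masking step described in this abstract can be illustrated with a minimal sketch. It assumes the common definition IRM = sqrt(S / (S + N)) per TF unit, where S and N are target and interference power; the abstract does not state which definition the paper uses. Here the mask is computed from known powers rather than estimated by a GRU, and the names `ideal_ratio_mask` and `apply_mask` are hypothetical.

```python
import numpy as np

def ideal_ratio_mask(target_power, interference_power, eps=1e-12):
    """IRM per TF unit, assuming the common form sqrt(S / (S + N)); values lie in [0, 1]."""
    return np.sqrt(target_power / (target_power + interference_power + eps))

def apply_mask(mixture_tf, mask):
    """Reconstruct the target in the TF domain by scaling each mixture unit by the mask."""
    return mask * mixture_tf

# Toy example: 2 sub-bands x 3 frames of target and interference power (made-up values).
target = np.array([[4.0, 0.0, 1.0], [9.0, 1.0, 0.0]])
noise = np.array([[0.0, 4.0, 1.0], [16.0, 3.0, 1.0]])
mask = ideal_ratio_mask(target, noise)

# A mixture magnitude of 1 in every unit is scaled down wherever interference dominates.
masked = apply_mask(np.ones_like(mask), mask)
```

In a trained system the mask would come from the network output for each sub-band, and `apply_mask` would be applied to the complex TF representation of the mixture before resynthesis.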
A new technique for fabricating silicon condenser microphones is presented. The technique uses oxidized porous silicon as the sacrificial layer for the air gap and heavily p+-doped silicon, approximately 15 μm thick, for the stiff backplate. The measured sensitivity of a microphone fabricated with this technique ranges from -45 dB (5.6 mV/Pa) to -55 dB (1.78 mV/Pa) over the frequency range 500 Hz to 10 kHz, and increases gradually at higher frequencies. The cut-off frequency is above 20 kHz.
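The paired sensitivity figures above are consistent with a decibel reference of 1 V/Pa, although the abstract does not state the reference explicitly. A small conversion sketch under that assumption (the function names are hypothetical):

```python
import math

def sensitivity_db(mv_per_pa):
    """Convert a sensitivity in mV/Pa to dB, assuming a reference of 1 V/Pa."""
    return 20.0 * math.log10(mv_per_pa / 1000.0)

def sensitivity_mv_per_pa(db):
    """Inverse conversion: dB re 1 V/Pa back to mV/Pa."""
    return 1000.0 * 10.0 ** (db / 20.0)

upper = sensitivity_db(5.6)    # close to -45 dB
lower = sensitivity_db(1.78)   # close to -55 dB
```

Both quoted pairs agree with this conversion to within rounding.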
A large planar array of 111 microphones was successfully applied to measure a two-dimensional map of the sound sources on landing aircraft. This paper focuses on the flap side-edge noise source and presents the spectra, directivity, and sound pressure level of flap side-edge noise for 10 aircraft. The flap side-edge noise spectrum is found to be broadband, with tones in some cases. Two different types of tone source are found: it is proposed that one is a trailing-edge semi-baffled dipole source and the other is produced by vortex shedding from the wing cusp. The total sound pressure level of the broadband flap side-edge noise has no obvious directionality, whereas the tonal noise in the flap side-edge spectrum is clearly directional. It is demonstrated that the local flow field is the key to controlling flap side-edge noise.
Audio applications such as mobile communication and hearing aids demand a small, high-performance, stable, and low-cost microphone to reproduce high-quality sound. A capacitive microphone can be designed to fulfill these requirements, with trade-offs between sensitivity, operating frequency range, and noise level that arise mainly from the device's structural dimensions and viscous damping. Reducing the microphone size and air gap gradually decreases sensitivity and increases viscous damping. The aim of this research was to develop a mathematical model of a spring-supported diaphragm capacitive MEMS microphone, together with an approach for optimizing a microphone's performance. Because of the complex shapes involved in this recent type of diaphragm design, analytical modelling has not previously been attempted. A novel diaphragm design is proposed that increases the mechanical sensitivity of a capacitive microphone by reducing its diaphragm stiffness. A lumped-element model of the spring-supported diaphragm microphone is developed to analyze the complex relations between the microphone performance factors and to find the optimum dimensions for given design requirements. It is shown analytically that the spring dimensions of the spring-supported diaphragm have little effect on microphone performance compared with the diaphragm and backplate size, diaphragm thickness, and air-gap distance. A 1 mm² spring-supported diaphragm microphone is designed using several optimized performance parameters to give a -3 dB operating bandwidth of 10.2 kHz, a sensitivity of 4.67 mV/Pa (-46.5 dB ref. 1 V/Pa at 1 kHz using a bias voltage of 3 V), a pull-in voltage of 13 V, and a thermal noise of -22 dBA SPL.
To improve localization accuracy, spherical microphone arrays are used to capture high-order wavefield information. For far-field sound sources, the array signal model is constructed based on plane wave decomposition. The spatial spectrum function is calculated with the minimum variance distortionless response (MVDR) method to scan three-dimensional space, and the peaks of the spectrum function correspond to the directions of multiple sound sources. Diagonal loading is adopted to handle the ill-conditioned cross-spectral matrix of the received signals. The loading level is chosen by balancing how far it alleviates the ill-conditioning of the matrix against the accuracy of the inverse calculation. Compared with the plane wave decomposition method, the proposed localization algorithm achieves higher spatial resolution and better direction estimates for multiple sound sources, especially at low signal-to-noise ratio (SNR).
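The role of diagonal loading in an MVDR spatial spectrum can be sketched as follows. This is a simplified illustration, not the paper's method: it assumes a uniform linear array with half-wavelength spacing instead of a spherical array with plane wave decomposition, and it scales the loading by the average channel power, which is only one common choice. The function name `mvdr_spectrum` and all parameter values are hypothetical.

```python
import numpy as np

def mvdr_spectrum(R, steering, loading=1e-2):
    """MVDR spectrum P(d) = 1 / (a_d^H (R + lam*I)^-1 a_d) for each candidate direction d.

    R: (M, M) cross-spectral matrix; steering: (M, D) candidate steering vectors.
    """
    M = R.shape[0]
    lam = loading * np.real(np.trace(R)) / M          # loading relative to mean channel power
    R_inv = np.linalg.inv(R + lam * np.eye(M))        # loading makes the inverse well defined
    denom = np.real(np.einsum("md,mn,nd->d", steering.conj(), R_inv, steering))
    return 1.0 / denom

# Toy scene: 8-element half-wavelength linear array, one noise-free source at 20 degrees.
angles_deg = np.arange(-90, 91)
m = np.arange(8)[:, None]
steering = np.exp(-1j * np.pi * m * np.sin(np.deg2rad(angles_deg))[None, :])
a_src = steering[:, angles_deg == 20]                 # shape (8, 1)
R = a_src @ a_src.conj().T                            # rank-1, so singular without loading

spectrum = mvdr_spectrum(R, steering)
est_deg = int(angles_deg[np.argmax(spectrum)])
```

The rank-1 matrix in this toy case cannot be inverted directly, which is an extreme form of the ill-conditioning the abstract refers to; a larger `loading` stabilizes the inverse further at the cost of a broader peak.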
This paper proposes a novel microphone array speech denoising scheme based on three tensor filtering methods: truncated HOSVD (High-Order Singular Value Decomposition), low-rank tensor approximation, and multi-mode Wiener filtering. The microphone array speech signal is represented as a third-order tensor with channel, time, and spectrum modes, so that a tensor filtering model can be designed to process the multiway array data. In the first method, noise is reduced through the truncated HOSVD, a simple scheme in tensor processing. In the second method, a more accurate lower-rank approximation of the third-order tensor is found with the Tucker model, and the MDL (Minimum Description Length) criterion is used to estimate the optimal tensor rank. Further, the multi-mode Wiener filtering approach based on tensor analysis can be regarded as an extension of one-mode Wiener filtering. How to take advantage of the tensor model to obtain a set of filters is the heart of the novel scheme. The performance of the three proposed approaches is evaluated with objective indexes and a listening quality test. The experimental results indicate that the proposed tensor filtering methods are able to retrieve the target signal from a noisy microphone array signal, and that multi-mode Wiener filtering gives the best denoising results of the three.
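The first of the three methods, truncated HOSVD, can be sketched in NumPy as projecting each mode of the tensor onto its leading singular vectors. The tensor sizes, ranks, and noise level below are made-up illustration values, and the helper names (`unfold`, `mode_product`, `truncated_hosvd`) are hypothetical rather than taken from the paper.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, U, mode):
    """Multiply tensor T by matrix U along axis `mode`."""
    moved = np.moveaxis(T, mode, 0)
    out = np.tensordot(U, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def truncated_hosvd(T, ranks):
    """Project every mode onto the span of its leading left singular vectors."""
    approx = T
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        P = U[:, :r] @ U[:, :r].conj().T     # projector onto the mode-n signal subspace
        approx = mode_product(approx, P, mode)
    return approx

# Toy data: a rank-(1, 1, 1) "signal" tensor (channel x time x spectrum) plus white noise.
rng = np.random.default_rng(0)
a, b, c = rng.standard_normal(4), rng.standard_normal(50), rng.standard_normal(20)
clean = np.einsum("i,j,k->ijk", a, b, c)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

denoised = truncated_hosvd(noisy, (1, 1, 1))
err_before = np.linalg.norm(noisy - clean)
err_after = np.linalg.norm(denoised - clean)
```

In practice the ranks are not known in advance, which is where a rank-selection criterion such as the MDL mentioned in the abstract would come in.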
The Steered Response Power (SRP) method works well for sound source localization in noisy and reverberant environments. However, its high computational complexity limits its practical application. In this paper, a fast SRP search method is proposed to reduce the computational complexity when using a small-aperture microphone array. The proposed method, inspired by the SRP spatial spectrum, has two steps: first, it roughly estimates the azimuth of the sound source and determines whether the source is in the far field or the near field; then, a different fine search is performed depending on whether the source is in the far field or the near field. Experiments in both simulated and real environments compare the localization accuracy and computational complexity of the proposed method with those of the conventional SRP-PHAT algorithm. The results show that the proposed method achieves accuracy comparable to the conventional SRP algorithm while reducing computational complexity by 93.62%.
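SRP-PHAT is commonly computed by summing PHAT-weighted cross-correlations (GCC-PHAT) over microphone pairs, each evaluated at the lag implied by a candidate source position. The sketch below shows only that pairwise building block on synthetic data; it is not the fast two-step search proposed in the paper, and the function name `gcc_phat` and the test signal are hypothetical.

```python
import numpy as np

def gcc_phat(x1, x2, max_lag):
    """PHAT-weighted cross-correlation of two channels for lags -max_lag..max_lag.

    A peak at lag +d means x1 is delayed by d samples relative to x2.
    """
    n = len(x1) + len(x2)                      # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cross = cross / (np.abs(cross) + 1e-12)    # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n)
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))

# Synthetic pair: channel 1 is channel 2 delayed by 3 samples.
rng = np.random.default_rng(1)
s = rng.standard_normal(1024)
x2 = np.concatenate((s, np.zeros(3)))
x1 = np.concatenate((np.zeros(3), s))

max_lag = 10
cc = gcc_phat(x1, x2, max_lag)
est_lag = int(np.argmax(cc)) - max_lag
```

The cost of a full SRP search grows with the number of candidate positions times the number of microphone pairs, which is the complexity a coarse-to-fine search like the one described above aims to cut.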
Noise source identification with microphone arrays is an important issue in noise reduction and on-site condition monitoring (CM) of machines. In this paper, we propose a new approach that optimizes the array configuration with a particle swarm optimization algorithm in order to improve noise source identification and condition monitoring performance. Two distinct optimized array configurations are designed under specified conditions. Furthermore, acoustic imaging equipment is developed and used in experiments on transformer substation equipment and a wind turbine generator, which demonstrate that the acoustic imaging system offers high resolution in identifying the main noise sources for noise reduction and abnormal noise sources for condition monitoring.
Microphone arrays can be used for sound source localization and separation. However, gain, phase, and position errors can seriously degrade the performance of localization algorithms such as the multiple signal classification (MUSIC) algorithm. In this paper, a new calibration method for microphone arrays with gain, phase, and position errors is proposed. Unlike traditional calibration methods for antenna arrays, the proposed method can be used with broadband, near-field signal models, such as a microphone array with arbitrary sensor geometry in one plane. Computer simulations are presented, and the results show that the new method performs well.
Microphone array-based sound source localization (SSL) is widely used in applications such as video conferencing, robotic hearing, speech enhancement, and speech recognition. Traditional SSL methods cannot achieve satisfactory performance in adverse noisy and reverberant environments. To improve localization performance, a novel SSL algorithm using a convolutional residual network (CRN) is proposed in this paper. The spatial features, including the time differences of arrival (TDOAs) between microphone pairs and the steered response power-phase transform (SRP-PHAT) spatial spectrum, are extracted in each gammatone sub-band. The spatial features of the different sub-bands within a frame are combined into a feature matrix that serves as the input to the CRN, which the proposed algorithm employs to fuse the spatial features. Since the CRN adds a residual structure to the convolutional network, it reduces the difficulty of training and accelerates the convergence of the model. A CRN model is learned from training data covering various reverberation and noise environments to establish the mapping between the input feature and the sound azimuth. Simulations show that, compared with methods using a traditional deep neural network, the proposed algorithm achieves better localization performance on the SSL task and generalizes better to untrained noise and reverberation.
As production requirements grow ever more demanding while microphones shrink in volume, automating microphone production has become an urgent need for improving production efficiency. This work studies the most important part of that process and proposes a precise algorithm, based on feature extraction and visual detection, for calculating the deviation angle of four types of microphones. Preprocessing is performed to obtain the real-time microphone image. Canny edge detection and typical feature extraction are used to distinguish the four types of microphones, categorizing them as type M1 and type M2. The Hough transform is then used to extract the image features of the microphone, from which the deviation angle between the microphone's posture and the ideal posture in the 2D plane is obtained. Depending on this angle, the system drives the motor to adjust the posture of the microphone. The final aim is high-efficiency welding of the four different types of microphones.
Photoacoustic spectroscopy was used to test the photoacoustic properties of sulfur hexafluoride, an optically thick and potent greenhouse gas. While exploring the photoacoustic effect of sulfur hexafluoride, the effects of the position of the microphone within a gas cell were determined. Using a 35 cm gas cell, microphones were positioned at 17.5 cm (the middle of the gas cell), 12.5 cm, 7.5 cm, and 2.5 cm from the window of the cell. From the photoacoustic signal produced for each resonance frequency at each microphone position, the effect of the acoustic pressure at each position on the recorded signal was observed. This is the first experimental study of the photoacoustic effect to show that standing waves have different amplitudes at different microphone positions.
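Why the recorded amplitude depends on microphone position can be illustrated with an idealized textbook model. The sketch assumes purely longitudinal modes in a tube closed at both ends, for which the pressure amplitude of mode n varies as |cos(n·π·x/L)|; the abstract does not describe the actual boundary conditions or resonance modes of the cell, so this is an assumption for illustration only. The cell length and positions are the ones quoted above, and the function name is hypothetical.

```python
import math

CELL_LENGTH_CM = 35.0
POSITIONS_CM = (2.5, 7.5, 12.5, 17.5)   # microphone distances from the window

def relative_pressure_amplitude(x_cm, mode, length_cm=CELL_LENGTH_CM):
    """Relative pressure amplitude of longitudinal mode `mode` at position x.

    Assumes an ideal tube closed at both ends (pressure antinodes at the ends).
    """
    return abs(math.cos(mode * math.pi * x_cm / length_cm))

# Amplitude seen by each microphone for the first two modes of the idealized cell.
amplitudes = {
    mode: [relative_pressure_amplitude(x, mode) for x in POSITIONS_CM]
    for mode in (1, 2)
}
```

Under this idealization the mid-cell microphone sits on a pressure node of the first mode and on an antinode of the second, so the same resonance can appear strong at one position and weak at another.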
Based on the W-disjoint orthogonality of speech mixtures, a space discriminative function was proposed to enumerate and localize competing speakers in the surrounding environment. A Wiener-like postfilter was then developed to adaptively suppress interference. Experimental results with a hands-free speech recognizer under various SNR and competing-speaker settings show that an error reduction of nearly 69% can be obtained with a two-channel small-aperture microphone array relative to a conventional single-microphone baseline system. Comparisons were made against traditional delay-and-sum and Griffiths-Jim adaptive beamforming techniques to further assess the effectiveness of the method.
Funding (GRU-based speech separation paper): This work is supported by the Nanjing Institute of Technology (NIT) fund for Research Startup Projects of Introduced Talents under Grant No. YKJ202019; the Nature Science Research Project of Higher Education Institutions in Jiangsu Province under Grant No. 21KJB510018; the National Nature Science Foundation of China (NSFC) under Grant No. 62001215; and the NIT fund for Doctoral Research Projects under Grant No. ZKJ2020003.
Funding (aircraft flap side-edge noise paper): Financially supported by the Bundesministerium für Bildung und Forschung (BMBF) of Germany.
Funding (spherical-array MVDR localization paper): Project supported by the National Natural Science Foundation of China (Grant No. 61001160), the Doctoral Foundation of the Ministry of Education (Grant No. 20093108120018), and the Shanghai Leading Academic Discipline Project (Grant No. S30108).
Funding (tensor filtering speech denoising paper): Supported by the National Natural Science Foundation of China (No. 61571044, No. 11590772, No. 61473041, and No. 61620106002).
Funding (fast SRP search paper): Supported by the National Natural Science Foundation of China (No. 61201345) and the Beijing Key Laboratory of Advanced Information Science and Network Technology (No. XDXX1308).
Funding (microphone array calibration paper): This work was supported by the Key Project of Science and Technology of Sichuan Province under Grant No. 04GG21-02-20.
Funding (CRN-based sound source localization paper): Supported by the Nature Science Research Project of Higher Education Institutions in Jiangsu Province under Grant No. 21KJB510018 and the National Nature Science Foundation of China (NSFC) under Grant No. 62001215.
Funding (microphone deviation-angle detection paper): Supported by the Project of Youth Fund of the National Natural Science Foundation (No. 61203208), the National Natural Science Foundation of China (No. 61327802), and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20133201110009).
Funding: The work was supported by the Program for New Century Excellent Talents in University (No. NCET-05-0582) and by the Natural Science Foundation of Shandong Province (No. Y2007G04).