To enhance the speech quality that is degraded by environmental noise,an algorithm was proposed to reduce the noise and reinforce the speech.The minima controlled recursive averaging(MCRA) algorithm was used to estima...To enhance the speech quality that is degraded by environmental noise,an algorithm was proposed to reduce the noise and reinforce the speech.The minima controlled recursive averaging(MCRA) algorithm was used to estimate the noise spectrum and the partial masking effect which is one of the psychoacoustic properties was introduced to reinforce speech.The performance evaluation was performed by comparing the PESQ(perceptual evaluation of speech quality) and segSNR(segmental signal to noise ratio) by the proposed algorithm with the conventional algorithm.As a result,average PESQ by the proposed algorithm was higher than the average PESQ by the conventional noise reduction algorithm and segSNR was higher as much as 3.2 dB in average than that of the noise reduction algorithm.展开更多
Quadratic Discrimination Function (QDF) is commonly used in speech emotion recognition, which proceeds on the premise that the input data is normal distribution. In this paper, we propose a transformation to normali...Quadratic Discrimination Function (QDF) is commonly used in speech emotion recognition, which proceeds on the premise that the input data is normal distribution. In this paper, we propose a transformation to normalize the emotional features, emotion recognition. Features based on prosody then derivate a Modified QDF (MQDF) to speech and voice quality are extracted and Principal Component Analysis Neural Network (PCANN) is used to reduce dimension of the feature vectors. The results show that voice quality features are effective supplement for recognition, and the method in this paper could improve the recognition ratio effectively.展开更多
Spectral subtraction is used in this research as a method to remove noise from noisy speech signals in the frequency domain. This method consists of computing the spectrum of the noisy speech using the Fast Fourier Tr...Spectral subtraction is used in this research as a method to remove noise from noisy speech signals in the frequency domain. This method consists of computing the spectrum of the noisy speech using the Fast Fourier Transform (FFT) and subtracting the average magnitude of the noise spectrum from the noisy speech spectrum. We applied spectral subtraction to the speech signal “Real graph”. A digital audio recorder system embedded in a personal computer was used to sample the speech signal “Real graph” to which we digitally added vacuum cleaner noise. The noise removal algorithm was implemented using Matlab software by storing the noisy speech data into Hanning time-widowed half-overlapped data buffers, computing the corresponding spectrums using the FFT, removing the noise from the noisy speech, and reconstructing the speech back into the time domain using the inverse Fast Fourier Transform (IFFT). The performance of the algorithm was evaluated by calculating the Speech to Noise Ratio (SNR). Frame averaging was introduced as an optional technique that could improve the SNR. Seventeen different configurations with various lengths of the Hanning time windows, various degrees of data buffers overlapping, and various numbers of frames to be averaged were investigated in view of improving the SNR. Results showed that using one-fourth overlapped data buffers with 128 points Hanning windows and no frames averaging leads to the best performance in removing noise from the noisy speech.展开更多
This paper presents a deep neural network(DNN)-based speech enhancement algorithm based on the soft audible noise masking for the single-channel wind noise reduction. To reduce the low-frequency residual noise, the ps...This paper presents a deep neural network(DNN)-based speech enhancement algorithm based on the soft audible noise masking for the single-channel wind noise reduction. To reduce the low-frequency residual noise, the psychoacoustic model is adopted to calculate the masking threshold from the estimated clean speech spectrum. The gain for noise suppression is obtained based on soft audible noise masking by comparing the estimated wind noise spectrum with the masking threshold. To deal with the abruptly time-varying noisy signals, two separate DNN models are utilized to estimate the spectra of clean speech and wind noise components. Experimental results on the subjective and objective quality tests show that the proposed algorithm achieves the better performance compared with the conventional DNN-based wind noise reduction method.展开更多
In order to apply speech recognition systems to actual circumstances such as inspection and maintenance operations in industrial factories to recording and reporting routines at construction sites, etc. where hand-wri...In order to apply speech recognition systems to actual circumstances such as inspection and maintenance operations in industrial factories to recording and reporting routines at construction sites, etc. where hand-writing is difficult, some countermeasure methods for surrounding noise are indispensable. In this study, a signal detection method to remove the noise for actual speech signals is proposed by using Bayesian estimation with the aid of bone-conducted speech. More specifically, by introducing Bayes’ theorem based on the observation of air-conducted speech contaminated by surrounding background noise, a new type of algorithm for noise removal is theoretically derived. In the proposed speech detection method, bone-conducted speech is utilized in order to obtain precise estimation for speech signals. The effectiveness of the proposed method is experimentally confirmed by applying it to air- and bone-conducted speeches measured in real environment under the existence of surrounding background noise.展开更多
Most noise suppression algorithms of single channel use the mean of noisy segments to estimate the characteristics of noise spectrum, ignoring the estimation of noise in speech segments. Therefore, when the energy lev...Most noise suppression algorithms of single channel use the mean of noisy segments to estimate the characteristics of noise spectrum, ignoring the estimation of noise in speech segments. Therefore, when the energy level of noise varies with the time, the performance of removing noise will be degraded. To solve this problem, a speech enhancement approach based on dynamic noise estimation within correlation domain was proposed. This method exploits the characteristics that noise energy mainly concentrates on 0 th order correlation coefficients, signal is auto correlated but signal and noise, noise and noise are uncorrelated, then estimates and decomposes the noise, thus helps to solve the above mentioned problem. The results of recognition experiments on speech signals of 15 Chinese cities’ names corrupted by noise of exhibition hall shows, this approach is better than SS (Spectral Subtraction) method, adapts better to the variances of energy levels of speech signal corrupted by noise, has some practicability to improve the robustness of recognition systems under noisy environment.展开更多
Objective:To explore the clinical evaluation role of the Digits-in-Noise(DIN)test and Hearing Handicap Inventory for Adults Screening(HHIA-S)for patients with occupational noise-induced hearing loss and to observe and...Objective:To explore the clinical evaluation role of the Digits-in-Noise(DIN)test and Hearing Handicap Inventory for Adults Screening(HHIA-S)for patients with occupational noise-induced hearing loss and to observe and analyze their application values.Methods:Fifty patients with suspected occupational noise-induced hearing loss were randomly selected from the Department of Otolaryngology at the hospital as the research target.The collection period for the research cases spanned from January 2022 to November 2023,and all patients had a history of noise exposure.The DIN test and HHIA-S were used for hearing examinations,with clinical,comprehensive diagnosis serving as the gold standard to study their diagnostic performance.Results:The compliance rate of the DIN test was 88.00%,the HHIA-S’s compliance rate was 80.00%,and the combined compliance rate was 94.00%.The compliance rate of the DIN test and the combined compliance rates of the patients were statistically significant compared to the clinical gold standard data(P<0.05),while there was no difference between the compliance rate of the HHIA-S and the gold standard(P>0.05).The data shows that the sensitivity of the combined diagnosis is significantly higher than the sensitivity data of the DIN test and HHIA-S examination alone(P<0.05).Its specificity is 100.00%,and the accuracy data of the joint diagnosis in the degree were higher than those of the DIN test alone(P>0.05)and the HHIA-S alone(P<0.05).Conclusion:For patients with occupational noise-induced hearing loss,the joint evaluation of the DIN test and HHIA-S can significantly improve their diagnostic value with high sensitivity and accuracy.展开更多
A noise estimator was presented in this paper by modeling the log-power sequence with hidden Markov model (HMM). The smoothing factor of this estimator was motivated by the speech presence probability at each freque...A noise estimator was presented in this paper by modeling the log-power sequence with hidden Markov model (HMM). The smoothing factor of this estimator was motivated by the speech presence probability at each frequency band. This HMM had a speech state and a nonspeech state, and each state consisted of a unique Gaussian function. The mean of the nonspeech state was the estimation of the noise logarithmic power. To make this estimator run in an on-line manner, an HMM parameter updated method was used based on a first-order recursive process. The noise signal was tracked together with the HMM to be sequentially updated. For the sake of reliability, some constraints were introduced to the HMM. The proposed algorithm was compared with the conventional ones such as minimum statistics (MS) and improved minima controlled recursive averaging (IM- CRA). The experimental results confirms its promising performance.展开更多
The immittance spectral frequencies (ISFs) is proposed as a new set of classification features and compared with the linear spectral frequencies (LSFs) applied in a frame-level wideband speech/music discrimination...The immittance spectral frequencies (ISFs) is proposed as a new set of classification features and compared with the linear spectral frequencies (LSFs) applied in a frame-level wideband speech/music discrimination system. These two sets of features can be shared by the classifier and coding module to reduce the total computational complexity, making our classification system suitable for multi-mode audio coding applications. A performance assessment and comparison of the features are made. The experiment results show that the ISFs and LSFs have similar good performance when using full covariance matrices in classification models and the ISFs perform slightly better when using diagonal matrices. Their statistical differences for speech and music signals are also revealed.展开更多
Purpose:Our study aims to compare speech understanding in noise and spectral-temporal resolution skills with regard to the degree of hearing loss,age,hearing aid use experience and gender of hearing aid users.Methods:...Purpose:Our study aims to compare speech understanding in noise and spectral-temporal resolution skills with regard to the degree of hearing loss,age,hearing aid use experience and gender of hearing aid users.Methods:Our study included sixty-eight hearing aid users aged between 40-70 years,with bilateral mild and moderate symmetrical sensorineural hearing loss.Random gap detection test,Turkish matrix test and spectral-temporally modulated ripple test were implemented on the participants with bilateral hearing aids.The test results acquired were compared statistically according to different variables and the correlations were examined.Results:No statistically significant differences were observed for speech-in-noise recognition,spectraltemporal resolution among older and younger adults in hearing aid users(p>0.05).There wasn’t found a statistically significant difference among test outcomes as regards different hearing loss degrees(p>0.05).Higher performances were obtained in terms of temporal resolution in male participants and participants with more hearing aid use experience(p<0.05).Significant correlations were obtained between the results of speech-in-noise recognition,temporal resolution and spectral resolution tests performed with hearing aids(p<0.05).Conclusion:Our study findings emphasized the importance of regular hearing aid use and it showed that some auditory skills can be improved with hearing aids.Observation of correlations among the speechin-noise recognition,temporal resolution and spectral resolution tests have revealed that these skills should be evaluated as a whole to maximize the patient’s communication abilities.展开更多
In this paper, a speech signal recovery algorithm is presented for a personalized voice command automatic recognition system in vehicle and restaurant environments. This novel algorithm is able to separate a mixed spe...In this paper, a speech signal recovery algorithm is presented for a personalized voice command automatic recognition system in vehicle and restaurant environments. This novel algorithm is able to separate a mixed speech source from multiple speakers, detect presence/absence of speakers by tracking the higher magnitude portion of speech power spectrum and adaptively suppress noises. An automatic speech recognition (ASR) process to deal with the multi-speaker task is designed and implemented. Evaluation tests have been carried out by using the speech da- tabase NOIZEUS and the experimental results show that the proposed algorithm achieves impressive performance improvements.展开更多
This pilot study focuses on employment of hybrid LMS-ICA system for in-vehicle background noise reduction.Modern vehicles are nowadays increasingly supporting voice commands,which are one of the pillars of autonomous ...This pilot study focuses on employment of hybrid LMS-ICA system for in-vehicle background noise reduction.Modern vehicles are nowadays increasingly supporting voice commands,which are one of the pillars of autonomous and SMART vehicles.Robust speaker recognition for context-aware in-vehicle applications is limited to a certain extent by in-vehicle back-ground noise.This article presents the new concept of a hybrid system which is implemented as a virtual instrument.The highly modular concept of the virtual car used in combination with real recordings of various driving scenarios enables effective testing of the investigated methods of in-vehicle background noise reduction.The study also presents a unique concept of an adaptive system using intelligent clusters of distributed next generation 5G data networks,which allows the exchange of interference information and/or optimal hybrid algorithm settings between individual vehicles.On average,the unfiltered voice commands were successfully recognized in 29.34%of all scenarios,while the LMS reached up to 71.81%,and LMS-ICA hybrid improved the performance further to 73.03%.展开更多
Speech recognition systems have been applied to inspection and maintenance operations in industrial factories to recording and reporting routines at construction sites, etc. where hand-writing is difficult. In these a...Speech recognition systems have been applied to inspection and maintenance operations in industrial factories to recording and reporting routines at construction sites, etc. where hand-writing is difficult. In these actual circumstances, some countermeasure methods for surrounding noise are indispensable. In this study, a new method to remove the noise for actual speech signal was proposed by using Bayesian estimation with the aid of bone-conducted speech and fuzzy theory. More specifically, by introducing Bayes’ theorem based on the observation of air-conducted speech contaminated by surrounding background noise, a new type of algorithm for noise removal was theoretically derived. In the proposed noise suppression method, bone-conducted speech signal with the reduced high-frequency components was regarded as fuzzy observation data, and a stochastic model for the bone-conducted speech was derived by applying the probability measure of fuzzy events. The proposed method was applied to speech signals measured in real environment with low SNR, and better results were obtained than an algorithm based on observation of only air-conducted speech.展开更多
A machine learning based speech enhancement method is proposed to improve the intelligibility of whispered speech. A binary mask estimated by a two-class support vector machine (SVM) classifier is used to synthesize...A machine learning based speech enhancement method is proposed to improve the intelligibility of whispered speech. A binary mask estimated by a two-class support vector machine (SVM) classifier is used to synthesize the enhanced whisper. A novel noise robust feature called Gammatone feature cosine coefficients (GFCCs) extracted by an auditory periphery model is derived and used for the binary mask estimation. The intelligibility performance of the proposed method is evaluated and compared with the traditional speech enhancement methods. Objective and subjective evaluation results indicate that the proposed method can effectively improve the intelligibility of whispered speech which is contaminated by noise. Compared with the power subtract algorithm and the log-MMSE algorithm, both of which do not improve the intelligibility in lower signal-to-noise ratio (SNR) environments, the proposed method has good performance in improving the intelligibility of noisy whisper. Additionally, the intelligibility of the enhanced whispered speech using the proposed method also outperforms that of the corresponding unprocessed noisy whispered speech.展开更多
Semi-supervised discriminant analysis SDA which uses a combination of multiple embedding graphs and kernel SDA KSDA are adopted in supervised speech emotion recognition.When the emotional factors of speech signal samp...Semi-supervised discriminant analysis SDA which uses a combination of multiple embedding graphs and kernel SDA KSDA are adopted in supervised speech emotion recognition.When the emotional factors of speech signal samples are preprocessed different categories of features including pitch zero-cross rate energy durance formant and Mel frequency cepstrum coefficient MFCC as well as their statistical parameters are extracted from the utterances of samples.In the dimensionality reduction stage before the feature vectors are sent into classifiers parameter-optimized SDA and KSDA are performed to reduce dimensionality.Experiments on the Berlin speech emotion database show that SDA for supervised speech emotion recognition outperforms some other state-of-the-art dimensionality reduction methods based on spectral graph learning such as linear discriminant analysis LDA locality preserving projections LPP marginal Fisher analysis MFA etc. when multi-class support vector machine SVM classifiers are used.Additionally KSDA can achieve better recognition performance based on kernelized data mapping compared with the above methods including SDA.展开更多
Two discriminative methods for solving tone problems in Mandarin speech recognition are presented. First, discriminative training on the HMM (hidden Markov model) based tone models is proposed. Then an integration t...Two discriminative methods for solving tone problems in Mandarin speech recognition are presented. First, discriminative training on the HMM (hidden Markov model) based tone models is proposed. Then an integration technique of tone models into a large vocabulary continuous speech recognition system is presented. Discriminative model weight training based on minimum phone error criteria is adopted aiming at optimal integration of the tone models. The extended Baum Welch algorithm is applied to find the model-dependent weights to scale the acoustic scores and tone scores. Experimental results show that tone recognition rates and continuous speech recognition accuracy can be improved by the discriminatively trained tone model. Performance of a large vocabulary continuous Mandarin speech recognition system can be further enhanced by the discriminatively trained weight combinations due to a better interpolation of the given models.展开更多
Noise feedback coding (NFC) has attracted renewed interest with the recent standardization of backward-compatible enhancements for ITU-T G.711 and G.722. It has also been revisited with the emergence of proprietary ...Noise feedback coding (NFC) has attracted renewed interest with the recent standardization of backward-compatible enhancements for ITU-T G.711 and G.722. It has also been revisited with the emergence of proprietary speech codecs, such as BV16, BV32, and SILK, that have structures different from CELP coding. In this article, we review NFC and describe a novel coding technique that optimally shapes coding noise in embedded pulse-code modulation (PCM) and embedded adaptive differential PCM (ADPCM). We describe how this new technique was incorporated into the recent ITU-T G.711.1, G.711 App. III, and G.722 Annex B (G.722B) speech-coding standards.展开更多
On augmentation of past work, an effective Wiener filter and its application for noise suppression combined with a formed CORDIC based FFT/IFFT processor with improved speed were executed. The pipelined methodology wa...On augmentation of past work, an effective Wiener filter and its application for noise suppression combined with a formed CORDIC based FFT/IFFT processor with improved speed were executed. The pipelined methodology was embraced for expanding the execution of the system. The proposed Wiener filter was planned in such an approach to evacuate the iteration issues in ordinary Wiener filter. The division process was supplanted by a productive inverse and multiplication process in the proposed design. An enhanced design for matrix inverse with reduced computation complexity was executed. The wide-ranging framework processing was focused around IEEE-754 standard single precision floating point numbers. The Wiener filter and the entire system design was integrated and actualized on VIRTEX 5 FPGA stage and re-enacted to approve the results in Xilinx ISE 13.4. The results show that a productive decrease in power and area is developed by adjusting the proposed technique for speech signal noise degradation with latency of n/2 clock cycles and substantial throughput result per every 12 clock cycles for n-bit precision. The execution of proposed design is exposed to be 31.35% more effective than that of prevailing strategies.展开更多
Time delay estimation (TDE) is an important issue in signal processing. Conventional TDE algorithms are usually efficient under white noise environments. In this paper, a joint noise reduction and -norm minimization m...Time delay estimation (TDE) is an important issue in signal processing. Conventional TDE algorithms are usually efficient under white noise environments. In this paper, a joint noise reduction and -norm minimization method is presented to enhance TDE in colored noise. An improved subspace method for colored noise reduction is first performed. Then the time delay is estimated by using an -norm minimization method. Because the clean speech signal form the noisy signal is well extracted by noise reduction and the -norm minimization method is robust, the TDE accuracy can be enhanced. Experiment results confirm that the proposed joint estimation method obtains more accurate TDE than several conventional algorithms in colored noise, especially in the case of low signal-to-noise ratio. 展开更多
Ambient noise in 13 sawinill’s workshops was tested and analyzed to reveal its character and regularity. The situation of the ambient noise in sawimill s workshops was evaluated.
文摘To enhance the speech quality that is degraded by environmental noise,an algorithm was proposed to reduce the noise and reinforce the speech.The minima controlled recursive averaging(MCRA) algorithm was used to estimate the noise spectrum and the partial masking effect which is one of the psychoacoustic properties was introduced to reinforce speech.The performance evaluation was performed by comparing the PESQ(perceptual evaluation of speech quality) and segSNR(segmental signal to noise ratio) by the proposed algorithm with the conventional algorithm.As a result,average PESQ by the proposed algorithm was higher than the average PESQ by the conventional noise reduction algorithm and segSNR was higher as much as 3.2 dB in average than that of the noise reduction algorithm.
基金the Ministry of Education Fund (No: 20050286001)Ministry of Education "New Century Tal-ents Support Plan" (No:NCET-04-0483)Doctoral Foundation of Ministry of Education (No:20050286001).
文摘Quadratic Discrimination Function (QDF) is commonly used in speech emotion recognition, which proceeds on the premise that the input data is normal distribution. In this paper, we propose a transformation to normalize the emotional features, emotion recognition. Features based on prosody then derivate a Modified QDF (MQDF) to speech and voice quality are extracted and Principal Component Analysis Neural Network (PCANN) is used to reduce dimension of the feature vectors. The results show that voice quality features are effective supplement for recognition, and the method in this paper could improve the recognition ratio effectively.
文摘Spectral subtraction is used in this research as a method to remove noise from noisy speech signals in the frequency domain. This method consists of computing the spectrum of the noisy speech using the Fast Fourier Transform (FFT) and subtracting the average magnitude of the noise spectrum from the noisy speech spectrum. We applied spectral subtraction to the speech signal “Real graph”. A digital audio recorder system embedded in a personal computer was used to sample the speech signal “Real graph” to which we digitally added vacuum cleaner noise. The noise removal algorithm was implemented using Matlab software by storing the noisy speech data into Hanning time-widowed half-overlapped data buffers, computing the corresponding spectrums using the FFT, removing the noise from the noisy speech, and reconstructing the speech back into the time domain using the inverse Fast Fourier Transform (IFFT). The performance of the algorithm was evaluated by calculating the Speech to Noise Ratio (SNR). Frame averaging was introduced as an optional technique that could improve the SNR. Seventeen different configurations with various lengths of the Hanning time windows, various degrees of data buffers overlapping, and various numbers of frames to be averaged were investigated in view of improving the SNR. Results showed that using one-fourth overlapped data buffers with 128 points Hanning windows and no frames averaging leads to the best performance in removing noise from the noisy speech.
基金partially supported by the National Natural Science Foundation of China (Nos.11590772, 11590770)the Pre-research Project for Equipment of General Information System (No.JZX2017-0994/Y306)
文摘This paper presents a deep neural network(DNN)-based speech enhancement algorithm based on the soft audible noise masking for the single-channel wind noise reduction. To reduce the low-frequency residual noise, the psychoacoustic model is adopted to calculate the masking threshold from the estimated clean speech spectrum. The gain for noise suppression is obtained based on soft audible noise masking by comparing the estimated wind noise spectrum with the masking threshold. To deal with the abruptly time-varying noisy signals, two separate DNN models are utilized to estimate the spectra of clean speech and wind noise components. Experimental results on the subjective and objective quality tests show that the proposed algorithm achieves the better performance compared with the conventional DNN-based wind noise reduction method.
文摘In order to apply speech recognition systems to actual circumstances such as inspection and maintenance operations in industrial factories to recording and reporting routines at construction sites, etc. where hand-writing is difficult, some countermeasure methods for surrounding noise are indispensable. In this study, a signal detection method to remove the noise for actual speech signals is proposed by using Bayesian estimation with the aid of bone-conducted speech. More specifically, by introducing Bayes’ theorem based on the observation of air-conducted speech contaminated by surrounding background noise, a new type of algorithm for noise removal is theoretically derived. In the proposed speech detection method, bone-conducted speech is utilized in order to obtain precise estimation for speech signals. The effectiveness of the proposed method is experimentally confirmed by applying it to air- and bone-conducted speeches measured in real environment under the existence of surrounding background noise.
文摘Most noise suppression algorithms of single channel use the mean of noisy segments to estimate the characteristics of noise spectrum, ignoring the estimation of noise in speech segments. Therefore, when the energy level of noise varies with the time, the performance of removing noise will be degraded. To solve this problem, a speech enhancement approach based on dynamic noise estimation within correlation domain was proposed. This method exploits the characteristics that noise energy mainly concentrates on 0 th order correlation coefficients, signal is auto correlated but signal and noise, noise and noise are uncorrelated, then estimates and decomposes the noise, thus helps to solve the above mentioned problem. The results of recognition experiments on speech signals of 15 Chinese cities’ names corrupted by noise of exhibition hall shows, this approach is better than SS (Spectral Subtraction) method, adapts better to the variances of energy levels of speech signal corrupted by noise, has some practicability to improve the robustness of recognition systems under noisy environment.
文摘Objective:To explore the clinical evaluation role of the Digits-in-Noise(DIN)test and Hearing Handicap Inventory for Adults Screening(HHIA-S)for patients with occupational noise-induced hearing loss and to observe and analyze their application values.Methods:Fifty patients with suspected occupational noise-induced hearing loss were randomly selected from the Department of Otolaryngology at the hospital as the research target.The collection period for the research cases spanned from January 2022 to November 2023,and all patients had a history of noise exposure.The DIN test and HHIA-S were used for hearing examinations,with clinical,comprehensive diagnosis serving as the gold standard to study their diagnostic performance.Results:The compliance rate of the DIN test was 88.00%,the HHIA-S’s compliance rate was 80.00%,and the combined compliance rate was 94.00%.The compliance rate of the DIN test and the combined compliance rates of the patients were statistically significant compared to the clinical gold standard data(P<0.05),while there was no difference between the compliance rate of the HHIA-S and the gold standard(P>0.05).The data shows that the sensitivity of the combined diagnosis is significantly higher than the sensitivity data of the DIN test and HHIA-S examination alone(P<0.05).Its specificity is 100.00%,and the accuracy data of the joint diagnosis in the degree were higher than those of the DIN test alone(P>0.05)and the HHIA-S alone(P<0.05).Conclusion:For patients with occupational noise-induced hearing loss,the joint evaluation of the DIN test and HHIA-S can significantly improve their diagnostic value with high sensitivity and accuracy.
基金Supported by the National Key Basic Research Program of China(2013CB329302)the National Natural Science Foundation of China(61271426,10925419,90920302,61072124,11074275,11161140319,91120001)+3 种基金the Strategic Priority Research Program of the Chinese Academy of Sciences(XDA06030100,XDA06030500)the National "863" Program(2012AA012503)the CAS Priority Deployment Project(KGZD-EW-103-2)Jiangxi Provincial Department of Education Science and Technology Project(GJJ13426)
文摘A noise estimator was presented in this paper by modeling the log-power sequence with hidden Markov model (HMM). The smoothing factor of this estimator was motivated by the speech presence probability at each frequency band. This HMM had a speech state and a nonspeech state, and each state consisted of a unique Gaussian function. The mean of the nonspeech state was the estimation of the noise logarithmic power. To make this estimator run in an on-line manner, an HMM parameter updated method was used based on a first-order recursive process. The noise signal was tracked together with the HMM to be sequentially updated. For the sake of reliability, some constraints were introduced to the HMM. The proposed algorithm was compared with the conventional ones such as minimum statistics (MS) and improved minima controlled recursive averaging (IM- CRA). The experimental results confirms its promising performance.
文摘The immittance spectral frequencies (ISFs) is proposed as a new set of classification features and compared with the linear spectral frequencies (LSFs) applied in a frame-level wideband speech/music discrimination system. These two sets of features can be shared by the classifier and coding module to reduce the total computational complexity, making our classification system suitable for multi-mode audio coding applications. A performance assessment and comparison of the features are made. The experiment results show that the ISFs and LSFs have similar good performance when using full covariance matrices in classification models and the ISFs perform slightly better when using diagonal matrices. Their statistical differences for speech and music signals are also revealed.
文摘Purpose:Our study aims to compare speech understanding in noise and spectral-temporal resolution skills with regard to the degree of hearing loss,age,hearing aid use experience and gender of hearing aid users.Methods:Our study included sixty-eight hearing aid users aged between 40-70 years,with bilateral mild and moderate symmetrical sensorineural hearing loss.Random gap detection test,Turkish matrix test and spectral-temporally modulated ripple test were implemented on the participants with bilateral hearing aids.The test results acquired were compared statistically according to different variables and the correlations were examined.Results:No statistically significant differences were observed for speech-in-noise recognition,spectraltemporal resolution among older and younger adults in hearing aid users(p>0.05).There wasn’t found a statistically significant difference among test outcomes as regards different hearing loss degrees(p>0.05).Higher performances were obtained in terms of temporal resolution in male participants and participants with more hearing aid use experience(p<0.05).Significant correlations were obtained between the results of speech-in-noise recognition,temporal resolution and spectral resolution tests performed with hearing aids(p<0.05).Conclusion:Our study findings emphasized the importance of regular hearing aid use and it showed that some auditory skills can be improved with hearing aids.Observation of correlations among the speechin-noise recognition,temporal resolution and spectral resolution tests have revealed that these skills should be evaluated as a whole to maximize the patient’s communication abilities.
文摘In this paper, a speech signal recovery algorithm is presented for a personalized voice command automatic recognition system in vehicle and restaurant environments. This novel algorithm is able to separate a mixed speech source from multiple speakers, detect presence/absence of speakers by tracking the higher magnitude portion of speech power spectrum and adaptively suppress noises. An automatic speech recognition (ASR) process to deal with the multi-speaker task is designed and implemented. Evaluation tests have been carried out by using the speech da- tabase NOIZEUS and the experimental results show that the proposed algorithm achieves impressive performance improvements.
基金This research was funded by the European Regional Development Fund in the Research Centre of Advanced Mechatronic Systems project, project number CZ.02.1.01/0.0/0.0/16_019 /0000867by the Ministry of Education of the Czech Republic, Project No. SP2021/32.
文摘This pilot study focuses on employment of hybrid LMS-ICA system for in-vehicle background noise reduction.Modern vehicles are nowadays increasingly supporting voice commands,which are one of the pillars of autonomous and SMART vehicles.Robust speaker recognition for context-aware in-vehicle applications is limited to a certain extent by in-vehicle back-ground noise.This article presents the new concept of a hybrid system which is implemented as a virtual instrument.The highly modular concept of the virtual car used in combination with real recordings of various driving scenarios enables effective testing of the investigated methods of in-vehicle background noise reduction.The study also presents a unique concept of an adaptive system using intelligent clusters of distributed next generation 5G data networks,which allows the exchange of interference information and/or optimal hybrid algorithm settings between individual vehicles.On average,the unfiltered voice commands were successfully recognized in 29.34%of all scenarios,while the LMS reached up to 71.81%,and LMS-ICA hybrid improved the performance further to 73.03%.
文摘Speech recognition systems have been applied to inspection and maintenance operations in industrial factories to recording and reporting routines at construction sites, etc. where hand-writing is difficult. In these actual circumstances, some countermeasure methods for surrounding noise are indispensable. In this study, a new method to remove the noise for actual speech signal was proposed by using Bayesian estimation with the aid of bone-conducted speech and fuzzy theory. More specifically, by introducing Bayes’ theorem based on the observation of air-conducted speech contaminated by surrounding background noise, a new type of algorithm for noise removal was theoretically derived. In the proposed noise suppression method, bone-conducted speech signal with the reduced high-frequency components was regarded as fuzzy observation data, and a stochastic model for the bone-conducted speech was derived by applying the probability measure of fuzzy events. The proposed method was applied to speech signals measured in real environment with low SNR, and better results were obtained than an algorithm based on observation of only air-conducted speech.
基金The National Natural Science Foundation of China (No.61231002,61273266,51075068,60872073,60975017, 61003131)the Ph.D.Programs Foundation of the Ministry of Education of China(No.20110092130004)+1 种基金the Science Foundation for Young Talents in the Educational Committee of Anhui Province(No. 2010SQRL018)the 211 Project of Anhui University(No.2009QN027B)
文摘A machine learning based speech enhancement method is proposed to improve the intelligibility of whispered speech. A binary mask estimated by a two-class support vector machine (SVM) classifier is used to synthesize the enhanced whisper. A novel noise robust feature called Gammatone feature cosine coefficients (GFCCs) extracted by an auditory periphery model is derived and used for the binary mask estimation. The intelligibility performance of the proposed method is evaluated and compared with the traditional speech enhancement methods. Objective and subjective evaluation results indicate that the proposed method can effectively improve the intelligibility of whispered speech which is contaminated by noise. Compared with the power subtract algorithm and the log-MMSE algorithm, both of which do not improve the intelligibility in lower signal-to-noise ratio (SNR) environments, the proposed method has good performance in improving the intelligibility of noisy whisper. Additionally, the intelligibility of the enhanced whispered speech using the proposed method also outperforms that of the corresponding unprocessed noisy whispered speech.
基金The National Natural Science Foundation of China(No.61231002,61273266)the Ph.D.Programs Foundation of Ministry of Education of China(No.20110092130004)
文摘Semi-supervised discriminant analysis SDA which uses a combination of multiple embedding graphs and kernel SDA KSDA are adopted in supervised speech emotion recognition.When the emotional factors of speech signal samples are preprocessed different categories of features including pitch zero-cross rate energy durance formant and Mel frequency cepstrum coefficient MFCC as well as their statistical parameters are extracted from the utterances of samples.In the dimensionality reduction stage before the feature vectors are sent into classifiers parameter-optimized SDA and KSDA are performed to reduce dimensionality.Experiments on the Berlin speech emotion database show that SDA for supervised speech emotion recognition outperforms some other state-of-the-art dimensionality reduction methods based on spectral graph learning such as linear discriminant analysis LDA locality preserving projections LPP marginal Fisher analysis MFA etc. when multi-class support vector machine SVM classifiers are used.Additionally KSDA can achieve better recognition performance based on kernelized data mapping compared with the above methods including SDA.
文摘Two discriminative methods for solving tone problems in Mandarin speech recognition are presented. First, discriminative training on the HMM (hidden Markov model) based tone models is proposed. Then an integration technique of tone models into a large vocabulary continuous speech recognition system is presented. Discriminative model weight training based on minimum phone error criteria is adopted aiming at optimal integration of the tone models. The extended Baum Welch algorithm is applied to find the model-dependent weights to scale the acoustic scores and tone scores. Experimental results show that tone recognition rates and continuous speech recognition accuracy can be improved by the discriminatively trained tone model. Performance of a large vocabulary continuous Mandarin speech recognition system can be further enhanced by the discriminatively trained weight combinations due to a better interpolation of the given models.
文摘Noise feedback coding (NFC) has attracted renewed interest with the recent standardization of backward-compatible enhancements for ITU-T G.711 and G.722. It has also been revisited with the emergence of proprietary speech codecs, such as BV16, BV32, and SILK, that have structures different from CELP coding. In this article, we review NFC and describe a novel coding technique that optimally shapes coding noise in embedded pulse-code modulation (PCM) and embedded adaptive differential PCM (ADPCM). We describe how this new technique was incorporated into the recent ITU-T G.711.1, G.711 App. III, and G.722 Annex B (G.722B) speech-coding standards.
文摘On augmentation of past work, an effective Wiener filter and its application for noise suppression combined with a formed CORDIC based FFT/IFFT processor with improved speed were executed. The pipelined methodology was embraced for expanding the execution of the system. The proposed Wiener filter was planned in such an approach to evacuate the iteration issues in ordinary Wiener filter. The division process was supplanted by a productive inverse and multiplication process in the proposed design. An enhanced design for matrix inverse with reduced computation complexity was executed. The wide-ranging framework processing was focused around IEEE-754 standard single precision floating point numbers. The Wiener filter and the entire system design was integrated and actualized on VIRTEX 5 FPGA stage and re-enacted to approve the results in Xilinx ISE 13.4. The results show that a productive decrease in power and area is developed by adjusting the proposed technique for speech signal noise degradation with latency of n/2 clock cycles and substantial throughput result per every 12 clock cycles for n-bit precision. The execution of proposed design is exposed to be 31.35% more effective than that of prevailing strategies.
文摘Time delay estimation (TDE) is an important issue in signal processing. Conventional TDE algorithms are usually efficient under white noise environments. In this paper, a joint noise reduction and -norm minimization method is presented to enhance TDE in colored noise. An improved subspace method for colored noise reduction is first performed. Then the time delay is estimated by using an -norm minimization method. Because the clean speech signal form the noisy signal is well extracted by noise reduction and the -norm minimization method is robust, the TDE accuracy can be enhanced. Experiment results confirm that the proposed joint estimation method obtains more accurate TDE than several conventional algorithms in colored noise, especially in the case of low signal-to-noise ratio.
文摘Ambient noise in 13 sawinill’s workshops was tested and analyzed to reveal its character and regularity. The situation of the ambient noise in sawimill s workshops was evaluated.