Funding: Projects (61001188, 1161140319) supported by the National Natural Science Foundation of China; Project (2012ZX03001034) supported by the National Science and Technology Major Project; Project (YETP1202) supported by the Beijing Higher Education Young Elite Teacher Project, China.
Abstract: Objective speech quality is difficult to measure without the input reference speech. Mapping methods based on data mining are investigated and designed to improve the output-based speech quality assessment algorithm. The degraded speech is first separated into three classes (unvoiced, voiced and silence); the consistency between the degraded speech signal and the pre-trained reference model is then calculated for each class and mapped to an objective speech quality score using data mining. A fuzzy Gaussian mixture model (GMM) trained on perceptual linear predictive (PLP) features is used to generate the artificial reference model. Mean opinion score (MOS) mapping methods, including multivariate non-linear regression (MNLR), fuzzy neural network (FNN) and support vector regression (SVR), are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best, with a 14.50% increase in the correlation coefficient and a 32.76% decrease in the root-mean-square MOS error.
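The two figures of merit quoted above, the correlation coefficient and the root-mean-square MOS error, can be computed as in the following minimal sketch. The score lists are hypothetical illustration values, not data from the paper, and the helper names (`pearson`, `rmse`, `relative_change`) are assumptions introduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(x, y):
    """Root-mean-square error between subjective and objective MOS."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def relative_change(baseline, value):
    """Percentage change of `value` relative to `baseline`."""
    return 100.0 * (value - baseline) / baseline

# Hypothetical scores for five test conditions (NOT from the paper).
subjective = [1.8, 2.5, 3.1, 3.6, 4.2]
baseline_pred = [2.6, 2.2, 3.9, 3.0, 3.7]   # stands in for a baseline method
candidate_pred = [2.0, 2.4, 3.3, 3.5, 4.0]  # stands in for a mapped score

r_base = pearson(subjective, baseline_pred)
r_cand = pearson(subjective, candidate_pred)
e_base = rmse(subjective, baseline_pred)
e_cand = rmse(subjective, candidate_pred)

print("correlation change (%):", relative_change(r_base, r_cand))
print("RMSE change (%):", relative_change(e_base, e_cand))
```

A positive correlation change together with a negative RMSE change corresponds to the kind of improvement the abstract reports.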
Abstract: Perceptual Objective Listening Quality Assessment (POLQA) and Perceptual Evaluation of Speech Quality (PESQ) are commonly used objective standards for evaluating speech quality. These methods were developed and trained on speech sequences from native speakers of some Western languages. This raises the question of how they perform when applied to other languages or to non-native speakers. This paper evaluates PESQ and POLQA on languages that were not considered when the methods were set up, with emphasis on Moore and Dioula, two local languages of Burkina Faso. It also evaluates the two methods for non-native speakers. For this purpose, on the one hand, the Mean Opinion Score-Listening Quality Objective (MOS-LQO) values of PESQ and POLQA computed for Moore and Dioula are compared with those for French and English. On the other hand, the MOS-LQO scores for French and English are compared between native and non-native speakers to evaluate the effect of speaker accent.
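For PESQ, MOS-LQO values are obtained from the raw PESQ score through the logistic mapping defined in ITU-T P.862.1. The sketch below applies that mapping and compares mean MOS-LQO between two groups. The raw scores and the group labels `language_a` and `language_b` are hypothetical placeholders, not results from the paper.

```python
import math

def pesq_raw_to_mos_lqo(raw):
    """Map a raw PESQ score to MOS-LQO using the ITU-T P.862.1 logistic function."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw + 4.6607))

def mean(values):
    return sum(values) / len(values)

# Hypothetical raw PESQ scores per group (illustration only).
raw_scores = {
    "language_a": [3.9, 3.7, 4.0],
    "language_b": [3.4, 3.6, 3.3],
}

# Mean MOS-LQO per group, then the gap between the two groups.
mos_lqo = {
    name: mean([pesq_raw_to_mos_lqo(r) for r in scores])
    for name, scores in raw_scores.items()
}
gap = mos_lqo["language_a"] - mos_lqo["language_b"]
print(mos_lqo, gap)
```

The same group-wise comparison applies to the native versus non-native case by swapping the dictionary keys.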
Abstract: Based on a fuzzy Gaussian mixture model (FGMM) and support vector regression (SVR), an improved non-intrusive objective measure is proposed for assessing the quality of narrowband output speech without requiring the clean input speech. Perceptual linear predictive (PLP) features extracted from clean speech and clustered by the FGMM serve as an artificial reference model. The input speech is separated into three classes; for each class, a consistency parameter between each feature pair from the test speech signal and its counterpart in the pre-trained FGMM reference model is calculated and mapped to an objective speech quality score using SVR. The correlation between the subjective mean opinion score (MOS) and the objective MOS is analyzed. Experimental results show that the proposed method is effective and outperforms the ITU-T P.563 method under most of the test conditions for narrowband speech.
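One way to picture the consistency parameter is as the average log-likelihood of the test frames under the reference model of their class. The sketch below uses an ordinary diagonal-covariance GMM in place of the fuzzy GMM, and two-dimensional toy vectors in place of PLP features. The model parameters and function names are assumptions for illustration, and the SVR mapping stage is not shown.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a GMM, computed with the log-sum-exp trick."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def consistency(frames, gmm):
    """Average per-frame log-likelihood of the frames under a reference GMM."""
    weights, means, variances = gmm
    total = sum(gmm_log_likelihood(f, weights, means, variances) for f in frames)
    return total / len(frames)

# Hypothetical two-component reference model for one class (e.g. voiced).
voiced_gmm = (
    [0.5, 0.5],                # mixture weights
    [[0.0, 0.0], [2.0, 2.0]],  # component means
    [[1.0, 1.0], [1.0, 1.0]],  # diagonal variances
)

clean_like = [[0.1, -0.1], [1.9, 2.1]]  # toy frames close to the model
degraded = [[5.0, -4.0], [-3.0, 6.0]]   # toy frames far from the model

print(consistency(clean_like, voiced_gmm), consistency(degraded, voiced_gmm))
```

Frames that fit the reference model yield a higher value. One such value per class (unvoiced, voiced, silence) would then form the input vector to the regression stage.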
Funding: Supported by the National Natural Science Foundation of China and Microsoft Research Asia (No. 60776800), and in part by the National High-Tech Research and Development Program (863) of China (Nos. 2006AA010101, 2007AA04Z223, 2008AA02Z414, and 2008AA040201).
Abstract: Designers of digital mobile telecommunication systems, such as the Global System for Mobile Communications (GSM), seek to further improve speech communication quality without changing the channel encoders and decoders. Speech quality is most affected by residual bit errors in received speech frames. Conventional methods use binary decision strategies for error detection and concealment in frames. This paper presents a multi-level error detection and concealment algorithm for GSM full-rate speech codec systems. The algorithm uses multi-source knowledge to detect and conceal speech frame errors at the frame, parameter, and even bit levels. Tests show that most corrupted frames can be appropriately concealed by this algorithm, resulting in MOS gains of more than 50% in real-world data tests.
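The difference between a binary per-frame decision and a multi-level one can be sketched as follows. This is not the paper's algorithm: the frame layout, the `bad` flag, the `max_jump` continuity threshold and the `attenuation` factor are all hypothetical, and the bit-level stage is omitted.

```python
def conceal(frames, max_jump=10, attenuation=0.5):
    """Conceal errors at two levels: whole frames, then individual parameters.

    Each frame is a dict: {'bad': bool, 'gain': float, 'params': [int, ...]}.
    """
    output = []
    last_good = None
    for frame in frames:
        if last_good is None:
            # No history yet: pass the first frame through unchanged.
            repaired = {"gain": frame["gain"], "params": list(frame["params"])}
        elif frame["bad"]:
            # Frame level: repeat the previous frame with attenuated gain,
            # so consecutive bad frames fade out gradually.
            repaired = {
                "gain": last_good["gain"] * attenuation,
                "params": list(last_good["params"]),
            }
        else:
            # Parameter level: keep the frame, but replace any parameter that
            # jumps implausibly far from its previous value.
            params = [
                p if abs(p - q) <= max_jump else q
                for p, q in zip(frame["params"], last_good["params"])
            ]
            repaired = {"gain": frame["gain"], "params": params}
        output.append(repaired)
        last_good = repaired
    return output

# Hypothetical three-frame stream: good, flagged bad, good with one outlier.
frames = [
    {"bad": False, "gain": 1.0, "params": [10, 20]},
    {"bad": True, "gain": 9.0, "params": [99, 99]},
    {"bad": False, "gain": 0.8, "params": [12, 70]},
]
out = conceal(frames)
print(out)
```

A purely binary strategy would either accept or discard the third frame as a whole. The parameter-level check keeps its plausible values and repairs only the outlier.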
Funding: The National Natural Science Foundation of China (Nos. 61375028, 61673108); China Postdoctoral Science Foundation (No. 2016M601696); Qing Lan Project; the Program for Special Talent in Six Fields of Jiangsu Province (No. 2016-DZXX-023); Jiangsu Planned Projects for Postdoctoral Research Funds (No. 1601011B).
Abstract: To improve on the traditional multichannel filter bank, which degrades speech quality, an efficient audiogram-based design method for a non-uniform cosine modulated filter bank (CMFB) for digital hearing aids is proposed. First, a low-pass prototype filter is designed by a linear iterative algorithm. Second, the uniform CMFB is obtained from the standard modulation formulas. Then, adjacent channels of the uniform filter bank are merged where the audiogram of the hearing-impaired person has a low or gradual slope. Finally, the corresponding non-uniform CMFB is obtained. Simulation results show that the signal processed by the proposed filter bank is similar to the original signal in both time-domain waveform and spectrogram, without significant distortion. The speech quality results show that the perceptual evaluation of speech quality (PESQ) score of the non-uniform CMFB is 35% higher than that of the traditional design, and the hearing-aid speech quality index (HASQI) increases by about 40%.
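The design steps can be sketched as follows, under stated assumptions: a Hamming-windowed sinc stands in for the prototype (the paper's linear iterative design is not reproduced), the analysis filters follow the standard cosine modulation formula, and the channel grouping is chosen by hand rather than derived from an audiogram. All function names are illustrative.

```python
import cmath
import math

def prototype_lowpass(num_taps, num_channels):
    """Hamming-windowed sinc low-pass filter with cutoff pi / (2M)."""
    wc = math.pi / (2.0 * num_channels)
    center = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - center
        ideal = wc / math.pi if t == 0 else math.sin(wc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def cmfb_analysis(prototype, num_channels):
    """Uniform CMFB analysis filters obtained by cosine-modulating the prototype."""
    center = (len(prototype) - 1) / 2.0
    bank = []
    for k in range(num_channels):
        phase = (-1) ** k * math.pi / 4.0
        bank.append([
            2.0 * p * math.cos(
                math.pi / num_channels * (k + 0.5) * (n - center) + phase
            )
            for n, p in enumerate(prototype)
        ])
    return bank

def merge_channels(bank, groups):
    """Merge adjacent uniform channels by summing their impulse responses."""
    length = len(bank[0])
    return [[sum(bank[k][n] for k in group) for n in range(length)]
            for group in groups]

def magnitude(taps, omega):
    """Magnitude of the frequency response of `taps` at frequency omega (rad)."""
    return abs(sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps)))

M = 8
bank = cmfb_analysis(prototype_lowpass(64, M), M)

# Hypothetical grouping: fine resolution at low frequencies, coarse at high.
groups = [[0], [1], [2, 3], [4, 5, 6, 7]]
merged = merge_channels(bank, groups)
print(len(bank), "uniform channels ->", len(merged), "non-uniform channels")
```

In an audiogram-driven design, the groups would come from the slope of the hearing-loss curve: wide merged bands where the curve is flat, narrow bands where it changes quickly.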