Abstract: Conversion is the transformation of the parts of speech of certain words, with the original content kept unchanged, so that the translated text reads smoothly, fluently, and idiomatically in the target language. In English-Chinese (E-C) translation, conversion of parts of speech is one of the most important translation methods. By analyzing the differences in usage between the two languages, several forms of part-of-speech conversion are introduced so that a better E-C translation can be obtained.
Funding: Supported by the National Natural Science Foundation of China (61071215), the Science and Technology Foundation of Suzhou (SYG201033), and the Pre-research Foundation of Soochow University (Q311901111, 14317399).
Abstract: To address the weakness of the existing fixed-value mapping method (method_F), a vocal tract system conversion method based on the universal background model (UBM) is proposed to improve the performance of a speech conversion system from Chinese whispered speech to normal speech. Because the UBM has numerous components, the errors produced by the acoustic probability density statistical model cannot be ignored. An effective Gaussian mixture component selection method, based on the posterior probability summation of the minimum spectral distortion, is therefore developed to optimize system performance. The proposed method (method_U) is analyzed and compared using a performance index (PI) based on the Itakura-Saito spectral distortion measure. Experiments show that the performance of method_U is more stable across speakers and phonemes than that of method_F, and that its average PI is better. Selecting effective Gaussian mixture components further improves the PI of method_U by 5.11%. Subjective listening tests also show that the proposed method improves the clarity and intelligibility of the converted speech.
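For readers unfamiliar with the Itakura-Saito measure named in this abstract, the following is a minimal sketch of its standard discrete form, averaged over frequency bins. The abstract does not say how the performance index (PI) is derived from the measure, so the sketch stops at the distortion itself; the function name `itakura_saito` and the list-based spectrum representation are assumptions made for illustration.

```python
import math


def itakura_saito(p, p_hat):
    """Mean Itakura-Saito distortion between two power spectra.

    p      -- reference power spectrum (positive values)
    p_hat  -- approximating power spectrum (positive values, same length)

    Standard form: mean over bins of  p/p_hat - log(p/p_hat) - 1.
    Hypothetical helper for illustration; not the paper's PI.
    """
    if len(p) != len(p_hat) or not p:
        raise ValueError("spectra must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(p, p_hat):
        if a <= 0.0 or b <= 0.0:
            raise ValueError("power spectra must be strictly positive")
        ratio = a / b
        # Each term is >= 0 and equals 0 only when ratio == 1.
        total += ratio - math.log(ratio) - 1.0
    return total / len(p)
```

The measure is zero only when the two spectra match, and it is not symmetric: swapping the arguments generally changes the value, so the choice of reference spectrum matters.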
Funding: Supported by the National Natural Science Foundation of China (61271359, 61071215), the Suzhou Science and Technology Development Plan (SYG201001), and the Key Joint Laboratory of Soochow University and JieMei Biomedical Engineering Instrument.
Abstract: A method for converting whispered speech to normal speech using an extended bilinear transformation is proposed. Because whisper formants deviate by different degrees in different frequency bands, the spectrum of the whispered speech is processed in separate partitions. On the basis of this spectrum, a conversion function is established to convert whispered speech to normal speech. Because the whisper spectrum is offset non-linearly relative to normal speech, an expansion factor is introduced into the bilinear transform function so that it corresponds more closely to the actual requirements of whisper-to-normal conversion. This factor accounts for the non-linear shift of the spectrum and the compression of the formant bandwidth, which effectively reduces the spectral distortion distance in the conversion. Experimental results show that the proposed conversion effectively improves both the sound quality and the intelligibility of whispered speech.
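As background for the bilinear transformation mentioned in this abstract, here is a minimal sketch of the standard first-order all-pass (bilinear) frequency warping that such methods commonly build on. The abstract does not give the form of the paper's expansion factor or its band partitioning, so neither is reproduced; the function name `bilinear_warp` and the parameter name `alpha` are assumptions made for illustration.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency with a first-order all-pass.

    omega -- angular frequency in radians, 0 <= omega <= pi
    alpha -- warping coefficient, -1 < alpha < 1 (0 means no warping)

    Standard bilinear warping only; the paper's extended form with an
    expansion factor is not specified in the abstract and is not shown.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    if not 0.0 <= omega <= math.pi:
        raise ValueError("omega must lie in [0, pi]")
    # Warped frequency: omega + 2 * atan(alpha*sin(omega) / (1 - alpha*cos(omega)))
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))
```

The mapping keeps the endpoints 0 and pi fixed and is monotonic, so it shifts spectral peaks without reordering them. In this sketch a positive `alpha` moves interior frequencies upward and a negative `alpha` moves them downward.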