Abstract: A national assessment of the performance of speech synthesis systems for Chinese has been carried out yearly since 1994. The quality of the synthetic speech of five different systems was evaluated and diagnosed using speech intelligibility tests. The listeners were 16 college students (8 male, 8 female) with no experience of synthetic speech; they were asked to perform an open-response task with pencil and paper. In addition, speech naturalness was measured by Mean Opinion
Abstract: A new speech synthesis algorithm based on the LMA filter for a Chinese text-to-speech system is introduced. With this method, the system not only generates higher-quality speech but also has a greater ability to modify the prosodic parameters, which yields far more natural and intelligible synthesized speech than before. First, the fundamental principles of the LMA filter and the construction of the synthesizer are presented; then the modification of the acoustic parameters with this synthesizer is described; finally, a quantitative evaluation of the system's performance is given in comparison with a relatively successful PSOLA synthesizer, KDTALK_1.
Funding: Supported by the National Natural Science Foundation of China (61271359, 61071215), the Suzhou Science and Technology Development Plan (SYG201001), and the Key Joint Laboratory of Soochow University and JieMei Biomedical Engineering Instrument.
Abstract: A method for converting whispered speech to normal speech using an extended bilinear transformation is proposed. Because the formants of whispered speech deviate from those of normal speech by different amounts in different frequency bands, the spectrum of the whispered speech is processed in separate frequency partitions. On the basis of this partitioned spectrum, a conversion function from whispered speech to normal speech is established. Because the offset of whispered speech relative to normal speech is non-linear, an expansion factor is introduced into the bilinear transform function so that it corresponds more closely to the actual requirements of the conversion. This factor accounts for both the non-linear shift of the spectrum and the compression of the formant bandwidth, and thus reduces the spectral distortion distance of the conversion. The experimental results show that the proposed conversion improves both the sound quality and the intelligibility of whispered speech.
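The abstract does not give the extended transform or its expansion factor, so the following is only a minimal sketch of the standard first-order all-pass (bilinear) frequency warp that such methods usually start from. The function names `bilinear_warp` and `warp_formants`, the parameter `alpha`, and the example values are assumptions for illustration, not the authors' formulation.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency omega (radians, 0..pi).

    Standard first-order all-pass map; alpha in (-1, 1) sets the strength.
    Positive alpha raises frequencies, negative alpha lowers them, and the
    end points 0 and pi stay fixed.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_formants(formants_hz, sample_rate_hz, alpha):
    """Apply the warp to formant frequencies given in Hz (hypothetical helper)."""
    nyquist = sample_rate_hz / 2.0
    return [bilinear_warp(math.pi * f / nyquist, alpha) * nyquist / math.pi
            for f in formants_hz]


if __name__ == "__main__":
    # Illustrative formant values only; not data from the paper.
    print(warp_formants([500.0, 1500.0, 2500.0], 8000.0, -0.1))
```

A single `alpha` moves the whole spectrum with one curve. The band-dependent deviation described in the abstract is the stated reason the authors partition the spectrum and add an expansion factor, which this sketch does not attempt to reproduce.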
Funding: This work is supported by the National Natural Science Foundation of China (69972046) and the NSF of Zhejiang Province (698076).
Abstract: A method for synthesizing formant-targeted sounds, based on a speech production model and the Reflection-Type Line Analog (RTLA) articulatory synthesis model, is presented. The synthesis model is implemented as a scattering process derived from an RTLA of the vocal tract system, following the acoustic mechanism of speech production. The vocal-tract area function that controls the synthesis model is derived from the first three formant trajectories by using the inverse solution of speech production. The proposed method not only gives good naturalness and dynamic smoothness, but can also control or modify speech timbre easily and flexibly. Furthermore, it needs fewer control parameters and a very low parameter update rate.
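The link between an area function and a scattering process can be sketched with the generic Kelly-Lochbaum junction for pressure waves. This is an illustration under assumed conventions (lossless tube sections, pressure-wave variables, hypothetical function names), not the RTLA implementation used in the paper.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction of a tube model.

    areas: cross-sectional areas of adjacent sections, ordered along the tract.
    For sections with areas A_k and A_(k+1), r_k = (A_k - A_(k+1)) / (A_k + A_(k+1)).
    """
    if len(areas) < 2:
        raise ValueError("need at least two tube sections")
    if any(a <= 0 for a in areas):
        raise ValueError("areas must be positive")
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def scatter(r, forward_in, backward_in):
    """One scattering junction.

    forward_in arrives from the first section, backward_in from the second.
    Returns (forward_out, backward_out): the wave transmitted into the second
    section and the wave sent back into the first.
    """
    forward_out = (1.0 + r) * forward_in - r * backward_in
    backward_out = r * forward_in + (1.0 - r) * backward_in
    return forward_out, backward_out


if __name__ == "__main__":
    # Illustrative areas only; not an area function from the paper.
    r = reflection_coefficients([1.0, 1.0, 3.0])
    print(r)                        # a uniform junction reflects nothing
    print(scatter(r[1], 1.0, 0.0))  # a widening junction reflects with negative sign
```

Other texts define the coefficient with the opposite sign or in terms of volume velocity, so the convention should be checked before the sketch is combined with another formulation.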
Funding: Supported by the National Natural Science Foundation of China (61071215) and the University Natural Science Research Project of Jiangsu Province (05KJB510113).
Abstract: A whispered speech enhancement method using an auditory masking model in a modified Mel domain together with Speech Absence Probability (SAP) is proposed. In light of the phonation characteristics of whisper, the Mel-frequency scaling model is modified, and the whispered speech is filtered by the modified model. Meanwhile, the masking threshold for each frequency band is determined dynamically from the speech absence probability. Whispered speech enhancement is then carried out by adaptively adjusting the spectral subtraction coefficients according to the different masking threshold values. Results of objective and subjective tests on the enhanced whispered signal show that, compared with other methods, the proposed method gives better subjective auditory quality and less distortion by keeping the musical noise and background noise under the masking threshold.
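The role of the spectral subtraction coefficients can be illustrated with the basic over-subtraction rule applied to one frame of power spectra. In the sketch below the per-bin factors `alphas` and the floor factor `beta` are plain inputs; how the paper derives them from the masking threshold and SAP is not stated in the abstract, so that step is left out and all names are assumptions.

```python
def spectral_subtract(noisy_power, noise_power, alphas, beta=0.01):
    """Basic power spectral subtraction for one frame.

    noisy_power: power spectrum of the noisy frame, one value per bin.
    noise_power: estimated noise power per bin.
    alphas:      over-subtraction factor per bin (an adaptive scheme would
                 set these band by band; here they are given).
    beta:        spectral floor factor that limits how far a bin may drop.
    """
    if not len(noisy_power) == len(noise_power) == len(alphas):
        raise ValueError("all inputs must have one value per bin")
    enhanced = []
    for p, n, a in zip(noisy_power, noise_power, alphas):
        cleaned = p - a * n   # subtract the scaled noise estimate
        floor = beta * n      # never go below a small fraction of the noise
        enhanced.append(cleaned if cleaned > floor else floor)
    return enhanced


if __name__ == "__main__":
    # Illustrative numbers only.
    print(spectral_subtract([10.0, 1.0], [1.0, 1.0], [2.0, 2.0], beta=0.1))
```

Larger factors remove more noise but distort weak speech components, while the floor keeps isolated residual peaks (the source of musical noise) from standing out against silence.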
Abstract: A method of drawing color speech spectrograms with a microcomputer is described, with reference to existing methods of drawing spectrograms by computer. Using software alone, with no additional equipment, a color three-dimensional spectrogram can be drawn (or a black-and-white spectrogram when no color monitor is available), and the result is similar to the spectrogram produced by a sonagraph.
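The data behind such a display is a short-time magnitude spectrum. The sketch below computes it with a Hann window and a direct DFT using only the standard library; the frame length, hop size, and function name are assumptions, and the paper's own drawing routine is not described in the abstract.

```python
import cmath
import math


def spectrogram_db(samples, frame_len=64, hop=32):
    """Return a list of frames, each a list of magnitudes in dB for bins 0..frame_len/2."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]          # periodic Hann window
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        seg = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):       # non-negative frequencies only
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(20.0 * math.log10(abs(acc) + 1e-12))  # small offset avoids log(0)
        frames.append(row)
    return frames


if __name__ == "__main__":
    # Test tone with exactly 8 cycles per 64-sample frame.
    tone = [math.sin(2.0 * math.pi * 8 * n / 64) for n in range(256)]
    spec = spectrogram_db(tone)
    print(len(spec), len(spec[0]), spec[0].index(max(spec[0])))
```

Each dB value would then be mapped to a color (or a gray level on a monochrome display), with time on one axis and frequency on the other. A direct DFT is slow for long signals, where an FFT would normally be used.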
Abstract: A study of finding the arrival directions of speech signals with a microphone array is presented. We first analyze the uniform microphone array and give a design for a microphone array applied to hands-free speech recognition. By combining the traditional direction-finding technique of MUltiple SIgnal Classification (MUSIC) with the focusing matrix method, we improve the resolving power of the microphone array for multiple speech sources. As one application of Direction of Arrival (DOA) estimation, a new microphone-array system for noise reduction is proposed. The new system is based on a maximum likelihood estimation technique that reconstructs superimposed signals from different directions by using DOA information. The DOA information is obtained with the focusing MUSIC method, which has been shown to give higher performance than the conventional MUSIC method in speaker localization [1].
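Direction finding on a uniform array rests on the steering vector, which MUSIC scans against the noise subspace of the spatial covariance matrix. The sketch below shows only that building block for a far-field source and a uniform linear array, plus a simple match score between two directions. The geometry, the speed of sound of 343 m/s, and the function names are assumptions; the focusing MUSIC method itself is not reproduced.

```python
import cmath
import math


def steering_vector(num_mics, spacing_m, freq_hz, angle_deg, speed_of_sound=343.0):
    """Far-field steering vector of a uniform linear array.

    angle_deg is measured from broadside (0 degrees is perpendicular to the array).
    """
    delay_per_mic = spacing_m * math.sin(math.radians(angle_deg)) / speed_of_sound
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay_per_mic)
            for m in range(num_mics)]


def match(look, source):
    """Normalized magnitude of the inner product of two steering vectors (0..1)."""
    inner = sum(a.conjugate() * b for a, b in zip(look, source))
    return abs(inner) / len(look)


if __name__ == "__main__":
    # Illustrative array: 4 microphones, 5 cm apart, evaluated at 2 kHz.
    src = steering_vector(4, 0.05, 2000.0, 30.0)
    for angle in (0.0, 15.0, 30.0, 45.0):
        print(angle, round(match(steering_vector(4, 0.05, 2000.0, angle), src), 3))
```

Because the steering vector depends on frequency, a wideband signal such as speech gives a different vector in every frequency bin, which is the problem that focusing matrices address by aligning the bins to one reference frequency before a narrowband method is applied.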