Abstract: This paper studies two structures for the pitch predictor in speech compression coding: open-loop and closed-loop. Several simplified approaches for solving the pitch predictor equation are proposed, and their performance is compared under several conditions. Computer simulation results are presented.
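As a rough illustration of the open-loop structure, the sketch below searches for the lag T and gain b of a one-tap pitch predictor that minimize the squared error Σ(x[n] − b·x[n−T])² over a frame. It assumes a single-tap predictor applied directly to the input frame and considers only positively correlated lags; the function name and lag range are hypothetical, and neither the paper's simplified solution methods nor the closed-loop (analysis-by-synthesis) search are reproduced here.

```python
import numpy as np


def open_loop_pitch_predictor(x, min_lag, max_lag):
    """Hypothetical one-tap open-loop pitch predictor.

    Returns (lag, gain) minimizing sum_n (x[n] - gain * x[n - lag])**2
    over the frame, searching lags in [min_lag, max_lag].
    """
    x = np.asarray(x, dtype=float)
    best_lag, best_gain, best_score = min_lag, 0.0, -np.inf
    for lag in range(min_lag, max_lag + 1):
        target = x[lag:]   # x[n] for n >= lag
        past = x[:-lag]    # x[n - lag]
        corr = np.dot(target, past)
        energy = np.dot(past, past)
        # Assumption: only positively correlated lags are candidates.
        if energy <= 0.0 or corr <= 0.0:
            continue
        # With the optimal gain corr / energy, the squared error
        # decreases by corr**2 / energy, so that ratio ranks the lags.
        score = corr * corr / energy
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain
```

For a fixed lag, setting the derivative of the error with respect to b to zero gives b = c/e, with c = Σx[n]x[n−T] and e = Σx[n−T]²; the minimum error is then Σx[n]² − c²/e, which is why c²/e is maximized over T.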
Abstract: This paper presents time-scale and pitch-scale modification of Chinese syllables based on a sinusoidal model. First, short-term speech is decomposed into a sum of sinusoids with different magnitudes and phases. Then, the vocal tract system and the excitation are separated using a homomorphic technique. Finally, speech with the desired time scale and pitch scale is obtained by changing the frequencies and phases of the excitation while adjusting the vocal tract parameters accordingly. The results show that the algorithm supports a wide range of pitch-scale and time-scale modification factors and is suitable for the analysis and synthesis of Chinese speech.
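A minimal sketch of harmonic sinusoidal resynthesis with independent pitch and time scaling, assuming a single voiced frame described by its fundamental frequency and harmonic amplitudes. To keep it short, the vocal-tract envelope is approximated by linearly interpolating the original harmonic amplitudes instead of the homomorphic separation described above, and all phases start at zero; the function and parameter names are hypothetical.

```python
import numpy as np


def modify_harmonic_frame(f0, amps, fs, n_samples, pitch_factor=1.0, time_factor=1.0):
    """Hypothetical harmonic resynthesis of one voiced frame.

    f0: fundamental frequency in Hz; amps[k]: amplitude of harmonic k + 1.
    The output has round(n_samples * time_factor) samples and fundamental
    f0 * pitch_factor.
    """
    amps = np.asarray(amps, dtype=float)
    orig_freqs = f0 * np.arange(1, len(amps) + 1)
    new_f0 = f0 * pitch_factor
    n_out = int(round(n_samples * time_factor))  # time scaling = output length
    t = np.arange(n_out) / fs
    out = np.zeros(n_out)
    k = 1
    while k * new_f0 < fs / 2:  # keep harmonics below the Nyquist frequency
        freq = k * new_f0
        # Sample the approximate envelope (linear interpolation, an assumption
        # standing in for the homomorphic vocal-tract estimate) at the new
        # harmonic frequency, so the formant structure stays in place.
        a = np.interp(freq, orig_freqs, amps, left=amps[0], right=0.0)
        out += a * np.cos(2 * np.pi * freq * t)
        k += 1
    return out
```

Moving the harmonics while reading their amplitudes from a fixed envelope is what separates pitch changes from timbre changes; the time scale is handled independently by synthesizing more or fewer samples.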
Abstract: Emotional speech synthesis has wide applications in human-computer interaction, medicine, industry and other fields. In this work, an emotional speech synthesis system is proposed based on prosodic feature modification and the Time Domain Pitch Synchronous OverLap Add (TD-PSOLA) waveform concatenation algorithm. The system produces synthesized speech with four emotions: angry, happy, sad and bored. Experimental results show that the proposed system performs well: the produced utterances express the intended emotions clearly, and in subjective tests the different types of synthesized emotional utterances were classified with high accuracy.
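The core of TD-PSOLA pitch modification can be sketched as follows, assuming the analysis pitch marks are already known (for example from a separate pitch-mark detector) and that duration is left unchanged. Two-period Hann-windowed segments are taken around the analysis marks and overlap-added at synthesis marks spaced by period / pitch_factor. The function name is hypothetical, and the emotion-specific prosody rules of the proposed system are not modeled.

```python
import numpy as np


def td_psola(x, marks, pitch_factor):
    """Hypothetical minimal TD-PSOLA pitch modification.

    x: input signal; marks: sorted analysis pitch marks (sample indices,
    at least two, assumed given); pitch_factor > 1 raises the pitch.
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    periods = np.diff(marks)
    y = np.zeros(len(x))
    t_syn = float(marks[0])
    while t_syn < marks[-1]:
        # Use the analysis mark nearest to the current synthesis instant.
        i = int(np.argmin(np.abs(marks[:-1] - t_syn)))
        period = int(periods[i])
        center = int(marks[i])
        lo, hi = center - period, center + period
        if lo >= 0 and hi <= len(x):
            # Two-period Hann-windowed segment centered on the analysis mark.
            seg = x[lo:hi] * np.hanning(2 * period)
            start = int(round(t_syn)) - period
            a, b = max(start, 0), min(start + 2 * period, len(y))
            y[a:b] += seg[a - start:b - start]  # overlap-add at synthesis mark
        # Closer synthesis marks give a higher pitch.
        t_syn += period / pitch_factor
    return y
```

Duration modification follows the same pattern, except that segments are repeated or skipped to stretch or compress the synthesis time axis.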
Abstract: Patients with severe hearing loss can receive a cochlear implant to regain their hearing. However, implantation is not always optimal and in some cases results in a shallow insertion depth or an accidental insertion into the wrong cochlear duct. As a consequence, the patients' pitch discrimination is impaired, which further degrades vowel identification, a capability vital for speech recognition. This paper presents a technical approach to this problem: an adaptive pitch transposition module modifies the frequency content so that the pitch is fixed at an optimal value. This value, a patient-specific best pitch subsequently called the comfort pitch, is determined experimentally by evaluating speech recognition at different pitches. Building on these considerations, a technical implementation is outlined in principle, and a system composed of pitch detection, pitch transposition and an arbitrarily chosen comfort pitch is described in depth. The system has been implemented as a prototype in Matlab/Octave and tested with an example audio file. It is designed as a preprocessing stage that precedes cochlear implant processing.
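The pitch detection and transposition steps could be sketched as below: an autocorrelation pitch detector estimates the fundamental frequency of a voiced frame, and the transposition factor is the ratio of the comfort pitch to the detected pitch. The autocorrelation detector, the 60 to 400 Hz search range and the function names are assumptions for illustration, and the resulting factor would still have to drive an actual pitch-shifting stage (for example PSOLA or a phase vocoder).

```python
import numpy as np


def detect_pitch_autocorr(frame, fs, f_min=60.0, f_max=400.0):
    """Hypothetical autocorrelation pitch detector for one voiced frame."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Non-negative lags of the autocorrelation: ac[k] = sum_n x[n] x[n + k].
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag


def transposition_factor(frame, fs, comfort_pitch):
    """Factor by which the frame's pitch must be scaled to hit the comfort pitch."""
    return comfort_pitch / detect_pitch_autocorr(frame, fs)
```

For a 200 Hz voiced frame and a comfort pitch of 150 Hz, the factor is 0.75, i.e. the pitch would be lowered by a quarter.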
Funding: Supported by the National Natural Science Foundation of China (No. 60572081)
Abstract: This paper proposes an algorithm that uses harmonic regeneration as post-processing to improve speech enhancement based on the traditional Short-Time Spectral Amplitude (STSA) estimator. The algorithm aims to reduce the distortion of the high-order harmonics in speech enhanced by traditional STSA, and thereby improve speech quality. First, the pitch (fundamental frequency) of the STSA-enhanced speech is detected, and the whole spectrum is divided into multiple sub-bands, each centered on a harmonic. Then, a series of specially designed windows, each centered on a harmonic, is applied to the sub-bands to redistribute the energy within them. Experimental results demonstrate that the method is sound both theoretically and practically.
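The sub-band redistribution step might look like the following sketch on a one-sided magnitude spectrum: each sub-band spans half a harmonic spacing on either side of a harmonic, a Hann-shaped weight peaking at the harmonic is applied, and the result is rescaled so that the sub-band energy is unchanged. The window shape, the energy-preserving normalization and the function name are assumptions standing in for the paper's specially designed windows; the pitch is assumed to have been detected already.

```python
import numpy as np


def harmonic_redistribute(mag, f0, fs, n_fft):
    """Hypothetical harmonic-centered energy redistribution.

    mag: one-sided magnitude spectrum (length n_fft // 2 + 1) of the
    enhanced speech; f0: detected fundamental frequency in Hz.
    """
    mag = np.asarray(mag, dtype=float)
    out = mag.copy()
    spacing = f0 * n_fft / fs  # harmonic spacing in FFT bins
    half = spacing / 2.0
    k = 1
    while True:
        center = k * spacing
        lo = int(np.ceil(center - half))
        hi = int(np.ceil(center + half))  # exclusive, so sub-bands do not overlap
        if hi > len(mag):
            break
        idx = np.arange(lo, hi)
        # Assumed Hann-shaped weight: 1 at the harmonic, 0 at the band edge.
        w = 0.5 + 0.5 * np.cos(np.pi * (idx - center) / half)
        shaped = w * mag[idx]
        energy = np.sum(mag[idx] ** 2)
        norm = np.sum(shaped ** 2)
        if norm > 0.0:
            # Rescale so the sub-band keeps its energy, now concentrated
            # around the harmonic.
            out[idx] = shaped * np.sqrt(energy / norm)
        k += 1
    return out
```

Bins below the first sub-band and above the last complete one are left untouched.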
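Abstract: Independent component analysis with reference (ICA-R) incorporates prior knowledge of the source signals into the learning algorithm in the form of a reference signal, so that only the desired source signal is extracted from the mixed signals. Based on ICA-R, a new speech enhancement method is proposed. By comparing the characteristics of speech signals with those of various noise signals, a reference signal carrying the key characteristics of speech is constructed, and ICA-R is then applied to extract the desired speech signal from various types of additive noise. Both computer simulations and performance analysis demonstrate the effectiveness of the method.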
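The sketch below conveys the idea of steering a one-unit ICA toward the source that resembles a reference signal. It is a simplified stand-in, not the constrained ICA-R algorithm itself: the mixtures are whitened, the weight vector is initialized from the correlation between the whitened data and the reference, and standard one-unit FastICA iterations with a tanh nonlinearity then refine it, with the sign chosen to agree with the reference. All names are hypothetical.

```python
import numpy as np


def reference_guided_ica(x, ref, n_iter=200, tol=1e-10):
    """Hypothetical reference-initialized one-unit FastICA.

    x: (n_channels, n_samples) mixtures; ref: (n_samples,) reference signal.
    Returns the extracted component (unit variance, sign matched to ref).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)
    # Whitening: z = D^{-1/2} E^T x, so cov(z) is approximately the identity.
    d, e = np.linalg.eigh(np.cov(x))
    z = (e / np.sqrt(d)).T @ x
    r = np.asarray(ref, dtype=float)
    r = (r - r.mean()) / r.std()
    # E[z r]: direction in whitened space most correlated with the reference.
    zr = z @ r / z.shape[1]
    w = zr / np.linalg.norm(zr)
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        # One-unit FastICA fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w.
        w_new = (z * g).mean(axis=1) - (1.0 - g ** 2).mean() * w
        w_new /= np.linalg.norm(w_new)
        # Resolve the sign ambiguity using the reference.
        if w_new @ zr < 0:
            w_new = -w_new
        done = abs(w_new @ w) > 1.0 - tol
        w = w_new
        if done:
            break
    return w @ z
```

Here the reference only selects which independent component is extracted; ICA-R proper instead adds an explicit closeness constraint between the output and the reference to the ICA contrast function.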