Modification on time scale and pitch scale of Chinese syllable based on sinusoidal model is presented in this paper. Firstly, the short term speech is decomposed into a sum of sinusoidal waves of different magnitud...Modification on time scale and pitch scale of Chinese syllable based on sinusoidal model is presented in this paper. Firstly, the short term speech is decomposed into a sum of sinusoidal waves of different magnitudes and phases. Then vocal tract system and excitation are obtained using a homomophic technique. Lastly, the speech with desired time scale and pitch scale is obtained through the change of frequency and phase of excitation while the parameters of vocal tract system are changed accordingly. The results show that the adjustable scale of pitch and time scale is big using this algorithm and it is suitable to be used in analysis and synthesis of Chinese speech.展开更多
The synthesis of emotional speech has wide applications in the field of human-computer interaction, medicine, industry and so on. In this work, an emotional speech synthesis system is proposed based on prosodic featur...The synthesis of emotional speech has wide applications in the field of human-computer interaction, medicine, industry and so on. In this work, an emotional speech synthesis system is proposed based on prosodic features modification and Time Domain Pitch Synchronous OverLap Add (TD-PSOLA) waveform concatenative algorithm. The system produces synthesized speech with four types of emotion: angry, happy, sad and bored. The experiment results show that the proposed emotional speech synthesis system achieves a good performance. The produced utterances present clear emotional expression. The subjective test reaches high classification accuracy for different types of synthesized emotional speech utterances.展开更多
Interactive communication is not straightforward but complicated. Prosodic features play an influential role in English communication. They can be used to signal certain pragmatic purposes in real situations for liste...Interactive communication is not straightforward but complicated. Prosodic features play an influential role in English communication. They can be used to signal certain pragmatic purposes in real situations for listeners and speakers to have mutual understanding. Identifying the pragmatic functions of prosodic features will facilitate the teaching of listening and speaking. English teachers need to clarify and emphasize the relationship between prosodic features and their pragmatic functions, attempting to work out how to combine them together into teaching in order to teach students to communicate effectively.展开更多
Speech coding techniques have been studied not truly to reduce the complexity and bit rate but also to improve the sound quality. CELP type vocoder, used as standard, supports the great stead quality even low bit rate...Speech coding techniques have been studied not truly to reduce the complexity and bit rate but also to improve the sound quality. CELP type vocoder, used as standard, supports the great stead quality even low bit rate. In this paper, the preprocessing of input speech to reduce the bit rate is different from the conventional vocoder. Different kinds of parameter are used for the preprocessing compared with the other parameters to t'md the more appropriate parameter for the vocoder. The Parameters are used to synthesize the speech not to encode or decode for coding technique so we proposed the simple algorithm not to have the influence on the processing time or the computation time. The parameters in the preprocessing step are speaking rate, duration, and PSOLA technique.展开更多
In this paper, we extend our previous study of addressing the important problem of automatically identifying question and non-question segments in Arabic monologues using prosodic features. We propose here two novel c...In this paper, we extend our previous study of addressing the important problem of automatically identifying question and non-question segments in Arabic monologues using prosodic features. We propose here two novel classification approaches to this problem: one based on the use of the powerful type-2 fuzzy logic systems (type-2 FLS) and the other on the use of the discriminative sensitivity-based linear learning method (SBLLM). The use of prosodic features has been used in a plethora of practical applications, including speech-related applications, such as speaker and word recognition, emotion and accent identification, topic and sentence segmentation, and text-to-speech applications. In this paper, we continue to specifically focus on the Arabic language, as other languages have received a lot of attention in this regard. Moreover, we aim to improve the performance of our previously-used techniques, of which the support vector machine (SVM) method was the best performing, by applying the two above-mentioned powerful classification approaches. The recorded continuous speech is first segmented into sentences using both energy and time duration parameters. The prosodic features are then extracted from each sentence and fed into each of the two proposed classifiers so as to classify each sentence as a Question or a Non-Question sentence. Our extensive simulation work, based on a moderately-sized database, showed the two proposed classifiers outperform SVM in all of the experiments carried out, with the type-2 FLS classifier consistently exhibiting the best performance, because of its ability to handle all forms of uncertainties.展开更多
Language, literature, customs and traditions, music and art are cultural items that were transmitted from generation to generation throughout history. In this context, literature is an important source of music cultur...Language, literature, customs and traditions, music and art are cultural items that were transmitted from generation to generation throughout history. In this context, literature is an important source of music culture that takes inspiration from the customs and traditions of a society. Prosodic meter is echoed in form, usul and general structure in works composed from the divan literature and almost lives in the work. In the same way, when examples of folk literature composed by composers and performed by poets and a^lks are examined, it is observed that there are parallels between literary features and form, structure and rhythmic features. The aim of this paper is to reveal the integral link between Melody-Usul and Meter in Ottoman Turkish Music展开更多
This paper, particularly focusing on the pitch of prosodic words,has conducted a contrastive study on the structure of prosodic words in Englishand Mandarin . This paper reports a Mandarin monologue speech corpus-stud...This paper, particularly focusing on the pitch of prosodic words,has conducted a contrastive study on the structure of prosodic words in Englishand Mandarin . This paper reports a Mandarin monologue speech corpus-study, anexperimental phonetic attempt to conduct a study on the pitch of trisyllabic prosodicwords in Mandarin monologue. In addition, taking the characteristics of Englishprosodic words into consideration, the paper makes a contrastive analysis of prosodicwords in English and Mandarin. This study finds that the pitch of trisyllabic prosodicwords in Mandarin is inevitably affected by structural factors. As far as the leftsyllable is concerned, the grammatical category, prosodic hierarchical boundary andthe position of the intonational phrase where the syllable is located, the mid syllableand the right syllable may have influences on the pitch contour of the left syllable.As to the mid syllable, the grammatical category, the left syllable, the right syllableand the position of the intonational phrase where the syllable is located may haveinfluences on the pitch contour of the mid syllable. As for the right syllable, theprosodic hierarchical boundary where the syllable is located and the mid syllable mayhave effects on the pitch contour of the right syllable. Different from the previousfindings of the study on read corpus, this study shows that the mid syllable not onlyhas dissimilatory effects but also has assimilatory effects on the pitch of its precedingsyllable. The left syllable has anticipatory effects on the onset pitch of the mid syllableand the right syllable has coarticulation effects on the offset pitch of the mid syllable.展开更多
To enhance the communication between human and robots at home in the future, speech synthesis interfaces are indispensable that can generate expressive speech. In addition, synthesizing celebrity voice is commercially...To enhance the communication between human and robots at home in the future, speech synthesis interfaces are indispensable that can generate expressive speech. In addition, synthesizing celebrity voice is commercially important. For these issues, this paper proposes techniques for synthesizing natural-sounding speech that has a rich prosodic personality using a limited amount of data in a text-to-speech (TTS) system. As a target speaker, we chose a well-known prime minister of Japan, Shinzo Abe, who has a good prosodic personality in his speeches. To synthesize natural-sounding and prosodically rich speech, accurate phrasing, robust duration prediction, and rich intonation modeling are important. For these purpose, we propose pause position prediction based on conditional random fields (CRFs), phone-duration prediction using random forests, and mora-based emphasis context labeling. We examine the effectiveness of the above techniques through objective and subjective evaluations.展开更多
诗歌生成中的韵律规范和主题一致性一直以来都是自然语言生成领域的研究热点。为提升诗歌生成中的韵律规范,提出了基于Transformer结合韵律特征的诗歌生成模型(Transformer and prosodic features poetry generation model,TPPG)。根据...诗歌生成中的韵律规范和主题一致性一直以来都是自然语言生成领域的研究热点。为提升诗歌生成中的韵律规范,提出了基于Transformer结合韵律特征的诗歌生成模型(Transformer and prosodic features poetry generation model,TPPG)。根据韵律特征建立平仄韵律词库和平声韵脚词库,在Transformer编码器中引入平仄韵律编码,模型训练过程中可以捕获更多平仄韵律特征的信息,学习到多种诗歌韵律;最终根据建立的平声韵脚词库规范诗歌生成韵脚,运用极大后验概率对于候选的诗歌选择当前赋有韵律特征规范的最优诗句,整体提升诗歌规范性和流畅性。实验结果表明TPPG模型生成的诗歌能够很好地符合韵律,在人工评价和机器评价中均有提高。展开更多
文摘Modification on time scale and pitch scale of Chinese syllable based on sinusoidal model is presented in this paper. Firstly, the short term speech is decomposed into a sum of sinusoidal waves of different magnitudes and phases. Then vocal tract system and excitation are obtained using a homomophic technique. Lastly, the speech with desired time scale and pitch scale is obtained through the change of frequency and phase of excitation while the parameters of vocal tract system are changed accordingly. The results show that the adjustable scale of pitch and time scale is big using this algorithm and it is suitable to be used in analysis and synthesis of Chinese speech.
文摘The synthesis of emotional speech has wide applications in the field of human-computer interaction, medicine, industry and so on. In this work, an emotional speech synthesis system is proposed based on prosodic features modification and Time Domain Pitch Synchronous OverLap Add (TD-PSOLA) waveform concatenative algorithm. The system produces synthesized speech with four types of emotion: angry, happy, sad and bored. The experiment results show that the proposed emotional speech synthesis system achieves a good performance. The produced utterances present clear emotional expression. The subjective test reaches high classification accuracy for different types of synthesized emotional speech utterances.
文摘Interactive communication is not straightforward but complicated. Prosodic features play an influential role in English communication. They can be used to signal certain pragmatic purposes in real situations for listeners and speakers to have mutual understanding. Identifying the pragmatic functions of prosodic features will facilitate the teaching of listening and speaking. English teachers need to clarify and emphasize the relationship between prosodic features and their pragmatic functions, attempting to work out how to combine them together into teaching in order to teach students to communicate effectively.
基金supported by the Brain Korea 21 Project in 2010,and the MKE(The Ministry of Knowledge Economy,Korea)the ITRC(Information Technology Research Center)support program(NIPA-2010-(C1090-1021-0010))
文摘Speech coding techniques have been studied not truly to reduce the complexity and bit rate but also to improve the sound quality. CELP type vocoder, used as standard, supports the great stead quality even low bit rate. In this paper, the preprocessing of input speech to reduce the bit rate is different from the conventional vocoder. Different kinds of parameter are used for the preprocessing compared with the other parameters to t'md the more appropriate parameter for the vocoder. The Parameters are used to synthesize the speech not to encode or decode for coding technique so we proposed the simple algorithm not to have the influence on the processing time or the computation time. The parameters in the preprocessing step are speaking rate, duration, and PSOLA technique.
文摘In this paper, we extend our previous study of addressing the important problem of automatically identifying question and non-question segments in Arabic monologues using prosodic features. We propose here two novel classification approaches to this problem: one based on the use of the powerful type-2 fuzzy logic systems (type-2 FLS) and the other on the use of the discriminative sensitivity-based linear learning method (SBLLM). The use of prosodic features has been used in a plethora of practical applications, including speech-related applications, such as speaker and word recognition, emotion and accent identification, topic and sentence segmentation, and text-to-speech applications. In this paper, we continue to specifically focus on the Arabic language, as other languages have received a lot of attention in this regard. Moreover, we aim to improve the performance of our previously-used techniques, of which the support vector machine (SVM) method was the best performing, by applying the two above-mentioned powerful classification approaches. The recorded continuous speech is first segmented into sentences using both energy and time duration parameters. The prosodic features are then extracted from each sentence and fed into each of the two proposed classifiers so as to classify each sentence as a Question or a Non-Question sentence. Our extensive simulation work, based on a moderately-sized database, showed the two proposed classifiers outperform SVM in all of the experiments carried out, with the type-2 FLS classifier consistently exhibiting the best performance, because of its ability to handle all forms of uncertainties.
文摘Language, literature, customs and traditions, music and art are cultural items that were transmitted from generation to generation throughout history. In this context, literature is an important source of music culture that takes inspiration from the customs and traditions of a society. Prosodic meter is echoed in form, usul and general structure in works composed from the divan literature and almost lives in the work. In the same way, when examples of folk literature composed by composers and performed by poets and a^lks are examined, it is observed that there are parallels between literary features and form, structure and rhythmic features. The aim of this paper is to reveal the integral link between Melody-Usul and Meter in Ottoman Turkish Music
文摘This paper, particularly focusing on the pitch of prosodic words,has conducted a contrastive study on the structure of prosodic words in Englishand Mandarin . This paper reports a Mandarin monologue speech corpus-study, anexperimental phonetic attempt to conduct a study on the pitch of trisyllabic prosodicwords in Mandarin monologue. In addition, taking the characteristics of Englishprosodic words into consideration, the paper makes a contrastive analysis of prosodicwords in English and Mandarin. This study finds that the pitch of trisyllabic prosodicwords in Mandarin is inevitably affected by structural factors. As far as the leftsyllable is concerned, the grammatical category, prosodic hierarchical boundary andthe position of the intonational phrase where the syllable is located, the mid syllableand the right syllable may have influences on the pitch contour of the left syllable.As to the mid syllable, the grammatical category, the left syllable, the right syllableand the position of the intonational phrase where the syllable is located may haveinfluences on the pitch contour of the mid syllable. As for the right syllable, theprosodic hierarchical boundary where the syllable is located and the mid syllable mayhave effects on the pitch contour of the right syllable. Different from the previousfindings of the study on read corpus, this study shows that the mid syllable not onlyhas dissimilatory effects but also has assimilatory effects on the pitch of its precedingsyllable. The left syllable has anticipatory effects on the onset pitch of the mid syllableand the right syllable has coarticulation effects on the offset pitch of the mid syllable.
文摘To enhance the communication between human and robots at home in the future, speech synthesis interfaces are indispensable that can generate expressive speech. In addition, synthesizing celebrity voice is commercially important. For these issues, this paper proposes techniques for synthesizing natural-sounding speech that has a rich prosodic personality using a limited amount of data in a text-to-speech (TTS) system. As a target speaker, we chose a well-known prime minister of Japan, Shinzo Abe, who has a good prosodic personality in his speeches. To synthesize natural-sounding and prosodically rich speech, accurate phrasing, robust duration prediction, and rich intonation modeling are important. For these purpose, we propose pause position prediction based on conditional random fields (CRFs), phone-duration prediction using random forests, and mora-based emphasis context labeling. We examine the effectiveness of the above techniques through objective and subjective evaluations.
文摘诗歌生成中的韵律规范和主题一致性一直以来都是自然语言生成领域的研究热点。为提升诗歌生成中的韵律规范,提出了基于Transformer结合韵律特征的诗歌生成模型(Transformer and prosodic features poetry generation model,TPPG)。根据韵律特征建立平仄韵律词库和平声韵脚词库,在Transformer编码器中引入平仄韵律编码,模型训练过程中可以捕获更多平仄韵律特征的信息,学习到多种诗歌韵律;最终根据建立的平声韵脚词库规范诗歌生成韵脚,运用极大后验概率对于候选的诗歌选择当前赋有韵律特征规范的最优诗句,整体提升诗歌规范性和流畅性。实验结果表明TPPG模型生成的诗歌能够很好地符合韵律,在人工评价和机器评价中均有提高。