This paper presents time-scale and pitch-scale modification of Chinese syllables based on a sinusoidal model. First, short-term speech is decomposed into a sum of sinusoids of different magnitudes and phases. The vocal tract system and the excitation are then separated using a homomorphic technique. Finally, speech with the desired time scale and pitch scale is obtained by changing the frequency and phase of the excitation while adjusting the vocal tract parameters accordingly. The results show that the algorithm supports a wide range of pitch-scale and time-scale adjustment and is suitable for the analysis and synthesis of Chinese speech.
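The sinusoidal representation in this abstract can be illustrated with a minimal Python sketch. It assumes a frame has already been analyzed into (amplitude, frequency, phase) triples; the function names, the toy frame, and the simplification of scaling all frequencies by one factor are illustrative assumptions, not the paper's algorithm, which also separates and adjusts the vocal tract parameters.

```python
import math

def synthesize(partials, duration_s, sample_rate=8000):
    """Sum sinusoids given as (amplitude, frequency_hz, phase_rad) triples."""
    n = int(round(duration_s * sample_rate))
    return [
        sum(a * math.cos(2 * math.pi * f * t / sample_rate + p)
            for a, f, p in partials)
        for t in range(n)
    ]

def modify(partials, pitch_factor=1.0):
    """Scale every partial frequency by pitch_factor (amplitudes unchanged).

    Assumption: a real system would also re-sample the spectral envelope so
    that the vocal tract shape is preserved; this sketch does not.
    """
    return [(a, f * pitch_factor, p) for a, f, p in partials]

# Hypothetical frame: a 100 Hz fundamental plus two harmonics.
frame = [(1.0, 100.0, 0.0), (0.5, 200.0, 0.0), (0.25, 300.0, 0.0)]

original = synthesize(frame, duration_s=0.02)
# Raise the pitch by 20% and stretch time by 1.5x (a longer synthesis window
# for the same, assumed stationary, set of sinusoids).
modified = synthesize(modify(frame, pitch_factor=1.2), duration_s=0.02 * 1.5)
```

Because each sinusoid is generated directly from its parameters, duration and pitch can be changed independently, which is the property the abstract relies on.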
The synthesis of emotional speech has wide applications in human-computer interaction, medicine, industry, and other fields. This work proposes an emotional speech synthesis system based on prosodic feature modification and the Time Domain Pitch Synchronous OverLap Add (TD-PSOLA) waveform concatenation algorithm. The system produces synthesized speech with four types of emotion: angry, happy, sad, and bored. The experimental results show that the proposed system performs well: the produced utterances convey clear emotional expression, and a subjective test reaches high classification accuracy across the different types of synthesized emotional utterances.
Interactive communication is complicated rather than straightforward. Prosodic features play an influential role in English communication: they can signal particular pragmatic purposes in real situations and so help listeners and speakers reach mutual understanding. Identifying the pragmatic functions of prosodic features will facilitate the teaching of listening and speaking. English teachers need to clarify and emphasize the relationship between prosodic features and their pragmatic functions, and to work out how to integrate both into teaching so that students learn to communicate effectively.
This paper, focusing on the pitch of prosodic words, presents a contrastive study of the structure of prosodic words in English and Mandarin. It reports a corpus study of Mandarin monologue speech, an experimental phonetic investigation of the pitch of trisyllabic prosodic words in Mandarin monologue. In addition, taking the characteristics of English prosodic words into consideration, the paper makes a contrastive analysis of prosodic words in English and Mandarin. The study finds that the pitch of trisyllabic prosodic words in Mandarin is inevitably affected by structural factors. For the left syllable, the grammatical category, the prosodic hierarchical boundary and the position of the intonational phrase where the syllable is located, the mid syllable, and the right syllable may influence the pitch contour. For the mid syllable, the grammatical category, the left syllable, the right syllable, and the position of the intonational phrase where the syllable is located may influence the pitch contour. For the right syllable, the prosodic hierarchical boundary where the syllable is located and the mid syllable may affect the pitch contour. In contrast to previous findings based on read speech corpora, this study shows that the mid syllable has not only dissimilatory but also assimilatory effects on the pitch of its preceding syllable. The left syllable has anticipatory effects on the onset pitch of the mid syllable, and the right syllable has coarticulation effects on the offset pitch of the mid syllable.
In this paper, we extend our previous study on automatically identifying question and non-question segments in Arabic monologues using prosodic features. We propose two novel classification approaches to this problem: one based on type-2 fuzzy logic systems (type-2 FLS) and the other on the discriminative sensitivity-based linear learning method (SBLLM). Prosodic features have been used in a plethora of practical speech-related applications, such as speaker and word recognition, emotion and accent identification, topic and sentence segmentation, and text-to-speech. We continue to focus specifically on Arabic, as other languages have already received a lot of attention in this regard. Moreover, we aim to improve on the performance of our previously used techniques, of which the support vector machine (SVM) method performed best, by applying the two classification approaches mentioned above. The recorded continuous speech is first segmented into sentences using both energy and time-duration parameters. Prosodic features are then extracted from each sentence and fed into each of the two proposed classifiers, which label the sentence as a Question or a Non-Question. Our extensive simulation work, based on a moderately sized database, showed that the two proposed classifiers outperform SVM in all of the experiments carried out, with the type-2 FLS classifier consistently exhibiting the best performance because of its ability to handle all forms of uncertainties.
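The segmentation step in this abstract, which splits continuous speech using energy and duration parameters, might look roughly like the Python sketch below. The abstract gives no thresholds or frame sizes, so the function name, the parameters, and the toy energy values are all assumptions made for illustration.

```python
def segment_by_energy(frame_energies, energy_threshold,
                      min_pause_frames, min_segment_frames):
    """Return (start, end) frame index pairs (end exclusive) of speech segments.

    A segment ends once at least min_pause_frames consecutive frames fall
    below energy_threshold; segments shorter than min_segment_frames are
    discarded. All parameter values are hypothetical tuning knobs.
    """
    segments = []
    start = None      # start index of the segment in progress, if any
    silent_run = 0    # consecutive low-energy frames inside that segment
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                end = i - silent_run + 1   # first frame of the pause
                if end - start >= min_segment_frames:
                    segments.append((start, end))
                start, silent_run = None, 0
    if start is not None:                  # close a segment left open at the end
        end = len(frame_energies) - silent_run
        if end - start >= min_segment_frames:
            segments.append((start, end))
    return segments

# Toy per-frame energies: a short dip (1 frame) stays inside the first
# segment, while a 3-frame pause separates the two segments.
energies = [0, 5, 6, 7, 0, 6, 5, 0, 0, 0, 8, 9, 0]
print(segment_by_energy(energies, energy_threshold=1,
                        min_pause_frames=3, min_segment_frames=2))
```

Requiring a minimum pause length keeps brief dips in energy, such as stop closures, from splitting a sentence, which is why a duration parameter is combined with the energy threshold.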
To enhance communication between humans and robots in future homes, speech synthesis interfaces that can generate expressive speech are indispensable. In addition, synthesizing celebrity voices is commercially important. To address these issues, this paper proposes techniques for synthesizing natural-sounding speech with a rich prosodic personality from a limited amount of data in a text-to-speech (TTS) system. As the target speaker, we chose Shinzo Abe, a well-known prime minister of Japan whose speeches show a good prosodic personality. Synthesizing natural-sounding and prosodically rich speech requires accurate phrasing, robust duration prediction, and rich intonation modeling. For these purposes, we propose pause position prediction based on conditional random fields (CRFs), phone-duration prediction using random forests, and mora-based emphasis context labeling. We examine the effectiveness of these techniques through objective and subjective evaluations.
This study presents evidence from analyses of the acoustic parameters of fluent continuous speech to show that within-paragraph prosodic phrase boundaries are related more to contrasts between neighboring prosodic states than to between-phrase pause durations, and that prosodic states are constrained more by higher-level discourse information. By revising a modular acoustic model under Tseng's hierarchical prosodic phrase grouping framework and examining the highly variable prosodic phrase (PPh) boundary B3 within speech paragraphs, we show that statistical accounts of layered contributions reveal distinct contrasts between boundary-immediate duration and intensity patterns, irrespective of pause duration. Contrasts in F0 contour patterns were also observed at these locations. Evidence was also obtained to illustrate how PPh boundary states are specified more by higher-level discourse information than by lower-level prosodic word construction. These combined results suggest that contrastive neighboring prosodic states are more significant cues to PPh boundaries than boundary pause duration. The results also help explain why between-phrase pause durations vary greatly in fluent speech, and they can be applied to automatic speech segmentation.
Automatic prosodic break detection and annotation are important for both speech understanding and natural speech synthesis. In this paper, we discuss automatic prosodic break detection and feature analysis. The paper makes two contributions. First, we use a classifier combination method to detect Mandarin and English prosodic breaks using acoustic, lexical, and syntactic evidence. The proposed method achieves better performance than the baseline system and other researchers' experimental results on both the Mandarin prosodic annotation corpus, the Annotated Speech Corpus of Chinese Discourse, and the English prosodic annotation corpus, the Boston University Radio News Corpus. Second, we present a feature analysis for prosodic break detection: the roles of different features, such as duration, pitch, energy, and intensity, are analyzed and compared for Mandarin and English. Based on the feature analysis, we also verify some linguistic conclusions.
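One simple way to combine classifiers over acoustic, lexical, and syntactic evidence is a weighted average of their break probabilities, sketched below in Python. The abstract does not say which combination scheme the authors use, so the weighted-average rule, the weights, the threshold, and the example probabilities are all assumptions.

```python
def combine(probabilities, weights):
    """Weighted average of per-classifier break probabilities."""
    total_weight = sum(weights[name] for name in probabilities)
    weighted_sum = sum(weights[name] * p for name, p in probabilities.items())
    return weighted_sum / total_weight

def is_break(probabilities, weights, threshold=0.5):
    """Decide 'break' when the combined probability reaches the threshold."""
    return combine(probabilities, weights) >= threshold

# Hypothetical weights: the acoustic classifier is trusted twice as much.
weights = {"acoustic": 2.0, "lexical": 1.0, "syntactic": 1.0}

# Hypothetical outputs of three classifiers for one word boundary.
word_boundary = {"acoustic": 0.8, "lexical": 0.4, "syntactic": 0.3}

print(combine(word_boundary, weights))   # combined break probability
print(is_break(word_boundary, weights))  # final decision
```

Here the acoustic classifier outvotes the other two because of its larger weight; in practice the weights would be tuned on held-out annotated data.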
Putonghua prosody is characterized by a hierarchical structure shaped by the linguistic environment. On this basis, a neural network with specially weighted factors and optimized outputs is described and applied to construct the Putonghua prosodic model in a Text-to-Speech (TTS) system. Extensive tests show that the structure of the neural network characterizes Putonghua prosody more exactly than traditional models. Learning is faster and computational precision is improved, which makes the whole prosodic model more efficient. Furthermore, the paper stylizes Putonghua syllable pitch contours with SPiS parameters (Syllable Pitch Stylized Parameters) and analyzes their use in adjusting syllable pitch. The results show that the SPiS parameters effectively characterize Putonghua syllable pitch contours and facilitate both the construction of the network model and prosodic control.
This paper studies how speech rate (normal, fast, and slow) influences temporal and tonal patterns in Standard Chinese. The main effect of a shift of tempo from slow to fast is a compression and an upward movement of the overall pitch range, while the number of turning points and their positions relative to the segments are well retained. In the time domain there is a lengthening of about 50% from normal to slow speech and a shortening of about 25% from normal to fast speech. This compression is not uniform: at the higher tempo, the last constituent increases in relative duration and prominence at the expense of the segments of the first constituent. The larger number of lexically pitch-determined syllables in a Chinese sentence makes the prosodic patterns of Chinese differ from those of some European languages.
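The reported tempo effects can be made concrete with a small worked example in Python. The 50% and 25% figures come from the abstract; the 2.0-second utterance and the function name are hypothetical, and the calculation applies the average change uniformly even though the abstract notes that the actual compression is not uniform across constituents.

```python
def scaled_durations(normal_s, slow_lengthening=0.50, fast_shortening=0.25):
    """Approximate utterance duration at each tempo from the normal duration.

    Applies the average changes reported in the abstract uniformly to the
    whole utterance, which is a simplification.
    """
    return {
        "slow": normal_s * (1 + slow_lengthening),
        "normal": normal_s,
        "fast": normal_s * (1 - fast_shortening),
    }

# Hypothetical utterance lasting 2.0 s at normal tempo.
durations = scaled_durations(2.0)
print(durations)  # slow: 3.0 s, normal: 2.0 s, fast: 1.5 s
```

Under these figures, the slow rendition is twice as long as the fast one (3.0 s versus 1.5 s).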
This study explored how native speakers use intonation to produce French clause-combining complexes with causal conjunctions, in particular how prosodic realization is affected by the narrative order of the cause and effect events, which conforms to or conflicts with the iconic reasoning order, in a conversation with projected focus. Ten native French speakers were recruited to read aloud 68 question-answer pairs. The critical answer conveys volitional content causality and consists of a prior clause combined with a causal or consequence clause introduced by the conjunction car or donc, forming effect-cause (EC) or cause-effect (CE) order, respectively. It responds to either a why-question or a general question, so that the focus position is manipulated. The results for clausal boundary intonation and prosodic prominence placement converged: EC order and focus in the second clause increased the use of continuing boundary intonation and of prominence on the second clause, compared with CE order and focus in the prior clause, and both factors showed main effects. Our finding does not support the cognitive account predicting prosodic dissociation for non-iconic order; instead, it may shed light on the critical role of prosody in marking causality by highlighting the influence of contextualization cues.
Correct prosodic boundary prediction is crucial for the quality of synthesized speech in a text-to-speech system. This article presents the prosodic hierarchy of the Uyghur language, which belongs to the Turkic language family of the Altaic language system, and verifies the reliability of the proposed Uyghur prosodic boundary annotation rules by acoustic analysis. In the prediction part, a two-layer shifting hierarchical approach based on decision trees is used to predict prosodic word and prosodic phrase boundaries, and the influence of different feature sets on Uyghur prosodic boundary prediction is also investigated. The experimental results clearly show the acoustic changes at, and the automatic prediction performance for, the different prosodic boundaries of Uyghur, laying a good foundation for further research.
The effects of the prosodic phrase (PP) boundary on the pitch lowering of downstep and focus, as well as their domains, were investigated in Chinese Putonghua using designed sentences consisting of two prosodic phrases (PP1 and PP2). The results showed that: (1) The PP boundary blocked the downstep effect in the preceding phrase, indicating that the PP is the domain of downstep. (2) The post-focus F0 lowering effect in PP1 spread across the PP boundary and lowered the F0 contour of PP2. If there is a downstep effect in PP2, the post-boundary compression effect of the prior focus accumulates with the downstep, producing a further lowered contour. Therefore, the domain of focus is the intonational phrase (IP). (3) When there is one contrastive focus in each phrase, the pronounced pitch reset elicited by the second focus blocks the F0 lowering effect of PP1 on PP2, and the two foci are realized independently.
Prosodic control is an important part of a speech synthesis system, and the choice of prosodic parameters directly influences the quality of synthetic speech. At present, text-to-speech systems lack effective descriptions of the data relationships in the corpus. This paper presents a new research approach that uses data mining to discover those relationships through association rule modeling, and develops a new algorithm for generating association rules for prosodic parameters, including pitch and duration parameters, from a corpus. The output rules improve the correctness of syllable selection in text-to-speech systems.
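Association rules are usually ranked by support and confidence, and the Python sketch below shows how those two measures could be computed over syllable records. The attribute labels, the toy corpus, and the helper names are hypothetical; the abstract does not describe the paper's rule-generation algorithm, so this illustrates only the standard measures.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical corpus: each syllable is a set of attribute=value labels.
transactions = [
    {"tone=3", "pos=final", "dur=long"},
    {"tone=3", "pos=final", "dur=long"},
    {"tone=3", "pos=initial", "dur=short"},
    {"tone=1", "pos=final", "dur=long"},
]

# Candidate rule: phrase-final position implies long duration.
print(support(transactions, {"pos=final", "dur=long"}))
print(confidence(transactions, {"pos=final"}, {"dur=long"}))
```

A rule such as "phrase-final implies long duration" would be kept only if both measures exceed chosen minimum thresholds.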
The perceptual representation of the prosodic structure of Chinese sentences was constructed statistically using multidimensional scaling analysis, based on the results of a discrimination experiment in which listeners were asked to compare the perceptual distances between two adjacent syllables in each of six sentences. Listeners' ability to resolve levels of the prosodic hierarchy and the relationship between prosodic and syntactic structures are discussed in relation to the perceptual representations.
The sermon, as a religious discourse, thrives on the use of prosodic features, which give extra information to enhance the understanding of utterances in oral discourse. This paper investigates the prosodic innovations that manifest in the sermons of selected Pentecostal pastors in Southwest Nigeria. It also identifies some prosodic features in the selected sermons and relates them to their themes. Data were sourced from one sermon VCD each by Pastor Kumuyi (Inf-Kum), Pastor Adeboye (Inf-Adeb), and Bishop Oyedepo (Inf-Oyed). The recordings were played back to extract prosodic features, which were analyzed using the pitch extraction software PRAAT. Analyses of the data reveal that preachers put extra prosodic force on some words in order to emphasize the focus of their messages. However, while Inf-Oyed and Inf-Adeb use higher pitch for accented words and lower pitch for non-accented ones, Inf-Kum uses the same pitch for both. Additionally, the subjects' vowel length (SV-L) rendition appeared longer than what obtains in native English. Moreover, specific prosodic features characterize each preacher's doctrinal persuasion: Inf-Oyed deploys emphatic stress with an enthusiastic voice, Inf-Kum uses a relatively level voice pitch, and Inf-Adeb's renditions are generally slower in tempo. The paper concludes that Nigerian Pentecostal sermons are replete with prosodic features deployed to achieve thematization of messages and doctrinal identity construction.
According to Register Grammar, prosody, as an aspect of grammar, is one way to realize different registers. This study explored the differences in the acoustic features of prosodic boundaries between Chinese formal and informal speech. The results suggested that: (1) Pauses occurred more frequently and lasted longer at prosodic boundaries in formal speech, best reflected at the Prosodic Clitic level and at the Prosodic Phrase level, respectively. In formal speech, pauses at Prosodic Phrase boundaries lasted significantly longer than those at Prosodic Clitic boundaries, while this difference was not significant in informal speech. The distribution of pause duration displayed greater dispersion as the prosodic level increased. (2) In the informal register, Prosodic Phrase boundaries showed higher degrees of pre-lengthening than Prosodic Clitic boundaries, while this difference was not significant in formal speech. Prosodic Clitic boundaries in formal and informal speech displayed pre-lengthening and post-lengthening, respectively. (3) Pre-strengthening in the intensity of prosodic words at prosodic boundaries existed at all three levels in both registers, but it was probably a weak cue for discriminating the two registers. (4) Only slight pitch reset was found at Prosodic Clitic boundaries in formal speech and at Prosodic Phrase boundaries in informal speech.
People spend much of their time communicating their thoughts, ideas, attitudes, and emotions on social media platforms such as Twitter. However, the nonverbal component of communication, which requires visual and audible cues, is not available on these text-based platforms. The aim of this research is to discover the alternative means that Arabs across different dialects use to compensate for the absence of the nonverbal component. To that end, the researchers collected a corpus of tweets written in Arabic using Python and the Twitter application programming interface (API). The results can be summed up as follows. Emojis helped Arabs communicate their facial expressions; the most used emoji across the different dialects was Face with Tears of Joy, and the most used emojis reflected the universal emotions. For hand gestures, the Egyptian dialect ranked first and the Emirati dialect second. Prosodic features such as the tone and loudness of the voice are expressed by means of character repetition. Punctuation usage across the Arabic dialects was limited, with Lebanese users appearing to use it the most, and Arabs tend to replace punctuation marks with emojis. Finally, Arabs used vocal expressions such as interjections to communicate their affective state.
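Character repetition of the kind this abstract describes can be detected with a short regular expression, as in the Python sketch below. The abstract states only that Python and the Twitter API were used to collect the tweets; the pattern, the minimum run length of three, and the helper names are assumptions rather than the researchers' method.

```python
import re

# A letter (any script, excluding digits and underscore) followed by at least
# two more copies of itself, i.e. a run of three or more identical letters.
ELONGATION = re.compile(r"([^\W\d_])\1{2,}")

def has_elongation(text):
    """True if the text contains a letter repeated three or more times in a row."""
    return ELONGATION.search(text) is not None

def collapse(text):
    """Reduce each elongated run to a single letter (a crude normalization)."""
    return ELONGATION.sub(r"\1", text)

print(has_elongation("sooooo good"))  # elongated "o" run is detected
print(collapse("sooooo good"))        # the double "o" in "good" is kept
print(has_elongation("hello 1000"))   # repeated digits are ignored
```

Because the pattern is written in terms of Unicode letters, the same check applies to Arabic text without modification. Collapsing a run to one letter can misspell words that legitimately contain doubled letters inside a longer run, so it is best treated as a rough normalization.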
The present paper explores the special behavior of geminate consonants in Moroccan Arabic (MA) vis-à-vis short consonants and consonant clusters. By way of comparison, it is shown that geminates exhibit properties reminiscent of both unit structures and cluster structures. In particular, we reveal that geminates in MA behave inconsistently with respect to the process of schwa epenthesis. In this context, we ask whether geminates get split up in MA, and when and how that happens. To characterize the patterning of geminates in MA, different phonological representations of geminates are examined against their variable behavior. On this basis, it is suggested that geminates should be represented as two root nodes that are underlyingly associated with a mora at the prosodic level.
Communicative dynamism is of great importance in linguistics and can yield useful experience for numerous fields, such as education. To take full advantage of communicative dynamism, it is imperative to examine its details. This paper therefore offers a summary aimed at a better understanding of discourse analysis, discussing the distribution of communicative dynamism together with prosodic prominence and the related details.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 90820303 and 90820011, and the Natural Science Foundation of Shandong Province of China under Grant No. ZR2011FQ024.
Abstract: Automatic prosodic break detection and annotation are important for both speech understanding and natural speech synthesis. In this paper, we discuss automatic prosodic break detection and feature analysis. The paper makes two contributions. First, we use a classifier combination method to detect Mandarin and English prosodic breaks using acoustic, lexical, and syntactic evidence. Compared with the baseline system and other researchers' experimental results, the proposed method achieves better performance on both the Mandarin prosodic annotation corpus, the Annotated Speech Corpus of Chinese Discourse, and the English prosodic annotation corpus, the Boston University Radio News Corpus. Second, we present a feature analysis for prosodic break detection. The functions of different features, such as duration, pitch, energy, and intensity, are analyzed and compared in Mandarin and English prosodic break detection. Based on the feature analysis, we also verify some linguistic conclusions.
Funding: This work was supported by the National Natural Science Foundation of China (69875008) and the 863 National High Technology Project.
Abstract: Putonghua prosody is characterized by its hierarchical structure when influenced by linguistic environments. Based on this, a neural network with specially weighted factors and optimized outputs is described and applied to construct the Putonghua prosodic model in a Text-to-Speech (TTS) system. Extensive tests show that the structure of the neural network characterizes Putonghua prosody more exactly than traditional models. Learning is faster and computational precision is improved, which makes the whole prosodic model more efficient. Furthermore, the paper stylizes the Putonghua syllable pitch contours with SPiS parameters (Syllable Pitch Stylized Parameters) and analyzes them in adjusting the syllable pitch. It shows that the SPiS parameters effectively characterize the Putonghua syllable pitch contours and facilitate the establishment of the network model and prosodic control.
Abstract: This paper studies how speech rate (normal, fast, and slow) influences temporal and tonal patterns in Standard Chinese. The main effect of a shift of tempo from slow to fast is a compression and an upward movement of the overall pitch range, while the number of turning points and their positions relative to the segments are well retained. In the time domain there is a lengthening of about 50% from normal to slow speech and a shortening of about 25% from normal to fast speech. This compression is not uniform. At the higher tempo, the last constituent increases in relative duration and prominence at the expense of the segments of the first constituent. The larger number of lexically pitch-determined syllables in a Chinese sentence makes the prosodic patterns of Chinese differ from those of some European languages.
Funding: Supported by the CASS Innovation Program and the CASS Innovation Program for young scholars.
Abstract: This study explored how native speakers use intonation to produce French clause-combining complexes with causal conjunctions, in particular how prosodic realization is affected by the narrative order of the cause and effect events, which conforms to or conflicts with the iconic reasoning order, in a conversation with projected focus. Ten native French speakers were recruited to read aloud 68 question-answer pairs. The critical answer conveys volitional content causality and consists of a prior clause combined with a causal/consequence clause introduced by the conjunction car or donc, forming effect-cause (EC) or cause-effect (CE) order, respectively. It responds to either a why-question or a general question, so that the focus position is manipulated. Results for clausal boundary intonation and prosodic prominence placement were convergent: EC order and focus in the second clause increased the use of continuing boundary intonation and prominence on the second clause compared with CE order and focus in the prior clause, and both factors showed main effects. Our finding does not support the cognitive account predicting prosodic dissociation for non-iconic order; instead, it may shed light on the critical role of prosody in marking causality by highlighting the influence of contextualization cues.
Funding: Supported by the National Natural Science Foundation of China (61065005 and 61062008).
Abstract: Correct prosodic boundary prediction is crucial for the quality of synthesized speech in a text-to-speech system. This article presents the prosodic hierarchy of the Uyghur language, which belongs to the Turkic family of the Altaic languages, and verifies the reliability of the proposed Uyghur prosodic boundary annotation rules by acoustic analysis. In the prediction part, a two-layer shifting hierarchical approach based on decision trees is used to predict prosodic word and prosodic phrase boundaries, and the influence of different feature sets on Uyghur prosodic boundary prediction is also investigated. Experimental results clearly show the acoustic changes and the automatic prediction performance for different prosodic boundaries in Uyghur, laying a good foundation for further research.
Funding: Supported by the Capacity Building for Sci-Tech Innovation-Fundamental Scientific Research Funds (025185305000/114).
Abstract: The effects of the prosodic phrase (PP) boundary on the pitch lowering of downstep and focus, as well as their domains, were investigated in Chinese Putonghua using designed sentences consisting of two prosodic phrases (i.e., PP1 and PP2). The results showed that: (1) The PP boundary blocked the downstep effect in the preceding phrase, indicating that the PP is the domain of downstep. (2) The post-focus F0 lowering effect in PP1 spread across the PP boundary and lowered the F0 contour of PP2. If there is a downstep effect in PP2, the post-boundary compression effect of the prior focus accumulates with the downstep, producing a further lowered contour. Therefore, the domain of focus is the intonational phrase (IP). (3) When there is one contrastive focus in each phrase, the marked pitch reset elicited by the second focus blocks the F0 lowering effect of PP1 on PP2, and the two foci are realized independently.
Funding: This work was supported by the 863 National High Technology Project and the National Natural Science Foundation of China (No. 60275014).
Abstract: Prosodic control is an important part of a speech synthesis system. Whether the prosodic parameters are chosen correctly directly influences the quality of synthetic speech. At present, text-to-speech systems lack effective descriptions of the data relationships in the corpus. A new research approach is presented that uses data mining technology to discover those relationships through association rule modeling. A new algorithm is developed for generating association rules for prosodic parameters, including pitch parameters and duration parameters, from a corpus. The output rules improve the correctness of syllable choice in a text-to-speech system.
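The association-rule idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm, which the abstract does not specify; it only computes the standard support and confidence measures for one-item-to-one-item rules. The function name `mine_rules`, the thresholds, and the toy feature labels (`tone=3`, `phrase_final`, `dur=long`) are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) tuples.

    Only single-item antecedents and consequents are considered,
    which keeps the sketch short; real miners handle larger itemsets.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of syllables containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical corpus: each "transaction" is the context of one syllable.
TOY_CORPUS = [
    {"tone=3", "phrase_final", "dur=long"},
    {"tone=3", "phrase_final", "dur=long"},
    {"tone=1", "phrase_final", "dur=long"},
    {"tone=1", "phrase_medial", "dur=short"},
]

for lhs, rhs, support, confidence in mine_rules(TOY_CORPUS):
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

In this toy data, a rule such as `phrase_final -> dur=long` survives both thresholds, which is the kind of relationship between context and duration that a rule-based prosody module could exploit.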
Abstract: The perceptual representation of the prosodic structure of Chinese sentences was constructed statistically by multidimensional scaling analysis of the results of a discrimination experiment, in which listeners were asked to compare perceptual distances between two adjacent syllables in each of six sentences. Listeners' ability to resolve levels of the prosodic hierarchy and the relationship between prosodic and syntactic structures are discussed in relation to the perceptual representations.
Abstract: The sermon, as a religious discourse, thrives on the use of prosodic features, which give extra information to enhance the understanding of utterances in oral discourse. This paper investigates the prosodic innovations that manifest in the sermons of selected Pentecostal pastors in Southwest Nigeria. It also identifies some prosodic features in the selected sermons and relates them to their themes. Data were sourced from one sermon VCD tape each by Pastor Kumuyi (Inf-Kum), Pastor Adeboye (Inf-Adeb), and Bishop Oyedepo (Inf-Oyed). The tapes were played back to extract some prosodic features in the sermons; these were analyzed using the pitch extraction software PRAAT. Analyses of the data reveal that preachers put extra prosodic force on some words in order to emphasize the focus of their messages. However, while Inf-Oyed and Inf-Adeb use higher pitch for accented words and lower pitch for non-accented words, Inf-Kum uses the same pitch for both. Additionally, the subjects' vowel length (SV-L) rendition appeared longer than what obtains in native English. Moreover, specific prosodic features characterize each preacher's doctrinal persuasions: Inf-Oyed deploys emphatic stress with an enthusiastic voice, Inf-Kum uses a relatively constant level of voice pitch, and Inf-Adeb's renditions are generally slower in tempo. The paper concludes that Nigerian Pentecostal sermons are replete with prosodic features deployed for achieving thematization of messages and doctrinal identity construction.
Funding: Supported by the Social Science Foundation of Tianjin, China (TJWW19-009 and TJWW17-010).
Abstract: According to Register Grammar, prosody, as an aspect of grammar, is one way to realize different registers. This study explored the differences in the acoustic features of prosodic boundaries between Chinese formal and informal speech. Results suggested that: (1) Pauses occurred more frequently and lasted longer at prosodic boundaries in formal speech, best reflected at the Prosodic Clitic level and at the Prosodic Phrase level, respectively. In formal speech, pauses at Prosodic Phrase boundaries lasted significantly longer than those at Prosodic Clitic boundaries, while this difference was not significant in informal speech. The distribution of pause duration displayed greater dispersion as the prosodic level increased. (2) In the informal register, Prosodic Phrase boundaries showed higher degrees of pre-lengthening than Prosodic Clitic boundaries, while this difference was not significant in formal speech. Prosodic Clitic boundaries in formal and informal speech displayed pre-lengthening and post-lengthening, respectively. (3) Pre-strengthening in the intensity of prosodic words at prosodic boundaries existed at all three levels in both registers, but it was probably a weak cue for discriminating the two registers. (4) Only slight pitch reset was found at Prosodic Clitic boundaries in formal speech and at Prosodic Phrase boundaries in informal speech.
Abstract: People spend most of their time communicating their thoughts, ideas, attitudes, and emotions on social media platforms like Twitter. However, the nonverbal component of communication, which relies on visual and audible cues, is unavailable on these text-based platforms. The aim of this research is to discover the alternative ways Arabs across different dialects compensate for the absence of the nonverbal component. To this end, the researchers collected a corpus of tweets written in Arabic using Python through the Twitter application programming interface (API). The results can be summed up as follows. Emojis helped Arabs communicate their facial expressions; the most used emoji across the different dialects was Face with Tears of Joy, and the most used emojis reflected the universal emotions. Regarding the use of hand gestures, the Egyptian dialect came first and the Emirati dialect second. Prosodic features such as the tone and loudness of the voice are expressed by means of character repetition. Punctuation usage across the Arabic dialects was limited, with Lebanese users seeming to use it the most, and Arabs tend to replace punctuation marks with emojis. Finally, Arabs used vocal expressions such as interjections to communicate their affective state.
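The character-repetition cue mentioned in the abstract above can be sketched as follows. This is an illustrative assumption about how such elongations might be detected, not the authors' method: the function name `find_elongations` and the threshold of three repeated characters are hypothetical, and no Twitter API calls are shown.

```python
import re


def find_elongations(text, min_run=3):
    """Return (character, run_length) for each run of a repeated letter.

    A run of at least `min_run` identical word characters is treated as
    a written stand-in for prosodic lengthening or loudness.
    """
    # \w is Unicode-aware for str patterns, so Arabic letters match too.
    pattern = re.compile(r"(\w)\1{%d,}" % (min_run - 1))
    return [(m.group(1), len(m.group(0))) for m in pattern.finditer(text)]


# Latin-script example, followed by an Arabic word with the letter
# waw (U+0648) repeated four times.
print(find_elongations("soooo goood"))
print(find_elongations("\u062d\u0644\u0648\u0648\u0648\u0648"))
```

Counting such runs per tweet would give a rough, dialect-comparable measure of how often writers mark emphasis through repetition.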
Abstract: The present paper explores the special behavior of geminate consonants in Moroccan Arabic (MA) vis-à-vis short consonants and consonant clusters. By way of comparison, it is shown that geminates exhibit properties that are reminiscent of both unit structures and cluster structures. In particular, we reveal that geminates in MA behave inconsistently with respect to the process of schwa epenthesis. In this context, we ask whether geminates get split up in MA, and when and how that happens. In order to characterize the patterning of geminates in MA, different phonological representations of geminates are examined against their variable behavior. On this basis, it is suggested that geminates should be depicted as two root nodes that are underlyingly associated with a mora at the prosodic level.
Abstract: Communicative dynamism is of great importance in linguistics and can yield useful experience for many fields, such as education. To take full advantage of communicative dynamism, it is imperative to examine its details. Therefore, a summary is made in order to gain a better understanding of discourse analysis. The summary discusses the distribution of communicative dynamism and prosodic prominence together with the related details.