Abstract: This paper presents time-scale and pitch-scale modification of Chinese syllables based on a sinusoidal model. First, short-term speech is decomposed into a sum of sinusoids with different magnitudes and phases. The vocal tract system and the excitation are then separated using a homomorphic technique. Finally, speech with the desired time scale and pitch scale is obtained by changing the frequency and phase of the excitation while adjusting the parameters of the vocal tract system accordingly. The results show that the algorithm supports a wide range of pitch-scale and time-scale adjustment and is suitable for the analysis and synthesis of Chinese speech.
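The sum-of-sinusoids idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single stationary frame, skips the homomorphic separation of vocal tract and excitation, and scales every partial frequency by one factor (which would also shift formants in real speech). The function name `synthesize`, the 8000 Hz sample rate, and the three-harmonic frame are hypothetical choices made for illustration.

```python
import math

def synthesize(partials, duration_s, sample_rate=8000, pitch_scale=1.0):
    """Sum sinusoids; each partial is (amplitude, frequency_hz, phase_rad)."""
    n = int(round(duration_s * sample_rate))
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Pitch scaling here is a plain multiplication of every frequency.
        value = sum(a * math.cos(2.0 * math.pi * f * pitch_scale * t + p)
                    for a, f, p in partials)
        samples.append(value)
    return samples

# Hypothetical frame: three harmonics of a 200 Hz fundamental.
frame = [(1.0, 200.0, 0.0), (0.5, 400.0, 0.0), (0.25, 600.0, 0.0)]

original = synthesize(frame, duration_s=0.02)
raised = synthesize(frame, duration_s=0.02, pitch_scale=1.5)  # higher pitch, same length
stretched = synthesize(frame, duration_s=0.04)                # same pitch, twice as long
```

Because the parameters are explicit, duration and pitch can be changed independently, which is the property the abstract relies on. A realistic system would also have to keep phases coherent across frames.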
Funding: This paper is based upon a study supported by the US National Science Foundation under Grant No. 0121285. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Abstract: A framework for dialectal Chinese speech recognition is proposed and studied, in which a relatively small dialectal Chinese speech corpus (that is, Chinese influenced by the speaker's native dialect) and dialect-related knowledge are used to transform a standard Chinese (Putonghua, abbreviated PTH) speech recognizer into a dialectal Chinese speech recognizer. Two kinds of knowledge sources are explored: expert knowledge and a small dialectal Chinese corpus. These knowledge sources provide information at four levels: the phonetic level, the lexicon level, the language level, and the acoustic decoder level. This paper takes Wu dialectal Chinese (WDC) as an example target language. The goal is to establish a WDC speech recognizer from an existing PTH speech recognizer, based on the Initial-Final structure of the Chinese language and a study of how dialectal Chinese speakers speak Putonghua. The authors propose to use context-independent PTH-IF mappings (where IF means either a Chinese Initial or a Chinese Final), context-independent WDC-IF mappings, and syllable-dependent WDC-IF mappings (obtained from either experts or data), and to combine them with the supervised maximum likelihood linear regression (MLLR) acoustic model adaptation method. The IF mappings introduce a multi-pronunciation lexicon, whose size may increase lexicon confusion and hence degrade performance; to reduce it, a Multi-Pronunciation Expansion (MPE) method based on the accumulated uni-gram probability (AUP) is proposed. In addition, some commonly used WDC words are selected and added to the lexicon. Compared with the original PTH speech recognizer, the resulting WDC speech recognizer achieves a 10-18% absolute Character Error Rate (CER) reduction when recognizing WDC, with only a 0.62% CER increase when recognizing PTH. The proposed framework and methods are expected to work not only for Wu dialectal Chinese but also for other dialectal Chinese languages and even other languages.
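Since the result above is reported as an absolute CER reduction, a small sketch of how CER is commonly computed may help. It assumes the usual definition (character-level edit distance divided by reference length); the abstract does not state the exact scoring procedure, and the strings below are made-up examples, not data from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)

# Made-up example: one reference and two hypothetical recognizer outputs.
reference = "今天天气很好"
baseline_output = "今天天汽很号"   # two substituted characters
adapted_output = "今天天气很号"    # one substituted character

absolute_reduction = cer(reference, baseline_output) - cer(reference, adapted_output)
relative_reduction = absolute_reduction / cer(reference, baseline_output)
```

An absolute reduction is a difference in percentage points, while a relative reduction is that difference divided by the baseline error rate. The two can differ widely for the same pair of systems.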
Abstract: This paper introduces several important features of the Chinese large vocabulary continuous speech recognition system in the NICT/ATR multi-lingual speech-to-speech translation system. The features include: (1) a flexible way to derive an information-rich phoneme set based on mutual information between a text corpus and its phoneme set; (2) a hidden Markov network acoustic model and a successive state splitting algorithm to generate its model topology based on a minimum description length criterion; and (3) advanced language modeling using multi-class composite N-grams. These features allow a recognition performance of 90% character accuracy in tourism-related dialogue with real-time response speed.
Abstract: The previously proposed syllable-synchronous network search (SSNS) algorithm plays a very important role in word decoding for continuous Chinese speech recognition and achieves satisfactory performance. This paper carefully studies several key factors that may affect overall word decoding, including refinement of the vocabulary, big-discount Turing re-estimation of the N-gram probabilities, and management of the search path buffers. Based on these discussions, corresponding approaches to improving the SSNS algorithm are proposed. Compared with the previous version of the SSNS algorithm, the new version decreases the Chinese character error rate (CCER) in word decoding by 42.1% across a database consisting of a large number of test sentences (syllable strings).
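For readers unfamiliar with Turing-style re-estimation of N-gram counts, here is a minimal sketch of plain Good-Turing discounting. This is an assumption about the general family of methods: the abstract does not define its "big-discount" variant, so none of this should be read as the paper's procedure. The toy bigram counts are invented.

```python
from collections import Counter

def good_turing_counts(ngram_counts):
    """Adjusted counts r* = (r + 1) * N_{r+1} / N_r (plain Good-Turing)."""
    freq_of_freq = Counter(ngram_counts.values())  # N_r: number of n-grams seen r times
    adjusted = {}
    for ngram, r in ngram_counts.items():
        n_r = freq_of_freq[r]
        n_r1 = freq_of_freq.get(r + 1, 0)
        # When N_{r+1} is zero the formula breaks down; keep the raw count.
        adjusted[ngram] = (r + 1) * n_r1 / n_r if n_r1 else float(r)
    return adjusted

def unseen_mass(ngram_counts):
    """Probability mass reserved for unseen n-grams: N_1 / total count."""
    total = sum(ngram_counts.values())
    n1 = sum(1 for c in ngram_counts.values() if c == 1)
    return n1 / total

# Invented toy bigram counts: six singletons, two doubletons, one tripleton.
toy = {"a b": 1, "a c": 1, "a d": 1, "b a": 1, "b c": 1, "b d": 1,
       "c a": 2, "c b": 2, "d a": 3}
adjusted = good_turing_counts(toy)
reserved = unseen_mass(toy)
```

Practical language models usually smooth the N_r values before applying the formula, because raw frequency-of-frequency counts become sparse for large r.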
Abstract: This paper proposes a simplified novel speech recognition model, the state feedback neural network activation model (SFNNAM), which is developed based on the characteristics of Chinese speech structure. The model assumes that the current state of speech is only a correction of the previous state. Following the "CV" (consonant-vowel) structure of the Chinese language, a speech segmentation method is also implemented in the SFNNAM model. The model has a definite physical meaning grounded in the structure of the Chinese language and is easily implemented in very large scale integrated (VLSI) circuits. In the speech recognition experiment, fewer calculations were needed than in the hidden Markov model (HMM) based algorithm. The recognition rate for Chinese numbers was 93.5% for the first candidate and 99.5% for the first two candidates.
Abstract: After pointing out the unreasonableness of the three basic assumptions contained in HMM, we introduce the theory and advantages of Stochastic Trajectory Models (STMs), which may resolve the problems caused by the HMM assumptions. In STM, the acoustic observations of an acoustic unit are represented as clusters of trajectories in a parameter space. The trajectories are modelled by a mixture of probability density functions of random sequences of states. After analyzing the characteristics of Chinese speech, the acoustic units for continuous Chinese speech recognition based on STM are discussed and phone-like units are suggested. The performance of continuous Chinese speech recognition based on STM is studied on the VINICS system. The experimental results demonstrate the efficiency of STM and the consistency of phone-like units.
Abstract: As a cognitive means and mode of thinking, conceptual metaphor is widely applied in political discourse. Statesmen often publicize their political thoughts by using conceptual metaphors so that the audience can understand their political ideas easily. Based on Conceptual Metaphor Theory, this paper analyzes the conceptual metaphors in Xi Jinping's 2016 New Year address in order to summarize the types, functions and significance of conceptual metaphors in Chinese political discourse, in the hope of helping readers interpret political speeches better.
Abstract: Through a careful study of the theory of speech acts and the literature concerning it, the author made some new findings, which are reflected in three aspects: the similarities and differences between Chinese and English in expressing the same speech act, the relations between different types of speech acts, and the correspondence between sentence sets and sets of speech acts.
Funding: Supported by the National Natural Science Foundation of China (10874203, 10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500), the National 863 Program (2012AA012503), and the CAS Priority Deployment Project (KGZD-EW-103-2).
Abstract: This study investigated whether adults who stutter and normal adult speakers differ in the production of stop consonants in fluently read Chinese Putonghua speech. Voice onset time (VOT) was measured and the spectral moments at the stop burst were calculated for the stutterers (both before and after speech therapy) and for the nonstutterers. The statistical results showed no significant differences in VOT between the nonstutterers and the stutterers either before or after therapy, although the mean VOT of the stutterers was slightly greater than that of the nonstutterers. The results also indicated that both the place of obstruction and the subsequent syllabic final influenced VOT to a greater extent for the stutterers. In the spectral domain, the spectral mean of the stuttering participants before therapy was significantly different from that of the normal participants, whereas the group difference became insignificant after the therapy session. The smaller spectral mean for the stutterers might be interpreted as a more posterior occlusion in the oral cavity when producing alveolars and velars. In addition, the productions of the stutterers were scattered over a wider range in the space of spectral moments. Furthermore, the smaller main effect of syllabic finals on the mean spectral frequency of the burst suggested that the stutterers exhibited weaker anticipatory coarticulation than the nonstutterers.
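The spectral moments mentioned in this abstract are usually obtained by treating a power spectrum as a distribution over frequency. The sketch below assumes that standard definition (mean, standard deviation, skewness, excess kurtosis); the abstract does not describe the authors' analysis settings, and the two five-bin spectra are invented.

```python
import math

def spectral_moments(freqs_hz, power):
    """Treat the power spectrum as a distribution over frequency.

    Returns (mean, standard deviation, skewness, excess kurtosis).
    Assumes the spectrum has non-zero spread, so the deviation is not zero.
    """
    total = sum(power)
    mean = sum(f * p for f, p in zip(freqs_hz, power)) / total
    var = sum((f - mean) ** 2 * p for f, p in zip(freqs_hz, power)) / total
    sd = math.sqrt(var)
    skew = sum((f - mean) ** 3 * p for f, p in zip(freqs_hz, power)) / total / sd ** 3
    kurt = sum((f - mean) ** 4 * p for f, p in zip(freqs_hz, power)) / total / sd ** 4 - 3.0
    return mean, sd, skew, kurt

# Invented burst spectra on five frequency bins.
freqs = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
symmetric_burst = [1.0, 2.0, 4.0, 2.0, 1.0]   # energy centred on 3000 Hz
low_burst = [4.0, 2.0, 2.0, 1.0, 1.0]         # energy concentrated at low frequencies

mean_sym, sd_sym, skew_sym, kurt_sym = spectral_moments(freqs, symmetric_burst)
mean_low, sd_low, skew_low, kurt_low = spectral_moments(freqs, low_burst)
```

In this toy case the low-frequency-weighted spectrum yields the smaller spectral mean, the direction the abstract associates with a more posterior occlusion.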
Abstract: Experimental evidence has shown that declination exists in most sentences in Chinese Putonghua. However, the specific pitch variation of prosodic words (PWs) within sentences is not fully understood. The dialogue material used in this study was taken from the 973 telephone conversation corpus, which includes 69 dialogues involving 79 speakers. The read speech material consisted of recordings of news announcements by two radio announcers, 221 sentences in total. The top and bottom points of the pitch contour and the pitch range of prosodic words within sentences were studied. It was found that, for both dialogue and read speech, pitch declination exists in most sentences, with minor exceptions in the first part of longer sentences in dialogue. Compared with read speech, the pitch range of prosodic words in dialogue is smaller. In dialogue, the pitch range of the last prosodic word of a sentence is relatively larger, while in most sentences of read speech there is no significant difference among the pitch ranges of prosodic words. This study will be helpful for modeling the pitch range and register of prosodic words in sentences for speech synthesis.
Abstract: A national assessment of speech synthesis systems for Chinese has been carried out regularly in China since 1994. New guidelines for the assessment activities, which aim to make the assessment work standardizable, (partially) automatable, and accessible to the public over a computer network, were set up in 1997. Two modules, the phonetic module and the linguistic module, are evaluated individually. The phonetic module is evaluated using speech intelligibility tests at three levels (syllable, word and sentence) and speech naturalness tests (in MOS). For the linguistic module, the text processing ability, which covers word segmentation, polyphonic characters, numerals, years, symbols and metrological units, is examined automatically.
Abstract: The relation between the speech intelligibility of Chinese and the speech transmission index (STI) is discussed, based on some useful properties of the modulation transfer function (MTF) and on results obtained from articulation tests under different signal-to-noise ratios.
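To make the link between signal-to-noise ratio, MTF and STI concrete, here is a simplified sketch of the classical calculation under stationary noise only. It assumes the standard textbook relations (modulation reduction from noise, apparent SNR clipped to ±15 dB, linear mapping to a 0-1 transmission index) and averages bands without weights. It ignores reverberation and band weighting, and it does not reproduce the paper's mapping to Chinese articulation scores, which the abstract does not give.

```python
import math

def mtf_noise(snr_db):
    """Modulation reduction factor caused by stationary noise alone."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))

def transmission_index(m):
    """Map a modulation reduction factor to a transmission index in [0, 1]."""
    if m <= 0.0:
        return 0.0
    if m >= 1.0:
        return 1.0
    apparent_snr = 10.0 * math.log10(m / (1.0 - m))
    apparent_snr = max(-15.0, min(15.0, apparent_snr))  # clip to +/- 15 dB
    return (apparent_snr + 15.0) / 30.0

def sti_unweighted(band_snrs_db):
    """Unweighted mean of the per-band transmission indices (a simplification)."""
    indices = [transmission_index(mtf_noise(s)) for s in band_snrs_db]
    return sum(indices) / len(indices)

# Hypothetical per-band SNRs in dB for seven octave bands.
sti_equal = sti_unweighted([0.0] * 7)
sti_mixed = sti_unweighted([-20.0, -5.0, 0.0, 5.0, 10.0, 20.0, 30.0])
```

With noise as the only degradation, the apparent SNR equals the true band SNR, so each band index is just the clipped SNR rescaled to the range 0 to 1.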
Funding: National Natural Science Foundation of China (No. 19834040).
Abstract: A nonlinear dynamic method is used to study Chinese spoken at normal speed, and an improved correlation dimension algorithm is applied to characterize the speech signal. The reconstructed phase space and the correlation dimension curves of unvoiced fricative consonants and vowels are also given. It is found that the correlation dimension algorithm can distinguish fricatives from vowels because of their different production mechanisms. The study also shows that it can provide information for distinguishing the four basic tones in Mandarin.
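The phase-space reconstruction and correlation dimension named in this abstract can be sketched with the standard Grassberger-Procaccia procedure. This is the textbook method, not the improved algorithm the abstract refers to (whose details are not given). The test signal is a plain sine wave rather than speech, and the embedding dimension, delay and radii are arbitrary illustrative choices.

```python
import math

def delay_embed(series, dim, delay):
    """Reconstruct phase-space points (x[i], x[i+delay], ...) from a scalar series."""
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(n)]

def correlation_sum(points, radius):
    """Fraction of distinct point pairs closer than the given radius."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < radius:
                close += 1
    return 2.0 * close / (n * (n - 1))

def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) against log r over the given radii."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_sum(points, r)) for r in radii]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Toy signal: a sine wave, whose reconstructed trajectory is a closed curve.
signal = [math.sin(0.1 * i) for i in range(400)]
points = delay_embed(signal, dim=2, delay=16)
estimate = correlation_dimension(points, radii=[0.2, 0.4, 0.8])
```

For this periodic toy signal the estimate comes out close to 1, as expected for a closed curve. Noise-like sounds such as fricatives would be expected to give larger values, which is the kind of contrast the abstract describes.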
Abstract: A national assessment of the performance of speech synthesis systems for Chinese has been carried out yearly since 1994. The quality of the synthetic speech of five different systems was evaluated and diagnosed using speech intelligibility tests. The listeners were 16 college students (8 male, 8 female) with no experience of synthetic speech; they were asked to complete an open-response task with pencil and paper. In addition, speech naturalness was measured by Mean Opinion
Abstract: Well-developed continuous speech recognition and synthesis systems demand a high-quality continuous speech database that is compact and valid, and whose scientific design would benefit from incorporating linguistic and phonetic knowledge. It is argued that at the present stage the database should be limited to read speech. To describe the very complex variability in continuous speech, the following speech units are proposed: (1) 401 syllables without tone; (2) 415 inter-syllabic diphones; (3) 3035 inter-syllabic triphones; (4) 781 inter-syllabic final-initial structures. The 17 basic sentence patterns in standard Chinese are summarized to cover the most important prosodic phenomena. Using an automatic method, 2393 sentences and 388 phrases are selected according to the above phonetic rules from a large corpus, which includes People's Daily in recent years, TV play scripts and dictionary entries, as the reading text of a continuous speech recognition database in standard Chinese. This set of sentences and phrases covers 99.8% of syllables without counting tones, 100% of inter-syllabic diphones, 99.6% of inter-syllabic triphones and 100% of sentence patterns.
Abstract: The 6th National General Congress of the Chinese Association of Integrative Medicine (CAIM) was convened on 19-20 April 2008 in Beijing. Academician CHEN Zhu, Minister of Health, indicated at the congress that the integration of Chinese and Western medicine fits well with the situation of our country and with the general rule of development in medical science, and that a good integration of Chinese medicine and Western medicine is mutually beneficial and advantageous to both. Given the creativity shown in integrative medical investigation on both the theoretical and methodological sides, we should and must persist in it and develop it.