Abstract: The key to improving the naturalness of synthetic speech is to build a sound prosodic model. This paper therefore presents a preliminary study of the influence of sentence stress on prosodic features in Standard Chinese, revealing the relationship between duration and pitch at different sentence stress levels. The results show that: (a) Pitch is the most important carrier of sentence stress; as the sentence stress level increases, the pitch distribution shifts toward higher frequencies. (b) For the 'Stress' and 'Light' levels, the duration distribution has two peaks, which indicates that the duration of some words is affected by sentence stress while that of others is not. (c) Pitch and duration are uncorrelated at the 'Normal' level, positively correlated at the 'Stress' level, and negatively correlated at the 'Light' level. The research lays a foundation for building a prosodic model for a Standard Chinese speech synthesis system. The results are also used to classify sentence stress with a neural network, which achieves a 63% identification rate in an open-set test. An automatic method for sentence stress labeling is also suggested.
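
Finding (c) is a statement about the sign of the correlation between pitch and duration within each stress level. The abstract does not say which statistic the authors used, so the following is only a minimal sketch that assumes a Pearson coefficient computed per level. The function names, the `(level, pitch_hz, duration_ms)` tuple layout, and the toy values are all hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlation_by_stress(samples):
    """Group (level, pitch_hz, duration_ms) tuples by level; return r per level."""
    groups = {}
    for level, pitch_hz, duration_ms in samples:
        pitches, durations = groups.setdefault(level, ([], []))
        pitches.append(pitch_hz)
        durations.append(duration_ms)
    return {level: pearson(p, d) for level, (p, d) in groups.items()}


# Toy values invented purely for illustration; NOT data from the study.
toy_samples = [
    ("Stress", 220.0, 180.0), ("Stress", 240.0, 200.0), ("Stress", 260.0, 230.0),
    ("Light", 150.0, 140.0), ("Light", 160.0, 120.0), ("Light", 170.0, 90.0),
]
result = correlation_by_stress(toy_samples)
for level, r in result.items():
    print(f"{level}: r = {r:+.2f}")
```

With real measurements, a coefficient near zero for 'Normal', a positive one for 'Stress', and a negative one for 'Light' would match the pattern the abstract reports.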