Prosody in speech synthesis (text-to-speech) systems determines the tone, duration, and loudness of speech sounds. Intonation is the part of prosody that determines the tone of speech. In Indonesian, intonation is determined by the structure of the sentence, the type of sentence, and the position of a word within the sentence. This study proposes a speech synthesis model that focuses on intonation. The intonation of the synthesized speech is determined by sentence structure, the intonation patterns of example sentences, and the general rules of Indonesian pronunciation. The model receives text and intonation patterns as inputs. An initial prosody file is built from the general rules of Indonesian pronunciation. The sentence structure is then determined from the input text, which in turn gives the intervals between the parts (phrases) of the sentence. These intervals are used to correct the durations in the initial prosody file. The frequencies in the prosody file are then corrected using the intonation patterns. The final result is a prosody file that can be pronounced by a speech engine application. Experiments comparing the original voice of a radio news announcer with the synthesized speech show that the peaks of F0 are determined by the general rules or by the intonation patterns, whichever is dominant. A similarity test using the PESQ method gives the synthesized speech a score of 1.18 on the MOS-LQO scale.
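The two correction steps described in the abstract (durations first, then frequencies) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the `Segment` record stands in for one entry of the prosody file, the phrase interval is modelled as a silent pause inserted after each phrase-final segment, and the intonation pattern is modelled as a non-empty list of relative F0 multipliers stretched over the voiced segments.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical prosody-file entry: one phoneme with duration and pitch."""
    phoneme: str
    duration_ms: float
    f0_hz: float              # 0.0 marks an unvoiced segment or a pause
    phrase_end: bool = False  # True if this segment closes a phrase


def correct_durations(segments: List[Segment], pause_ms: float) -> List[Segment]:
    """Insert a silent pause after every phrase-final segment (assumed reading
    of 'intervals between phrases')."""
    out: List[Segment] = []
    for seg in segments:
        out.append(seg)
        if seg.phrase_end:
            out.append(Segment("_", pause_ms, 0.0))
    return out


def correct_frequencies(segments: List[Segment], pattern: List[float]) -> List[Segment]:
    """Scale the F0 of each voiced segment by a multiplier from `pattern`.

    The pattern (assumed non-empty) is stretched over the voiced segments by
    integer index mapping; pauses and unvoiced segments are left unchanged.
    """
    voiced = [i for i, s in enumerate(segments) if s.f0_hz > 0]
    out = list(segments)
    for rank, i in enumerate(voiced):
        j = rank * len(pattern) // len(voiced)
        out[i] = replace(segments[i], f0_hz=segments[i].f0_hz * pattern[j])
    return out


if __name__ == "__main__":
    # Initial prosody with a flat 120 Hz contour (illustrative values only).
    initial = [
        Segment("s", 80, 120.0),
        Segment("a", 100, 120.0, phrase_end=True),
        Segment("y", 80, 120.0),
        Segment("a", 100, 120.0),
    ]
    timed = correct_durations(initial, pause_ms=200)
    final = correct_frequencies(timed, pattern=[1.0, 1.5])
    for seg in final:
        print(seg.phoneme, seg.duration_ms, seg.f0_hz)
```

Keeping the two corrections as separate functions mirrors the order given in the abstract. The actual prosody file format, pause lengths, and pattern representation used in the study are not stated here and may differ.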