
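Chinese title (translated): A Two-Layer Prosodic Structure Generation Scheme for Chinese Speech Synthesis Systems (paper in English). Cited by: 2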

A Two-stage Prosodic Structure Generation Strategy for Mandarin Text-to-speech Systems
Abstract: Prosodic structure generation is the key component in improving the intelligibility and naturalness of synthetic speech for a text-to-speech (TTS) system. This paper investigates the problem of automatic segmentation of prosodic words and prosodic phrases, which are two fundamental layers in the hierarchical prosodic structure of Mandarin, and presents a two-stage prosodic structure generation strategy. Conditional random fields (CRF) models are built for both prosodic word and prosodic phrase prediction at the front end, with different feature selections. In addition, a transformation-based error-driven learning (TBL) modification module is introduced in the back end to amend the initial prediction. Experimental results show that the approach combining CRF and TBL achieves an F-score of 94.66%.
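To make the two-stage idea concrete, here is a minimal sketch, assuming a simplified setting that is not taken from the paper. The initial boundary labels are hard-coded as a stand-in for the CRF front end, the correction rules are hand-written rather than learned by TBL, and the part-of-speech tags, rule template, and names (`apply_rules`, `f_score`) are hypothetical. It only illustrates how a rule-based back end can rewrite an initial labeling and how a boundary F-score could be computed.

```python
from typing import List, Tuple

# Labels: "B" = a prosodic boundary follows this word, "N" = no boundary.
# A rule is (current POS, next POS, label to match, label to write).
# This rule template is an assumption for illustration only.
Rule = Tuple[str, str, str, str]


def apply_rules(pos_tags: List[str], labels: List[str], rules: List[Rule]) -> List[str]:
    """Apply each transformation rule in order to the initial labeling."""
    out = list(labels)
    for cur_pos, next_pos, src, dst in rules:
        for i in range(len(out) - 1):
            if pos_tags[i] == cur_pos and pos_tags[i + 1] == next_pos and out[i] == src:
                out[i] = dst
    return out


def f_score(predicted: List[str], gold: List[str]) -> float:
    """Balanced F-score over the boundary label "B"."""
    true_pos = sum(p == "B" and g == "B" for p, g in zip(predicted, gold))
    if true_pos == 0:
        return 0.0
    precision = true_pos / predicted.count("B")
    recall = true_pos / gold.count("B")
    return 2 * precision * recall / (precision + recall)


# Toy sentence of six words with hypothetical POS tags.
pos_tags = ["n", "v", "u", "n", "d", "v"]
gold = ["B", "N", "B", "N", "N", "B"]

# Stand-in for the front-end prediction (a real system would use a CRF here).
initial = ["B", "B", "N", "N", "N", "B"]

# Hand-written stand-ins for rules that TBL would learn from errors.
rules: List[Rule] = [
    ("v", "u", "B", "N"),  # remove a boundary between a verb and a particle
    ("u", "n", "N", "B"),  # insert a boundary between a particle and a noun
]

corrected = apply_rules(pos_tags, initial, rules)
print("initial  :", initial, round(f_score(initial, gold), 4))
print("corrected:", corrected, round(f_score(corrected, gold), 4))
```

In an actual TBL setup the rules would be selected greedily by how many errors each one removes on training data; the fixed rule list above skips that learning step entirely.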
Source: Acta Automatica Sinica (《自动化学报》; indexed in CSCD and the Peking University Core Journals list), 2010, No. 11, pp. 1569-1574 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (90920001), the Key Project of the Ministry of Education of China (108012), and the Joint-research Project between France Telecom R&D Beijing and Beijing University of Posts and Telecommunications (SEV01100474)
Keywords: Chinese speech synthesis system; two-layer prosodic structure generation scheme; computer technology; automation system; text-to-speech (TTS); prosodic structure generation; conditional random fields (CRF); transformation-based error-driven learning (TBL)
