Abstract
This paper describes the construction and analysis of TH-CoSS, a speech corpus for Mandarin text-to-speech (TTS) synthesis. The corpus contains about 20,000 sentences read by one male and one female speaker. It is divided into four parts: sentences for TTS system building, sentences for TTS system evaluation, sentences of special types that convey special intonation, and special syllable groups. The script design took into account the balance of the material and the richness of its segmental and prosodic information. Besides text and speech data, the corpus carries segment boundary marks; the annotation files are in XML format and include segmental and prosodic tags. Annotation software was also developed to support speech analysis and development. Based on the syllables in TH-CoSS, the paper presents an analysis of how context features influence speech prosody.
Source
《中文信息学报》
CSCD
PKU Core (Peking University Core Journals list)
2007, No. 2, pp. 94-99 (6 pages)
Journal of Chinese Information Processing
Funding
National High-Tech R&D Program of China (863 Program)
National Natural Science Foundation of China (60418012, 60433030)
Keywords
computer application
Chinese information processing
speech synthesis
Chinese
corpus