Design and research of Tibetan spoken speech corpus (Cited by: 8)
Abstract: This work builds on a study and analysis of how general-purpose speech corpora are constructed. It also takes into account the requirements of natural spoken speech recognition research and the basic characteristics of natural spoken Tibetan. On this basis, we designed a construction scheme for a spoken speech corpus suitable for Tibetan speech recognition, together with a matching annotation standard. Following this scheme, we built a 50-hour spoken corpus of Lhasa Tibetan annotated on five layers: phonemes, semi-syllables, syllables, Tibetan characters, and sentences. Statistical analysis shows that the corpus preserves the natural properties of spoken speech. It also gives balanced coverage of commonly used modeling units such as phonemes and semi-syllables. The corpus therefore provides reliable data support for research on speech recognition based on spoken Tibetan speech data.
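The abstract does not specify a storage format for the five annotation layers, and it does not describe how coverage was measured. The following is a minimal sketch of one way such a per-layer coverage check could look. It assumes that each utterance is stored as a record with one tier per layer. The `Utterance` class, its field names, the `tier_coverage` helper, and the placeholder unit labels (`p1`, `s1`, ...) are all hypothetical. They are not taken from the paper and do not represent real Lhasa Tibetan units.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record for one utterance. The tiers mirror the five
# annotation layers named in the abstract; the schema itself is assumed.
@dataclass
class Utterance:
    sentence: str              # sentence-level transcription
    characters: list[str]      # Tibetan character tier
    syllables: list[str]       # syllable tier
    semi_syllables: list[str]  # semi-syllable tier
    phonemes: list[str]        # phoneme tier


def tier_coverage(corpus, tier, inventory):
    """Count every unit on one tier and report how much of a reference
    inventory of modeling units actually occurs in the corpus."""
    counts = Counter()
    for utt in corpus:
        counts.update(getattr(utt, tier))
    observed = set(inventory) & set(counts)
    ratio = len(observed) / len(inventory)  # fraction of inventory seen
    missing = sorted(set(inventory) - set(counts))
    return counts, ratio, missing


# Toy corpus with placeholder labels (not real Tibetan units).
corpus = [
    Utterance("utt-1", ["c1", "c2"], ["s1", "s2"],
              ["i1", "f1", "i2", "f2"], ["p1", "p2", "p3", "p2"]),
    Utterance("utt-2", ["c3"], ["s1"], ["i1", "f1"], ["p1", "p2"]),
]

counts, ratio, missing = tier_coverage(corpus, "phonemes", ["p1", "p2", "p3", "p4"])
print(ratio, missing)  # prints: 0.75 ['p4']
```

The per-unit `counts` can then be inspected to judge how evenly units are represented, which is one reading of "balanced coverage". Any specific balance metric would be a further assumption on top of this sketch.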
Authors: HUANG Xiaohui, LI Jing, MA Rui (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China; Department of Engineering, PLA University of Foreign Language, Luoyang, Henan 471003, China; Institute of Tibetology, Minzu University of China, Beijing 100081, China)
Source: Computer Engineering and Applications (CSCD; Peking University core journal), 2018, No. 13, pp. 231-235 (5 pages)
Funding: National Key Research and Development Program of China (No. 2016YFB0201402)
Keywords: speech corpus; spoken speech; speech recognition; annotation standard; Lhasa Tibetan