Burmese Segmentation Methods and Its Implementation (缅甸语分词方法及其实现)

Cited by: 1
Abstract: Unlike English and other Western languages, Burmese has no delimiters that mark word boundaries, so word segmentation is an essential step in building a Burmese speech synthesis system. We selected 5,000 complete sentences from a raw corpus of about 600 M and had Burmese experts segment them by hand to form the experimental data set for this paper. We compare a Burmese word segmentation method based on conditional random fields (CRF) with one based on the forward maximum matching (FMM) algorithm, and evaluate both by confidence, segmentation precision, and segmentation speed. In this experiment, the CRF-based and FMM-based methods reached confidence of 94.1% and 84.3%, respectively, and F-values of 93.8% and 82.9%, respectively. These results indicate that the CRF method segments Burmese text more effectively than FMM, and we believe it meets the requirements for developing a Burmese speech synthesis system.
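For readers unfamiliar with the FMM baseline and the F-value named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a toy Latin-letter lexicon, character-level matching, and span-based scoring. The record does not say which matching unit, dictionary, or scoring convention the paper used, and the names `fmm_segment`, `token_spans`, and `precision_recall_f` are hypothetical.

```python
from typing import List, Set, Tuple


def fmm_segment(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Greedy forward maximum matching over characters (illustrative only)."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the lexicon matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def token_spans(tokens: List[str]) -> Set[Tuple[int, int]]:
    """Convert a token list into a set of (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for tok in tokens:
        spans.add((start, start + len(tok)))
        start += len(tok)
    return spans


def precision_recall_f(predicted: List[str], gold: List[str]) -> Tuple[float, float, float]:
    """A predicted word counts as correct only if both of its boundaries match gold."""
    pred, ref = token_spans(predicted), token_spans(gold)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Toy data (assumption): the greedy match "abc" swallows the start of "cd".
lexicon = {"ab", "abc", "cd"}
predicted = fmm_segment("abcdab", lexicon, max_len=3)
gold = ["ab", "cd", "ab"]
p, r, f = precision_recall_f(predicted, gold)
print(predicted, round(p, 3), round(r, 3), round(f, 3))
```

The toy case shows the typical weakness of greedy longest-match segmentation: one overlong match shifts the following boundaries. A sequence model such as a CRF can avoid this kind of error by scoring boundaries in context.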
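Authors: Ma Chang'e (马昌娥), Yang Jian (杨鉴)
Source: Computer Science and Application (《计算机科学与应用》), 2018, No. 11, pp. 1682-1688 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61262068).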