Abstract
Traditional Mongolian and Cyrillic Mongolian are the written forms of Mongolian used in China and Mongolia, respectively. Their spoken forms are essentially the same, but their scripts are entirely different. Drawing on the word-formation characteristics of the two scripts, this paper proposes a joint sequence model based method for converting between traditional Mongolian and Cyrillic Mongolian in both directions, and reports extensive conversion experiments. In these experiments, the traditional Mongolian to Cyrillic Mongolian conversion system reaches a word error rate of 18.38% and a letter error rate of 6.75%, and the Cyrillic Mongolian to traditional Mongolian conversion system reaches a word error rate of 18.77% and a letter error rate of 7.14%. These results indicate that the proposed approach basically meets the requirements for practical use.
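The word error rate and letter error rate quoted in the abstract can be illustrated with a minimal sketch. The abstract does not define either metric, so the code assumes a common convention for word-level conversion tasks: a word counts as an error when the converted form differs from the reference at all, and letter errors are counted with Levenshtein edit distance, normalized by the total number of reference letters. The function names and the sample word lists are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(references, hypotheses):
    """Return (word error rate, letter error rate) for paired word lists.

    Assumed definitions (not confirmed by the source):
      word error rate   = words with any mismatch / number of words
      letter error rate = summed edit distance / total reference letters
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    wrong_words = sum(1 for r, h in zip(references, hypotheses) if r != h)
    letter_errors = sum(edit_distance(r, h)
                        for r, h in zip(references, hypotheses))
    total_letters = sum(len(r) for r in references)
    return wrong_words / len(references), letter_errors / total_letters


# Hypothetical Cyrillic Mongolian reference words and converter outputs.
refs = ["монгол", "хэл", "бичиг"]
hyps = ["монгол", "хэл", "бичэг"]  # one substituted letter in the last word
wer, ler = error_rates(refs, hyps)
print(f"word error rate: {wer:.2%}, letter error rate: {ler:.2%}")
```

Under these assumed definitions a single wrong letter makes the whole word count as an error, which is consistent with the reported word error rates being noticeably higher than the letter error rates.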
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
2014, No. 23, pp. 206-211 (6 pages)
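Funding
National Natural Science Foundation of China (No. 61263037, No. 71163029)
Natural Science Foundation of Inner Mongolia (No. 2014BS0604)
Inner Mongolia University research project for the introduction of high-level talent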
Keywords
traditional Mongolian
Cyrillic Mongolian
joint sequence models
joint multigram