Abstract
Cyrillic Mongolian and Traditional Mongolian are the Mongolian scripts used in Mongolia and China, respectively. Converting Cyrillic Mongolian to Traditional Mongolian makes communication between Mongolian speakers in the two countries more convenient and is significant for the development of Mongolian science, culture, and education. This paper studies a conversion method that combines the strengths of rules and statistical models. In-vocabulary words are converted with a rule-based approach, and out-of-vocabulary words with a joint sequence model. Many Cyrillic Mongolian words correspond to more than one Traditional Mongolian candidate; an N-gram language model resolves this ambiguity. Experimental results show a word error rate as low as 4.12%, which largely meets practical requirements.
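The three-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon entries, the bigram scores, and the names `RULE_LEXICON`, `BIGRAM_LOGP`, `convert_oov`, and `convert_sentence` are all hypothetical, the tokens are placeholders rather than real Mongolian words, and a bigram model stands in for whatever N-gram order the paper uses. The joint sequence model for out-of-vocabulary words is represented only by a stub.

```python
import math

# Hypothetical toy lexicon: placeholder tokens, not real Mongolian words.
# One source word may map to several target candidates.
RULE_LEXICON = {
    "c_a": ["t_a"],
    "c_b": ["t_b1", "t_b2"],
}

# Hypothetical bigram log-probabilities over target-side words.
BIGRAM_LOGP = {
    ("<s>", "t_a"): math.log(0.9),
    ("t_a", "t_b1"): math.log(0.2),
    ("t_a", "t_b2"): math.log(0.7),
}
UNSEEN_LOGP = math.log(1e-6)  # assumed flat penalty for unseen bigrams


def convert_oov(word):
    """Stub for the statistical OOV converter (a joint sequence model in the paper)."""
    return ["<oov:" + word + ">"]


def candidates(word):
    """Rule-based lookup for in-vocabulary words, statistical fallback otherwise."""
    return RULE_LEXICON.get(word) or convert_oov(word)


def convert_sentence(words):
    """Pick the best candidate sequence under the bigram model (Viterbi search)."""
    # best maps the last chosen target word to (log score, path so far).
    best = {"<s>": (0.0, [])}
    for word in words:
        nxt = {}
        for cand in candidates(word):
            score, path = max(
                (prev_score + BIGRAM_LOGP.get((prev, cand), UNSEEN_LOGP), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            nxt[cand] = (score, path + [cand])
        best = nxt
    return max(best.values())[1]
```

Under these toy scores, `convert_sentence(["c_a", "c_b"])` selects `t_b2` over `t_b1` because the bigram `("t_a", "t_b2")` has the higher probability, which is the kind of one-to-many ambiguity the language model is meant to resolve.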
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2017, No. 3, pp. 156-162 (7 pages)
Journal of Chinese Information Processing
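Funding
National Natural Science Foundation of China (61563040)
Natural Science Foundation of Inner Mongolia (2016D06)
Inner Mongolia University Research Program for the Introduction of High-Level Talent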
Keywords
Cyrillic Mongolian
Traditional Mongolian
conversion
rules
joint sequence model