Abstract
[Purpose/significance] Based on the current state of Chinese-Mongolian cross-language retrieval systems, this paper designs and implements a system that retrieves Cyrillic Mongolian documents using Chinese or traditional Mongolian keywords. [Method/process] The Chinese-Mongolian cross-language retrieval system has two components: machine translation and document retrieval. For machine translation, two conversions are implemented: a dictionary-based machine translation from Chinese to Cyrillic Mongolian, and a rule- and statistics-based conversion from traditional Mongolian to Cyrillic Mongolian. For document retrieval, the Lucene full-text indexing toolkit is used to index a large collection of Cyrillic Mongolian documents, and the documents are ranked by query-document similarity under the vector space model to obtain the set of documents that best match the query. [Result/conclusion] The system responds quickly, achieves high retrieval accuracy, and is usable in practice. On the one hand, it promotes scientific, technological, cultural, and educational exchange between China and Mongolia; on the other hand, it contributes to the study of Cyrillic Mongolian in China.
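The retrieval flow summarized above (translate the query into Cyrillic Mongolian, then rank documents by vector-space similarity) can be illustrated with a minimal sketch. It assumes a tiny hypothetical Chinese-to-Cyrillic-Mongolian dictionary (`ZH_TO_MN`, illustrative entries only), whitespace tokenization, and plain TF-IDF cosine similarity. It is not the authors' implementation, it does not reproduce Lucene's indexing or scoring formula, and it omits the traditional-Mongolian conversion step entirely.

```python
import math
from collections import Counter

# Hypothetical toy dictionary (Chinese -> Cyrillic Mongolian); illustrative only,
# not the dictionary used by the system described in the paper.
ZH_TO_MN = {
    "教育": ["боловсрол"],                        # education
    "文化": ["соёл"],                             # culture
    "科技": ["шинжлэх", "ухаан", "технологи"],    # science and technology
}

def translate_query(zh_terms):
    """Dictionary-based query translation: each Chinese term is replaced by its
    Cyrillic Mongolian equivalents; terms missing from the dictionary are dropped."""
    out = []
    for term in zh_terms:
        out.extend(w.lower() for w in ZH_TO_MN.get(term, []))
    return out

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (raw term frequency x smoothed IDF) for each document."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(zh_terms, docs, top_k=3):
    """Translate a Chinese query, then return (doc_index, score) pairs ranked by cosine similarity."""
    vecs, idf = tfidf_vectors(docs)
    q_terms = translate_query(zh_terms)
    # Query terms that never occur in the collection carry no weight.
    q_vec = {t: c * idf[t] for t, c in Counter(q_terms).items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vecs)]
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [(i, s) for s, i in scored[:top_k] if s > 0]

# Toy Cyrillic Mongolian "collection" (hypothetical).
docs = [
    "монгол улсын боловсрол ба соёл",
    "шинжлэх ухаан технологи хөгжил",
    "монгол хэлний судалгаа",
]

print([i for i, _ in search(["教育"], docs)])  # -> [0]
print([i for i, _ in search(["科技"], docs)])  # -> [1]
```

Dropping out-of-dictionary terms keeps the sketch short; handling unknown words and translation ambiguity would need more than a one-to-many lookup table.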
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
Peking University Core Journal (北大核心)
2017, No. 4, pp. 128-132, 144 (6 pages)
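Funding
This paper is a result of the National Natural Science Foundation of China project "Research on the Integration Mechanism of Mongolian Digital Resources Based on Domain Ontology"
Project No.: 71163029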
Keywords
cross-language information retrieval
information retrieval system
retrieval method