Abstract
To improve the performance of cross-lingual information retrieval (CLIR) more effectively, a hybrid CLIR model is proposed that combines the strengths of ontology and statistical methods. Within the structure of this language model, an ontology description framework is presented and a formal linguistic ontology knowledge representation is constructed. By learning from a typical corpus, a linguistic ontology knowledge base of the source language is built that integrates grammatical, semantic, syntactic, and other information. In practical CLIR, an initial set of retrieved documents is obtained using the ontology representation, and all candidate documents are then re-ranked against the source-language ontology knowledge base to improve the precision of the top-N ranking. The model was evaluated on the Chinese-English CLIR data set of the NTCIR-3 Workshop, and the experimental results show that the method achieves satisfactory performance.
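The abstract describes a two-stage process: an initial document set is retrieved, then every candidate is re-ranked against a source-language knowledge base to raise top-N precision. The sketch below illustrates only that retrieve-then-rerank shape and the precision@N measure; it is not the authors' model. The term-overlap first stage, the `kb_weights` dictionary (a hypothetical per-term weight standing in for the ontology knowledge base), and all function names are assumptions made for illustration.

```python
# Hypothetical sketch of a retrieve-then-rerank pipeline with precision@N.
# The term-overlap first stage and the kb_weights dictionary are illustrative
# assumptions, not the ontology model described in the paper.
from typing import Dict, List, Sequence, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercased whitespace tokenization; real Chinese-English CLIR would also
    # need word segmentation and query translation, which are omitted here.
    return text.lower().split()


def initial_retrieval(query: Sequence[str], docs: Dict[str, str], k: int) -> List[str]:
    """Stage 1 (assumed stand-in): keep the k documents sharing the most query terms."""
    scored: List[Tuple[int, str]] = []
    for doc_id, text in docs.items():
        tokens = set(tokenize(text))
        overlap = sum(1 for term in query if term in tokens)
        if overlap > 0:  # documents with no matching term are not retrieved
            scored.append((overlap, doc_id))
    # Higher overlap first; doc_id breaks ties so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def rerank(candidates: Sequence[str], query: Sequence[str],
           docs: Dict[str, str], kb_weights: Dict[str, float]) -> List[str]:
    """Stage 2 (assumed): reorder all candidates by the summed hypothetical
    knowledge-base weights of the query terms each document contains."""
    def score(doc_id: str) -> float:
        tokens = set(tokenize(docs[doc_id]))
        return sum(kb_weights.get(term, 0.0) for term in query if term in tokens)

    # sorted() is stable even with reverse=True, so ties keep their stage-1 order.
    return sorted(candidates, key=score, reverse=True)


def precision_at_n(ranking: Sequence[str], relevant: Set[str], n: int) -> float:
    """Fraction of the top n positions occupied by relevant documents."""
    if n <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranking[:n] if doc_id in relevant)
    return hits / n


if __name__ == "__main__":
    docs = {
        "d1": "retrieval model evaluation",
        "d2": "ontology knowledge base",
        "d3": "cross lingual retrieval",
        "d4": "weather report",
    }
    query = ["ontology", "retrieval", "model"]
    kb_weights = {"ontology": 3.0, "retrieval": 0.5, "model": 0.5}
    relevant = {"d2"}

    first = initial_retrieval(query, docs, k=3)
    second = rerank(first, query, docs, kb_weights)
    print(first, precision_at_n(first, relevant, 1))    # ['d1', 'd2', 'd3'] 0.0
    print(second, precision_at_n(second, relevant, 1))  # ['d2', 'd1', 'd3'] 1.0
```

In this toy run, re-ranking moves `d2` to the top because the hypothetical knowledge base weights `ontology` heavily, which raises P@1 from 0.0 to 1.0. With real data, any gain depends entirely on how the knowledge base is built and weighted.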
Source
Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2008, No. 1, pp. 77-80 (4 pages)
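Funding
National Natural Science Foundation of China (60736044)
National High Technology Research and Development Program of China (863 Program) (2006AA01Z150, 2004AA11701008)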
Keywords
cross-lingual information retrieval
ontology
statistical method
language model
knowledge acquisition