Abstract
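With the growing richness of online information and rising user demands, people are no longer content to search within a single language, and cross-language information retrieval (CLIR) has therefore attracted increasing attention. To this end, this paper proposes Onto CLIR, a new semantics-based cross-language information retrieval model. Building on traditional information retrieval techniques, the model uses ontologies to describe the corresponding domain knowledge in different languages, in order to address the semantic loss and distortion that arise when converting from the query language to the retrieval language. This ensures that retrieval follows the user's query intent and returns the expected information. Using sports news retrieval as the setting, with English queries as the query requests, we retrieve sports news from Sina (新浪网). The results show that the ontology-based cross-language retrieval method raises recall and precision by about 10 percentage points on average, effectively improving retrieval performance.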
As online information grows richer and user needs increase, people are no longer satisfied with retrieval in a single language, so Cross Language Information Retrieval (CLIR) is receiving more and more attention. One of the core problems of CLIR is how to overcome communication barriers between different languages. This paper proposes a novel semantics-based CLIR model, Onto CLIR. Building on traditional information retrieval technologies, the model uses ontologies to describe the relevant domain knowledge in different languages. This addresses the semantic loss and distortion that occur when translating between the query language and the retrieval language, and helps ensure that the model follows the user's query intention and returns the expected results. We ran experiments to validate the approach, retrieving Chinese sports news from the Sina website with queries in English. The experimental results show that with our ontology-based CLIR approach, retrieval recall and precision both increase by more than 10 percent, which indicates that the approach is effective in improving retrieval performance.
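The idea described in the abstract, mapping query terms to shared concepts and then to terms in the retrieval language, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy ontology, the sample documents, and the function names `translate_query`, `retrieve`, and `precision_recall` are all hypothetical, chosen only to show how a concept layer can link English queries to Chinese documents and how recall and precision are computed.

```python
# Hypothetical toy ontology: concept id -> labels in each language.
# These entries are illustrative assumptions, not data from the paper.
ONTOLOGY = {
    "Football": {"en": ["football", "soccer"], "zh": ["足球"]},
    "Basketball": {"en": ["basketball"], "zh": ["篮球"]},
}


def translate_query(query, ontology, src="en", dst="zh"):
    """Map source-language query words to target-language labels via shared concepts."""
    terms = []
    for word in query.lower().split():
        for labels in ontology.values():
            if word in labels[src]:
                for label in labels[dst]:
                    if label not in terms:
                        terms.append(label)
    return terms


def retrieve(terms, documents):
    """Return the ids of documents that contain any target-language term."""
    return {doc_id for doc_id, text in documents.items()
            if any(term in text for term in terms)}


def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up sample documents (assumption, for illustration only).
DOCS = {
    1: "国家队赢得足球比赛",
    2: "篮球联赛今晚开赛",
    3: "足球转会市场消息",
}

zh_terms = translate_query("soccer news", ONTOLOGY)
found = retrieve(zh_terms, DOCS)
print(zh_terms, sorted(found), precision_recall(found, {1, 3}))
```

Query words with no matching concept (here "news") are simply dropped. A real system would also need word segmentation, ranking, and a far richer ontology.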
Source
《中文信息学报》
CSCD
PKU Core Journals (北大核心)
2004, Issue 3, pp. 1-8, 60 (9 pages in total)
Journal of Chinese Information Processing
Funding
Supported by the National Natural Science Foundation of China (60005004)
Supported by the Natural Science Foundation of Anhui Province (01042302)