Abstract
An information retrieval system that can pinpoint the parts of a document the user actually cares about will improve the user's retrieval efficiency. Cover-based retrieval was proposed to address this problem: it takes the set of keywords in the user's query as input and returns, from the documents being searched, the set of shortest text snippets that contain the keyword set. This paper adopts an improved cover-based retrieval strategy that restricts the text snippets the system returns and applies the idea of the greedy algorithm during retrieval, and then applies the strategy to a Chinese information retrieval system. Experiments show that the improved strategy outperforms the original cover-based retrieval strategy on measures including the number of valid results returned and mean reciprocal rank (MRR).
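To make the idea of a cover concrete, the following Python sketch finds every minimal token span that contains all query keywords, and computes MRR from per-query ranks. It is an illustration under stated assumptions, not the paper's method: the abstract does not specify how snippets are restricted or how the greedy step works, so the `max_len` parameter is a hypothetical stand-in for a snippet-length restriction, and the names `find_covers` and `mean_reciprocal_rank` are invented here. The document is assumed to be already segmented into a list of tokens (words or characters).

```python
from collections import Counter


def find_covers(tokens, query_terms, max_len=None):
    """Return (start, end) index pairs of minimal spans containing all query terms.

    A span is minimal if no shorter span inside it also contains every term.
    max_len is a hypothetical restriction: spans longer than it are dropped.
    """
    terms = set(query_terms)
    if not terms:
        return []
    counts = Counter()   # occurrences of each query term inside the window
    matched = 0          # number of distinct query terms inside the window
    left = 0
    covers = []
    for right, tok in enumerate(tokens):
        if tok not in terms:
            continue
        counts[tok] += 1
        if counts[tok] == 1:
            matched += 1
        if matched < len(terms):
            continue
        # Shrink from the left while the window still contains every term.
        while True:
            head = tokens[left]
            if head not in terms:
                left += 1
            elif counts[head] > 1:
                counts[head] -= 1
                left += 1
            else:
                break
        if max_len is None or right - left + 1 <= max_len:
            covers.append((left, right))
        # Drop the leftmost term so the scan can look for the next cover.
        counts[tokens[left]] -= 1
        matched -= 1
        left += 1
    return covers


def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1/rank over queries; rank is 1-based, None means no hit."""
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / r for r in first_relevant_ranks if r)
    return total / len(first_relevant_ranks)


tokens = ["信息", "检索", "系统", "提高", "检索", "效率"]
print(find_covers(tokens, ["检索", "效率"]))  # [(4, 5)]
```

Sorting the returned spans by length, for example `sorted(covers, key=lambda c: c[1] - c[0])`, puts the shortest covers first, which matches the intuition that tighter snippets are more likely to be relevant.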
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journals (北大核心)
2005, Issue 25, pp. 165-167, 196 (4 pages in total)
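Funding
Supported by the Key Program of the National Natural Science Foundation of China, project "Theory and Methods of Question-Answering Information Retrieval" (Grant No. 60435020).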