

The Research to Cover Based Chinese Information Retrieval Technology
Abstract: An information retrieval system that can pinpoint the parts of a document the user actually cares about will make retrieval more efficient. Cover-based retrieval was proposed to address this problem: it takes the set of keywords in the user's query as input and returns, from the documents being searched, the set of shortest text snippets that contain the keyword set. This paper adopts an improved cover-based retrieval strategy that places restrictions on the text snippets the system returns and applies the idea of the greedy algorithm during retrieval, and finally applies the strategy to a Chinese information retrieval system. Experiments show that the improved strategy outperforms the original cover-based strategy on measures such as the number of valid results returned and mean reciprocal rank (MRR).
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2005, No. 25, pp. 165-167, 196 (4 pages)
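Funding: Supported by the Key Program of the National Natural Science Foundation of China, "Theory and Methods of Question-Answering Information Retrieval" (Grant No. 60435020)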
Keywords: cover-based retrieval; greedy algorithm; shortest text snippets
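The cover idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's improved strategy, whose snippet restrictions and greedy step are not specified in this record. It assumes the document is already segmented into tokens, treats a cover as a shortest token span containing every query term (a span that contains no smaller such span), and uses a hypothetical `max_len` parameter as a stand-in for a restriction on returned snippets. The function names `find_covers` and `mean_reciprocal_rank` are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple


def find_covers(tokens: Sequence[str],
                query_terms: Sequence[str],
                max_len: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return minimal spans (start, end), inclusive, that contain every query term.

    A span is minimal if no shorter span inside it also contains all terms.
    max_len (hypothetical) drops spans longer than that many tokens.
    """
    terms = set(query_terms)
    last_seen = {}      # term -> index of its most recent occurrence
    covers = []
    prev_start = -1     # start of the last minimal span found
    for i, tok in enumerate(tokens):
        if tok not in terms:
            continue
        last_seen[tok] = i
        if len(last_seen) < len(terms):
            continue    # not every term has appeared yet
        start = min(last_seen.values())
        # The span ending here is minimal only the first time this start occurs.
        if start != prev_start:
            prev_start = start
            if max_len is None or i - start + 1 <= max_len:
                covers.append((start, i))
    return covers


def mean_reciprocal_rank(first_hit_ranks: Sequence[Optional[int]]) -> float:
    """MRR over queries; each entry is the 1-based rank of the first
    correct result, or None if no correct result was returned."""
    if not first_hit_ranks:
        return 0.0
    total = sum(1.0 / r for r in first_hit_ranks if r is not None)
    return total / len(first_hit_ranks)


if __name__ == "__main__":
    doc = ["A", "x", "B", "A", "y", "y", "B", "z", "A"]
    spans = find_covers(doc, ["A", "B"], max_len=3)
    # Shorter spans are listed first, as a simple ranking heuristic.
    print(sorted(spans, key=lambda s: s[1] - s[0]))
    print(mean_reciprocal_rank([1, 2, None]))
```

Sorting spans by length is only one plausible ranking; the published strategy may score snippets differently.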







