Research on Cover-Based Chinese Information Retrieval Technology (基于Cover级别的中文信息检索技术的研究)


Abstract: An information retrieval system that can locate, with reasonable precision, the parts of a document the user actually cares about will improve the user's retrieval efficiency. The cover-based retrieval strategy was proposed to address this problem. It takes the set of keywords in the user's query as input and returns, as output, the set of shortest text snippets in the searched documents that contain the keyword set. This paper adopts an improved cover-based retrieval strategy that places restrictions on the text snippets returned by the system and applies the idea of the greedy algorithm during retrieval, and then applies the strategy to a Chinese information retrieval system. Experiments show that the improved strategy outperforms the original cover-based strategy on metrics including the number of valid results returned and mean reciprocal rank (MRR).
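The basic idea described in the abstract (keyword set in, shortest keyword-containing snippets out) and the MRR metric can be illustrated with a minimal sketch. This is not the paper's improved strategy, whose snippet restrictions and greedy step are not detailed in the abstract; it only shows the plain notion of a cover. It assumes the text has already been segmented into tokens, and the function names `minimal_covers` and `mean_reciprocal_rank` are hypothetical, introduced here for illustration.

```python
from collections import Counter


def minimal_covers(tokens, keywords):
    """Return inclusive (start, end) index pairs of every minimal cover.

    A minimal cover is a token span that contains every keyword and has
    no shorter sub-span that also contains every keyword.
    Assumption: `tokens` is an already-segmented document.
    """
    needed = set(keywords)
    if not needed:
        return []
    counts = Counter()   # keyword occurrences inside the current window
    matched = 0          # number of distinct keywords inside the window
    left = 0
    covers = []
    for right, tok in enumerate(tokens):
        if tok not in needed:
            continue
        counts[tok] += 1
        if counts[tok] == 1:
            matched += 1
        if matched < len(needed):
            continue
        # Shrink from the left while the window still holds every keyword.
        while tokens[left] not in needed or counts[tokens[left]] > 1:
            if tokens[left] in needed:
                counts[tokens[left]] -= 1
            left += 1
        covers.append((left, right))
        # Drop the leftmost keyword so the next window is a different cover.
        counts[tokens[left]] -= 1
        matched -= 1
        left += 1
    return covers


def mean_reciprocal_rank(first_correct_ranks):
    """MRR over queries; each entry is the 1-based rank of the first
    correct result, or None when no correct result was returned."""
    if not first_correct_ranks:
        return 0.0
    total = sum(1.0 / r for r in first_correct_ranks if r is not None)
    return total / len(first_correct_ranks)


if __name__ == "__main__":
    doc = ["信息", "检索", "系统", "提高", "检索", "效率", "信息"]
    covers = minimal_covers(doc, {"信息", "检索"})
    # Shortest snippets first, as a simple stand-in for ranking.
    print(sorted(covers, key=lambda c: c[1] - c[0]))  # [(0, 1), (4, 6)]
    print(mean_reciprocal_rank([1, 2, None]))         # 0.5
```

The sliding window visits each token a bounded number of times, so the scan is linear in document length. Ordering covers by length is only one plausible way to rank snippets and is an assumption of this sketch.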

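Authors: BAO Gang (包刚), GUAN Yi (关毅), WANG Qiang (王强), ZHAO Jian (赵健)

Affiliation: School of Computer Science and Engineering, Harbin Institute of Technology (哈尔滨工业大学计算机科学与工程学院)

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2005, Issue 25, pp. 165-167, 196 (4 pages)

Funding: Supported by the Key Program of the National Natural Science Foundation of China, "Theory and Methods of Question-Answering Information Retrieval" (Grant No. 60435020)

Keywords: cover-based retrieval; greedy algorithm; shortest text snippets

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]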


