Exploration of Weighted Proximity Measure in Information Retrieval (信息检索中的带权邻近度度量研究)

Cited by: 1


Abstract: A key problem in information retrieval is to provide information seekers with relevant, accurate, and even complete information. Many traditional retrieval models rest on the bag-of-words assumption and ignore the associations among query terms. Term proximity has been widely used to boost the performance of classical retrieval models, but most of that work does not account for differences in importance among the query terms. In modern information retrieval, query terms are not fully independent of one another, and they also differ in importance. Distinguishing term importance when computing proximity should therefore help improve retrieval performance. To this end, a weighted term proximity measure is introduced that distinguishes the significance of query terms using background information from the collection being searched. The weighted proximity BM25 model (WP-BM25), which integrates this measure into the Okapi BM25 model, is proposed to rank the retrieved documents. A series of comparative experiments on three standard TREC collections, FR88-89, WT2G, and WT10G, shows that WP-BM25 is robust and significantly improves retrieval performance.
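The abstract does not give the WP-BM25 formula, so the following is only a minimal sketch of the general idea it describes: score a document with BM25, then add a proximity bonus in which each pair of query terms is weighted by a collection-derived importance. Everything here is an assumption made for illustration and is not taken from the paper: IDF stands in for the "background information", proximity is the inverse of the smallest positional gap between two terms, and the function names and the mixing parameter `lam` are hypothetical.

```python
import math
from itertools import combinations


def idf(term, docs):
    """Collection-level term weight (assumed stand-in for the paper's background information)."""
    n = sum(1 for d in docs if term in d)
    return math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)


def bm25(query, doc, docs, k1=1.2, b=0.75):
    """Plain Okapi BM25 score of one tokenized document."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for t in query:
        tf = doc.count(t)
        norm = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(t, docs) * tf * (k1 + 1) / norm
    return score


def min_distance(a, b, doc):
    """Smallest positional gap between terms a and b, or None if either is absent."""
    pos_a = [i for i, t in enumerate(doc) if t == a]
    pos_b = [i for i, t in enumerate(doc) if t == b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)


def weighted_proximity(query, doc, docs):
    """Sum over distinct query-term pairs of (weight_a * weight_b) / min distance.

    Hypothetical form: pairs of important terms that occur close together
    contribute most; pairs with a missing term contribute nothing.
    """
    total = 0.0
    for a, b in combinations(sorted(set(query)), 2):
        d = min_distance(a, b, doc)
        if d is not None:
            total += idf(a, docs) * idf(b, docs) / d
    return total


def wp_score(query, doc, docs, lam=0.5):
    """BM25 plus a weighted proximity bonus; lam is an illustrative mixing weight."""
    return bm25(query, doc, docs) + lam * weighted_proximity(query, doc, docs)
```

With this sketch, two documents that contain the same query terms equally often are separated by how close those terms sit and by how informative they are in the collection, which is the behavior the abstract attributes to weighting proximity by term importance.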

Authors: Xue Yuanhai (薛源海), Yu Xiaoming (俞晓明), Liu Yue (刘悦), Guan Feng (关峰), Cheng Xueqi (程学旗)

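Affiliations: CAS Key Laboratory of Network Data Science and Technology; Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences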

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2014, Issue 10, pp. 2216-2224 (9 pages). Indexed in: EI, CSCD, PKU Core Journals.

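Funding: National Natural Science Foundation of China (61100083); National High Technology Research and Development Program of China ("863" Program) (2012AA011003)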

Keywords: weighted proximity measure method; BM25; term significance; information retrieval

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

Citation network

References (16)

[1] Manning C D, Raghavan P, Schütze H. Introduction to Information Retrieval [M]. Cambridge: Cambridge University Press, 2008.
[2] Salton G, Wong A, Yang C S. A vector space model for automatic indexing [J]. Communications of the ACM, 1975, 18(11): 613-620.
[3] Robertson S E, Jones K S. Relevance weighting of search terms [J]. Journal of the American Society for Information Science, 1976, 27(3): 129-146.
[4] Robertson S, Zaragoza H. The Probabilistic Relevance Framework [M]. Hanover, MA: Now Publishers Inc, 2009.
[5] Ponte J M, Croft W B. A language modeling approach to information retrieval [C] //Proc of the 21st Annual Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 1998: 275-281.
[6] Fagan J. Automatic phrase indexing for document retrieval [C] //Proc of the 10th Annual Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 1987: 91-101.
[7] Croft W B, Turtle H R, Lewis D D. The use of phrases and structured queries in information retrieval [C] //Proc of the 14th Annual Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 1991: 32-45.
[8] Gao J, Nie J Y, Wu G, et al. Dependence language model for information retrieval [C] //Proc of the 27th Annual Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2004: 170-177.
[9] Metzler D, Croft W B. A Markov random field model for term dependencies [C] //Proc of the 28th Annual Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2005: 472-479.
[10] Tao T, Zhai C X. An exploration of proximity measures in information retrieval [C] //Proc of the 30th Annual Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2007: 295-302.


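Citing documents (1)

[1] He Feijuan, Miao Xianglin, Xu Dawei, Bi Peng. A personalized book recommendation method based on the "reader-book" bipartite graph [J]. Computer Technology and Development (计算机技术与发展), 2015, 25(5): 25-28. Cited by: 2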

