
Exploration of Weighted Proximity Measure in Information Retrieval (信息检索中的带权邻近度度量研究)
Cited by: 1
Abstract: A central problem in information retrieval is providing information seekers with relevant, accurate, and ideally complete information. Many traditional retrieval models rely on the bag-of-words assumption and therefore ignore relationships among query terms. Term proximity has often been used to improve classical retrieval models, but most of that work does not account for differences in the importance of individual query terms. In modern retrieval queries, the terms are neither fully independent of one another nor equally important, so distinguishing term importance when computing proximity should help improve retrieval performance. To this end, a weighted term proximity measure is introduced that uses background information from the collection being searched to distinguish the significance of query terms. The weighted proximity BM25 model (WP-BM25), which integrates this measure into the Okapi BM25 model, is proposed for ranking retrieved documents. Comparative experiments on three standard TREC collections, FR88-89, WT2G, and WT10G, show that WP-BM25 is robust and significantly improves retrieval performance.
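The abstract describes the idea only at a high level, so the following is a minimal sketch under stated assumptions rather than the WP-BM25 formula from the paper. It combines a standard Okapi BM25 score with a proximity bonus in which each pair of query terms is weighted by the product of their IDF values, with IDF standing in for "background information from the collection". The pair weighting, the inverse-distance kernel, the `log1p` damping, and the mixing parameter `lam` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def idf(term, docs):
    """Smoothed inverse document frequency of a term over tokenized docs."""
    n = sum(1 for d in docs if term in d)
    return math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)


def bm25(query, doc, docs, k1=1.2, b=0.75):
    """Standard Okapi BM25 score of one tokenized document."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    tf = Counter(doc)
    score = 0.0
    for t in set(query):
        norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(t, docs) * tf[t] * (k1 + 1) / norm
    return score


def min_pair_distance(doc, t1, t2):
    """Smallest positional gap between occurrences of t1 and t2, or None."""
    p1 = [i for i, w in enumerate(doc) if w == t1]
    p2 = [i for i, w in enumerate(doc) if w == t2]
    if not p1 or not p2:
        return None
    return min(abs(i - j) for i in p1 for j in p2)


def weighted_proximity(query, doc, docs):
    """Hypothetical weighting: pairs of important (high-IDF) terms that
    occur close together contribute more than pairs of common terms."""
    total = 0.0
    for t1, t2 in combinations(sorted(set(query)), 2):
        dist = min_pair_distance(doc, t1, t2)
        if dist is not None:
            total += idf(t1, docs) * idf(t2, docs) / dist
    return total


def wp_bm25(query, doc, docs, lam=1.0):
    """BM25 plus a damped weighted-proximity bonus (lam is an assumed knob)."""
    return bm25(query, doc, docs) + lam * math.log1p(
        weighted_proximity(query, doc, docs)
    )


# Toy collection: d1 and d2 contain the same words, in a different order.
d1 = "weighted proximity measure for retrieval models".split()
d2 = "weighted measure for retrieval models proximity".split()
d3 = "classical retrieval models ignore term order".split()
docs = [d1, d2, d3]
query = ["weighted", "proximity"]

for name, d in [("d1", d1), ("d2", d2), ("d3", d3)]:
    print(name, round(wp_bm25(query, d, docs), 4))
```

In this toy collection, `d1` and `d2` receive identical plain BM25 scores because they have the same term frequencies and length; only the proximity term separates them, favoring `d1`, where the two query terms are adjacent.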
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2014, No. 10, pp. 2216-2224 (9 pages)
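Funding: National Natural Science Foundation of China (61100083); National High Technology Research and Development Program of China ("863" Program) (2012AA011003)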
Keywords: weighted proximity; measure method; BM25; term significance; information retrieval



