
Approach for collection selection based on click-through data (times cited: 2)
Abstract: In distributed information retrieval, different collections contribute unequally to the final retrieval results. To address this, an approach to collection selection based on click-through data (PCTD-CS) was proposed. Click-through data of past queries were used to estimate the relevance of each collection to those queries. The similarity between queries was estimated with a mixed approach that combines term-based and results-based measures. Past queries similar to a new user query were then used to predict the relevance of each collection to that query. The M collections with the highest relevance were selected for retrieval, and, when the top k ranked documents are required, the number of documents each selected collection should return was determined. The effectiveness of the method was evaluated with recall (Rm), precision of the top n results (P@n), and mean average precision (MAP). Experimental results demonstrated that PCTD-CS improved both the recall and the precision of search results and located the collections containing more relevant documents more accurately.
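The selection pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' PCTD-CS formulas, which the abstract does not give. Assumed here: a collection's relevance to a past query is its share of that query's clicks; term-based and results-based query similarity are both Jaccard overlaps, mixed with a hypothetical weight `alpha`; a new query's score for each collection is the similarity-weighted average over past queries; an initial result list for the new query is available for the results-based measure; and the k result slots are split among the top M collections in proportion to their scores using largest-remainder rounding. All function names and the sample click log are hypothetical.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical click log: past query -> {collection id: clicks on results
# that came from that collection}. Toy data, not taken from the paper.
ClickLog = Dict[str, Dict[str, int]]


def collection_relevance(clicks: Dict[str, int]) -> Dict[str, float]:
    """Assumed estimator: a collection's share of one past query's clicks."""
    total = sum(clicks.values())
    return {c: n / total for c, n in clicks.items()} if total else {}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def query_similarity(q1: str, q2: str, res1: Set[str], res2: Set[str],
                     alpha: float = 0.5) -> float:
    """Mix term-based and results-based similarity with an assumed weight alpha."""
    term_sim = jaccard(set(q1.lower().split()), set(q2.lower().split()))
    result_sim = jaccard(res1, res2)
    return alpha * term_sim + (1 - alpha) * result_sim


def score_collections(query: str, query_results: Set[str], log: ClickLog,
                      past_results: Dict[str, Set[str]],
                      alpha: float = 0.5) -> Dict[str, float]:
    """Similarity-weighted average of past queries' collection relevance."""
    scores: Counter = Counter()
    total_sim = 0.0
    for past_q, clicks in log.items():
        sim = query_similarity(query, past_q, query_results,
                               past_results.get(past_q, set()), alpha)
        if sim <= 0.0:
            continue  # unrelated past queries contribute nothing
        total_sim += sim
        for c, rel in collection_relevance(clicks).items():
            scores[c] += sim * rel
    return {c: s / total_sim for c, s in scores.items()} if total_sim else {}


def select_and_allocate(scores: Dict[str, float], m: int, k: int) -> Dict[str, int]:
    """Keep the M best-scoring collections and split k result slots by score."""
    top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:m]
    total = sum(s for _, s in top)
    if total == 0:
        return {}
    raw = {c: k * s / total for c, s in top}
    alloc = {c: int(x) for c, x in raw.items()}  # floor of each share
    # Largest-remainder rounding so the counts add up to exactly k.
    leftover = k - sum(alloc.values())
    for c in sorted(raw, key=lambda c: (-(raw[c] - alloc[c]), c))[:leftover]:
        alloc[c] += 1
    return alloc


if __name__ == "__main__":
    log = {
        "python tutorial": {"docs": 8, "blogs": 2},
        "python list sort": {"docs": 5, "forum": 5},
        "cooking recipes": {"food": 10},
    }
    past_results = {
        "python tutorial": {"u1", "u2", "u3"},
        "python list sort": {"u2", "u4"},
        "cooking recipes": {"u9"},
    }
    # Assumed: an initial result list for the new query is available.
    scores = score_collections("python sort tutorial", {"u2", "u3"},
                               log, past_results)
    print(select_and_allocate(scores, m=2, k=10))  # {'docs': 8, 'forum': 2}
```

On this toy log, the cooking query shares no terms or results with the new query, so only the two Python-related past queries contribute. The "docs" collection, which receives most clicks for both, is ranked first and gets 8 of the 10 slots. Weighting by similarity lets a close past query dominate the estimate, while normalizing by the total similarity keeps the scores a distribution over collections.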
Source: Journal of Zhejiang University (Engineering Science), 2013, No. 1, pp. 23-28, 161 (7 pages in total). Indexed in EI, CAS, CSCD and the Peking University core journal list.
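Funding: National Science and Technology Major Project on Core Electronic Devices, High-End Generic Chips and Basic Software (2010ZX01042-002-003); National Natural Science Foundation of China (60703040); Major Project of the Zhejiang Provincial Science and Technology Program (2007C13019); Zhejiang Provincial Major Science and Technology Special Project (2011C13042); Hangzhou Major Science and Technology Innovation Special Project (20112311A20)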
Keywords: distributed information retrieval; collection selection; similar query; click-through data