Abstract
Collections contribute unequally to the final results in distributed information retrieval. To address this, a collection selection approach based on past click-through data (PCTD-CS) was proposed. Click-through data of past queries were used to estimate the relevance of each collection to those queries. The similarity between queries was estimated with a mixed approach that combines term-based and results-based measures. The relevance of each collection to a new query was then predicted from the past queries similar to it. The M collections with the highest relevance were selected for retrieval, and the number of documents each collection should return was determined for the case where the top k ranked results are required. Recall (Rm), precision of the top n results (P@n) and mean average precision (MAP) were used to evaluate the method. Experimental results showed that PCTD-CS improved the recall and precision of the search results and located collections containing more relevant documents more accurately.
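The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas used in the paper, so everything below is an assumption made for illustration: the Jaccard overlap for both term-based and results-based similarity, the mixing weight `alpha`, click shares as the relevance estimate, the similarity-weighted sum, and the proportional split of k documents. The function names and the record layout (`terms`, `results`, `clicks`) are hypothetical as well.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap of two iterables treated as sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def query_similarity(q1, q2, alpha=0.5):
    """Mix term overlap and result overlap; alpha is an assumed weight."""
    term_sim = jaccard(q1["terms"], q2["terms"])
    result_sim = jaccard(q1["results"], q2["results"])
    return alpha * term_sim + (1 - alpha) * result_sim


def collection_relevance(clicks):
    """Turn per-collection click counts of one past query into click shares."""
    total = sum(clicks.values())
    return {c: n / total for c, n in clicks.items()} if total else {}


def score_collections(new_query, history, alpha=0.5):
    """Similarity-weighted sum of past-query relevance for each collection."""
    scores = defaultdict(float)
    for past in history:
        sim = query_similarity(new_query, past, alpha)
        if sim == 0.0:
            continue  # dissimilar past queries contribute nothing
        for c, rel in collection_relevance(past["clicks"]).items():
            scores[c] += sim * rel
    return dict(scores)


def select_and_allocate(scores, m, k):
    """Pick the m highest-scoring collections and split k documents among them."""
    top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:m]
    total = sum(s for _, s in top)
    if total == 0:
        return {}
    quotas = {c: k * s / total for c, s in top}
    alloc = {c: int(q) for c, q in quotas.items()}
    # Largest-remainder rounding so the allocations sum to exactly k.
    leftover = k - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda c: (alloc[c] - quotas[c], c))
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


# Toy click log: each past query has terms, a result list, and clicks per collection.
history = [
    {"terms": ["distributed", "retrieval"], "results": ["d1", "d2", "d3"],
     "clicks": {"A": 4, "B": 1}},
    {"terms": ["click", "data"], "results": ["d7", "d8"],
     "clicks": {"C": 2}},
]
new_query = {"terms": ["distributed", "search"], "results": ["d2", "d3", "d4"]}

scores = score_collections(new_query, history)
allocation = select_and_allocate(scores, m=2, k=10)
print(allocation)
```

In this toy log only the first past query overlaps with the new one, so collection C receives no score and the k = 10 documents are split between A and B in proportion to their click shares. The results-based similarity assumes that some result list is already available for the new query, which the abstract does not specify.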
Source
《浙江大学学报(工学版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2013, No. 1, pp. 23-28, 161 (7 pages in total)
Journal of Zhejiang University: Engineering Science
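Funding
National Science and Technology Major Project on Core Electronic Devices, High-end General Chips and Basic Software ("核高基") (2010ZX01042-002-003)
National Natural Science Foundation of China (60703040)
Major Project of the Zhejiang Provincial Science and Technology Plan (2007C13019)
Zhejiang Provincial Major Science and Technology Special Project (2011C13042)
Hangzhou Major Science and Technology Innovation Special Project (20112311A20)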
Keywords
distributed information retrieval (分布式信息检索)
collection selection (集合选择)
similar query (相似查询)
click-through data (点击数据)