Abstract
To improve the performance of information retrieval systems, a retrieval algorithm that combines multi-query data fusion with positive relevance feedback is proposed. The core idea is as follows. A cosine similarity measure over vector representations is used to compute the similarity between a query and the documents; multi-query data fusion is used to merge multiple retrieval results; and the top M relevant documents from the previous round of retrieval are combined with the initial query to form a new query, which is submitted to the system for the next round of retrieval. This process is repeated until satisfactory results are obtained. Experimental results show that, compared with an algorithm that uses only multi-query data fusion and an algorithm that uses only positive relevance feedback, the proposed algorithm improves average precision by 42.6% and 23.17%, respectively.
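The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the term weighting, the fusion rule, or the stopping criterion, so the sketch assumes raw term counts, a score-summing fusion rule, unweighted query expansion, and a fixed number of rounds. The names `vectorize`, `retrieve`, `fuse`, `expand`, `feedback_search`, and the `is_relevant` callback (standing in for the user's relevance judgments) are all hypothetical.

```python
import math


def vectorize(text):
    """Turn text into a sparse term-count vector (assumed weighting: raw counts)."""
    vec = {}
    for term in text.lower().split():
        vec[term] = vec.get(term, 0.0) + 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, doc_vecs):
    """Score every document against one query."""
    return {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}


def fuse(score_lists):
    """Fuse several result lists by summing scores (an assumed fusion rule)."""
    fused = {}
    for scores in score_lists:
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + score
    return fused


def expand(query_vec, feedback_vecs):
    """Form a new query: the initial query plus the feedback documents' terms."""
    new_vec = dict(query_vec)
    for vec in feedback_vecs:
        for term, weight in vec.items():
            new_vec[term] = new_vec.get(term, 0.0) + weight
    return new_vec


def feedback_search(queries, docs, is_relevant, top_m=2, rounds=3):
    """Multi-query fusion with positive relevance feedback, for a fixed number of rounds."""
    doc_vecs = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    initial = [vectorize(q) for q in queries]
    current = initial
    ranking = []
    for round_no in range(rounds):
        fused = fuse([retrieve(q, doc_vecs) for q in current])
        # Sort by descending fused score; break ties by document id.
        ranking = sorted(fused, key=lambda d: (-fused[d], d))
        if round_no == rounds - 1:
            break
        # Positive feedback: keep only the top M documents judged relevant.
        relevant = [d for d in ranking if is_relevant(d)][:top_m]
        if not relevant:
            break
        current = [expand(q, [doc_vecs[d] for d in relevant]) for q in initial]
    return ranking
```

In this sketch, each new query is rebuilt from the initial query rather than from the previous expanded query, which matches the abstract's wording ("the initial query" plus the top M relevant documents). A real system would likely replace the fixed `rounds` limit with whatever satisfaction criterion the user or evaluation applies.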
Source
Journal of Xi'an Jiaotong University (《西安交通大学学报》)
EI
CAS
CSCD
PKU Core Journal
2005, No. 8, pp. 820-823 (4 pages)
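Funding
National Natural Science Foundation of China (60473004)
Natural Science Foundation of the Education Department of Henan Province (200410464004)
Research Fund of Henan University of Science and Technology (2004ZY041)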
Keywords
information retrieval
multi-query data fusion
positive relevance feedback