Abstract
To address the term mismatch problem in existing information retrieval systems, this paper proposes a pseudo relevance feedback query expansion algorithm based on feature term extraction and correlation fusion, together with a new method for computing the weights of expansion terms. The algorithm extracts feature terms related to the original query from the top n documents returned by the initial retrieval (the local document set). It then selects the final expansion terms according to the frequency of each feature term in that local set and the correlation between each feature term and the entire original query. Experimental results show that the method is effective and improves information retrieval performance.
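The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not give the weighting formulas, so the function name `expand_query`, the parameters `n` and `k`, the relative-frequency term, and the co-occurrence-based correlation averaged over all query terms are assumptions made here for the example.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, n=3, k=2):
    """Pick up to k expansion terms from the top-n initially retrieved documents.

    ranked_docs is a list of token lists, best match first.
    Both scoring components below are illustrative stand-ins (assumptions),
    not the weighting method proposed in the paper.
    """
    query = list(dict.fromkeys(query_terms))  # de-duplicate, keep order
    top_docs = ranked_docs[:n]
    local = [set(doc) for doc in top_docs]

    # Candidate feature terms: every non-query token in the local set.
    freq = Counter(t for doc in top_docs for t in doc if t not in query)
    total = sum(freq.values())
    if not query or total == 0:
        return []

    scores = {}
    for term, count in freq.items():
        # Component 1: relative frequency of the term in the local set.
        rel_freq = count / total
        # Component 2: correlation with the whole query, taken here as the
        # mean share of local documents where the term co-occurs with each
        # query term.
        corr = sum(
            sum(1 for doc in local if term in doc and q in doc) / len(local)
            for q in query
        ) / len(query)
        scores[term] = rel_freq * corr

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Calling `expand_query(["query", "expansion"], docs)` on a ranked list of tokenized documents returns `(term, score)` pairs, which could then be appended to the original query with weights derived from the scores. Multiplying the two components is only one possible way to fuse frequency and correlation.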
Source
《现代图书情报技术》
CSSCI
Peking University Core Journals (北大核心)
2011, No. 1, pp. 52-56 (5 pages)
New Technology of Library and Information Service
Funding
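Guangxi Department of Education research project "Research on Text Information Retrieval Technology Based on Weighted Negative Association Rule Mining" (Project No. 201010LX679)
One of the research outcomes of the Guangxi College of Education 2010 institute-level key project "Research on Information Retrieval Technology Based on Positive and Negative Association Rules" (Project No. 桂教院科研[2010]7号(重点)-3)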
Keywords
Correlation
Pseudo relevance feedback
Query expansion
Information retrieval