Abstract
Effectively avoiding query topic drift in pseudo-relevance feedback requires solving two main problems: how to identify relevant documents so as to form a high-quality pseudo-relevant document set, and how to select expansion information from that set. This paper studies how to perform XML query expansion effectively once a high-quality pseudo-relevant document set has been obtained. Based on the characteristics of XML documents, a query term expansion method based on an extended vector space model is proposed. Experimental results show that, compared with the original query and the traditional TF*IDF term expansion method, the proposed method obtains expansion information more relevant to the user's query intent, achieves higher precision, and yields better retrieval performance.
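The abstract does not give the scoring formula, so the following is only a minimal sketch of how expansion terms might be ranked when term frequency is adjusted by a tag weight and a node-level weight, the two factors named in the keywords. The tag weights in `TAG_WEIGHTS`, the `1 / depth` decay, the smoothed IDF, and the function names are all assumptions made for illustration and are not taken from the paper.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical tag semantic weights; the paper's actual values are not given here.
TAG_WEIGHTS = {"title": 2.0, "abstract": 1.5, "p": 1.0}
DEFAULT_TAG_WEIGHT = 1.0


def weighted_term_counts(xml_text):
    """Return {term: weighted frequency} for one XML document.

    Each occurrence contributes tag_weight * level_weight, where
    level_weight = 1 / depth (assumed decay; the root has depth 1).
    Text that follows a child element (node.tail) is ignored for brevity.
    """
    counts = defaultdict(float)
    root = ET.fromstring(xml_text)
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        tag_w = TAG_WEIGHTS.get(node.tag, DEFAULT_TAG_WEIGHT)
        level_w = 1.0 / depth
        for term in (node.text or "").lower().split():
            counts[term] += tag_w * level_w
        stack.extend((child, depth + 1) for child in node)
    return counts


def expansion_terms(query_terms, feedback_docs, doc_freq, n_docs, k=3):
    """Rank candidate terms from pseudo-relevant documents and return the top k.

    doc_freq maps a term to its collection document frequency; terms that are
    missing are treated as having frequency 0 (smoothed IDF, an assumption).
    """
    scores = defaultdict(float)
    for xml_text in feedback_docs:
        for term, wf in weighted_term_counts(xml_text).items():
            idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1))
            scores[term] += wf * idf
    candidates = [(t, s) for t, s in scores.items() if t not in query_terms]
    # Sort by descending score, then alphabetically for a stable order.
    candidates.sort(key=lambda ts: (-ts[1], ts[0]))
    return [t for t, _ in candidates[:k]]
```

With a feedback document such as `<article><title>xml retrieval</title><body><p>query expansion xml</p></body></article>`, a term in the shallow, heavily weighted `title` element outranks terms that appear only in the deeper `p` element, which is the intuition behind combining tag weight with node level.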
Source
《情报学报》
CSSCI
Peking University Core Journals (北大核心)
2013, No. 6, pp. 610-617 (8 pages)
Journal of the China Society for Scientific and Technical Information
Funding
National Social Science Fund of China (12CTQ042)
National Natural Science Foundation of China (61173146, 61262035)
Keywords
Pseudo-relevance feedback
XML query expansion
Tag semantic weight
Node level