Abstract
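To address the word mismatch problem between documents and queries in information retrieval, this paper proposes LOCOOC, a query expansion method based on local co-occurrence. LOCOOC assesses the quality of a candidate expansion term by its degree of co-occurrence with all query terms in the local document set, and also incorporates the term's global statistics over the corpus, so that the selected expansion terms are more closely related to the topic or concept expressed by the initial query. Experimental results show that, compared with retrieval without query expansion, expansion with LOCOOC improves average precision by more than 40%; compared with traditional local feedback and with Local Context Analysis (LCA), LOCOOC achieves both better retrieval performance and greater robustness.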
Techniques for automatic query expansion have been extensively studied in information retrieval research as a solution to the word mismatch problem between queries and documents. Building on the idea of Local Context Analysis, this paper proposes a novel expansion method, called LOCOOC, which uses the local information in top-ranked documents and the global statistical information in the whole collection to select the most appropriate expansion terms. Experimental results show that LOCOOC offers more effective and robust retrieval performance than expansion methods based on local feedback or LCA.
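The abstract does not give the LOCOOC scoring formula, so the following is only an illustrative sketch of the general idea it describes: score each candidate term by how strongly it co-occurs with every query term in the top-ranked (local) documents, then weight that by a global collection statistic. The function name `rank_expansion_terms`, the smoothing constant `delta`, the product-of-co-occurrences form, and the IDF-style weight are all assumptions made for illustration, not the paper's definitions.

```python
import math
from collections import Counter


def rank_expansion_terms(query_terms, local_docs, doc_freq, num_docs,
                         top_k=5, delta=0.1):
    """Rank candidate expansion terms (illustrative, not the paper's formula).

    query_terms: list of query tokens
    local_docs:  list of token lists, the top-ranked documents
    doc_freq:    dict mapping term -> document frequency in the whole collection
    num_docs:    number of documents in the whole collection
    """
    query_terms = list(dict.fromkeys(query_terms))  # de-duplicate, keep order
    doc_counts = [Counter(doc) for doc in local_docs]
    candidates = {t for doc in local_docs for t in doc} - set(query_terms)

    scores = {}
    for term in candidates:
        # Local part: co-occurrence with EVERY query term. The product means
        # a term that misses one query term is penalised (assumed form).
        local = 1.0
        for q in query_terms:
            cooc = sum(c[term] * c[q] for c in doc_counts)
            local *= delta + cooc
        # Global part: an IDF-style weight from collection statistics
        # (assumed form); rarer terms get a larger weight.
        idf = math.log(1.0 + num_docs / max(doc_freq.get(term, 0), 1))
        scores[term] = local * idf

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy usage with made-up data.
docs = [
    ["query", "expansion", "retrieval", "feedback"],
    ["query", "expansion", "retrieval"],
    ["query", "noise"],
]
df = {"retrieval": 10, "feedback": 5, "noise": 500}
for term, score in rank_expansion_terms(["query", "expansion"], docs, df, 1000):
    print(term, round(score, 2))
```

In this sketch a term that appears alongside only some of the query terms is pushed down by the product, which is one simple way to favour terms tied to the query as a whole rather than to a single query word.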
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2006, No. 3, pp. 84-91 (8 pages)
Funding
Supported by the National 973 Program of China (Grant No. 2004CB318109)
Keywords
computer application
Chinese information processing
information retrieval
local co-occurrence
query expansion
LOCOOC