Abstract
Query expansion is a key technique for addressing the low retrieval efficiency caused by term mismatch between user queries and relevant documents. This paper proposes a Markov network information retrieval expansion model based on hierarchical dependence. The model jointly considers the hierarchical distance between candidate terms and query terms, the relevance between terms, the out-degree of term nodes, and the paths connecting them. Candidate terms are reweighted according to their hierarchical dependence relations, and the candidates most relevant to the query are selected for expansion, which helps uncover more latent candidates with deeper dependence relations. Experiments on five standard collections show that the proposed model outperforms the BM25 model without query expansion by 5%-41% on 3-avg and by 5%-70% on 11-avg. Compared with the Markov network information retrieval expansion model based on direct correlation, the proposed model achieves better overall retrieval efficiency.
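The abstract reports gains on 3-avg and 11-avg without defining them. The sketch below assumes they are the conventional 3-point and 11-point interpolated average precision (recall levels 0.2, 0.5, 0.8 and 0.0, 0.1, ..., 1.0); the paper may define them differently. The function names and the toy ranking are hypothetical and serve only as illustration.

```python
from typing import Iterable, Sequence, Set

# Assumed recall levels; the abstract does not state them.
THREE_POINT = (0.2, 0.5, 0.8)
ELEVEN_POINT = tuple(i / 10 for i in range(11))


def interpolated_precision(ranking: Sequence[str], relevant: Set[str],
                           recall_level: float) -> float:
    """Highest precision at any rank whose recall is >= recall_level."""
    if not relevant:
        return 0.0
    hits = 0
    best = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
        recall = hits / len(relevant)
        precision = hits / rank
        if recall >= recall_level and precision > best:
            best = precision
    return best


def n_point_average(ranking: Sequence[str], relevant: Set[str],
                    levels: Iterable[float]) -> float:
    """Mean interpolated precision over the given recall levels."""
    levels = tuple(levels)
    total = sum(interpolated_precision(ranking, relevant, r) for r in levels)
    return total / len(levels)


# Toy example: five ranked documents, two of them relevant.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3"}
three_avg = n_point_average(ranking, relevant, THREE_POINT)
eleven_avg = n_point_average(ranking, relevant, ELEVEN_POINT)
```

Under this reading, a relative improvement such as "5%-41% on 3-avg" would compare the mean of this score over all queries with and without expansion.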
Source
《计算机科学与探索》
CSCD
2014, No. 12, pp. 1485-1493 (9 pages)
Journal of Frontiers of Computer Science and Technology
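Funding
National Natural Science Foundation of China
Science and Technology Landing Program of Jiangxi Province Higher Education Institutions (Industry-University-Research Cooperation)
Natural Science Foundation of Jiangxi Province
Humanities and Social Sciences Research Fund of Jiangxi Province Universities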
Keywords
hierarchical dependence
Markov network
query expansion
information retrieval