Abstract
[Objective] This paper uses subject indexing to re-rank search results during pseudo-relevance feedback. [Methods] Using a language modeling approach, the method mines the association between subject terms and user queries, represents each user query as a probability distribution over subject terms, and builds a generative language model for each subject term, from which the weight of each subject term in a document is determined. On this basis, the scores of the documents returned by the initial retrieval are recalculated and the documents are re-ranked by the new scores. [Results] The proposed method builds suitable language model representations for subject terms and mines the weights of subject terms in documents. The re-ranked results show a consistent performance improvement over the initial retrieval. [Limitations] Different methods of mining the association between subject terms and documents are not compared, and the approach is not tested on datasets of different scales or in different languages. [Conclusions] Re-ranking that exploits the associations between subject terms and user queries, and between subject terms and documents, can improve retrieval precision.
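The abstract does not give the scoring formulas, so the following Python sketch only illustrates the general shape of such a re-ranking pipeline under stated assumptions. Jelinek-Mercer smoothing, the geometric-mean likelihood used for both the query and document distributions over subject terms, the linear interpolation weight `alpha`, and all function names and toy data are hypothetical choices and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_lm(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed(word, lm, background, lam=0.5, floor=1e-9):
    """Jelinek-Mercer smoothing (an assumed choice), floored so log() stays finite."""
    p = (1 - lam) * lm.get(word, 0.0) + lam * background.get(word, 0.0)
    return max(p, floor)


def term_posterior(tokens, term_lms, background):
    """Distribution over subject terms for a token list, proportional to the
    geometric-mean word likelihood under each term's language model."""
    raw = {}
    for term, lm in term_lms.items():
        mean_log = sum(math.log(smoothed(w, lm, background)) for w in tokens) / len(tokens)
        raw[term] = math.exp(mean_log)
    total = sum(raw.values())
    return {term: value / total for term, value in raw.items()}


def build_term_lms(docs, index):
    """Pool the text of all documents indexed with a subject term into one model."""
    pooled = {}
    for doc_id, terms in index.items():
        for term in terms:
            pooled.setdefault(term, []).extend(docs[doc_id])
    return {term: unigram_lm(tokens) for term, tokens in pooled.items()}


def rerank(query, docs, index, initial_scores, alpha=0.5):
    """Interpolate first-pass scores with a subject-term match score (assumed form)."""
    background = unigram_lm([w for tokens in docs.values() for w in tokens])
    term_lms = build_term_lms(docs, index)
    q_dist = term_posterior(query, term_lms, background)
    final = {}
    for doc_id, base in initial_scores.items():
        d_dist = term_posterior(docs[doc_id], term_lms, background)
        match = sum(q_dist[t] * d_dist[t] for t in term_lms)
        final[doc_id] = (1 - alpha) * base + alpha * match
    return sorted(final, key=final.get, reverse=True), q_dist


# Toy first-pass result set (hypothetical data; scores assumed to lie in [0, 1]).
docs = {
    "d1": ["language", "model", "retrieval", "smoothing"],
    "d2": ["library", "catalog", "subject", "indexing"],
    "d3": ["language", "model", "query", "ranking"],
}
index = {"d1": {"IR"}, "d2": {"Cataloging"}, "d3": {"IR"}}
initial_scores = {"d1": 0.45, "d2": 0.50, "d3": 0.40}

ranking, query_dist = rerank(["language", "model"], docs, index, initial_scores)
print(ranking)
print(query_dist)
```

In this toy run the query distribution favors the hypothetical subject term `IR`, so the two documents indexed with it move ahead of `d2` even though `d2` had the highest first-pass score. The value of `alpha` controls how strongly the subject-term evidence overrides the initial ranking and would need tuning on real data.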
Source
《现代图书情报技术》
CSSCI
PKU Core Journal (北大核心)
2014, No. 7, pp. 48-55 (8 pages)
New Technology of Library and Information Service
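Funding
One of the research outputs of the National Social Science Fund of China major project "Research on the Construction of an Emergency Decision-Making Intelligence System for Smart Cities" (Grant No. 13&ZD173)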
Keywords
Language model
Information retrieval
Subject heading
Subject indexing
Query re-ranking