Re-rank Retrieval Results Through Subject Indexing (利用主题标引进行查询重排序)

Abstract: [Objective] This paper uses subject indexing to re-rank search results during pseudo-relevance feedback. [Methods] Using a language modeling approach, the association between subject terms and user queries is mined so that each query is represented as a probability distribution over subject terms. Generative language models are built for the subject terms and used to estimate the weight of each subject term in a document. On this basis, the scores of the documents returned by the initial retrieval are recalculated and the documents are re-ranked. [Results] The proposed method builds suitable language model representations for subject terms and derives the weights of subject terms in documents; the re-ranked results generally outperform the initial retrieval. [Limitations] Different methods of mining the associations between subject terms and documents are not compared, and the approach is not tested on datasets of different scales or in different languages. [Conclusions] Re-ranking that exploits the associations between subject terms and user queries, and between subject terms and documents, can improve retrieval precision.
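
The abstract does not give the estimation formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes positive initial retrieval scores, maximum-likelihood unigram models for subject terms, a query-to-subject distribution estimated from the top-k feedback documents, and a linear interpolation with weight `lam`. All function names, parameters, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def subject_term_models(docs):
    """Assumed estimator: maximum-likelihood unigram model P(w|s) built
    from the text of every document indexed with subject term s."""
    counts = defaultdict(Counter)
    for doc in docs:
        for s in doc["subjects"]:
            counts[s].update(doc["tokens"])
    models = {}
    for s, cnt in counts.items():
        total = sum(cnt.values())
        models[s] = {w: c / total for w, c in cnt.items()}
    return models


def subject_weights(doc, models):
    """Assumed weighting: sum_w P(w|d) * P(w|s) for each subject term
    indexed on the document, normalised to sum to 1."""
    tf = Counter(doc["tokens"])
    n = sum(tf.values())
    if n == 0:
        return {}
    raw = {s: sum(c / n * models[s].get(w, 0.0) for w, c in tf.items())
           for s in doc["subjects"]}
    z = sum(raw.values())
    return {s: v / z for s, v in raw.items()} if z > 0 else {}


def query_subject_distribution(ranked, docs_by_id, k):
    """Assumed pseudo-relevance feedback step: P(s|q) is the score mass
    that the top-k documents give to their indexed subject terms."""
    mass = Counter()
    for doc_id, score in ranked[:k]:
        for s in docs_by_id[doc_id]["subjects"]:
            mass[s] += score
    z = sum(mass.values())
    return {s: v / z for s, v in mass.items()} if z > 0 else {}


def rerank(ranked, docs, k=1, lam=0.5):
    """Interpolate the normalised initial score with the subject-based score.
    Assumes all initial scores are positive."""
    if not ranked:
        return []
    docs_by_id = {d["id"]: d for d in docs}
    models = subject_term_models(docs)
    p_s_q = query_subject_distribution(ranked, docs_by_id, k)
    top = max(score for _, score in ranked)
    rescored = []
    for doc_id, score in ranked:
        weights = subject_weights(docs_by_id[doc_id], models)
        subject_score = sum(p_s_q.get(s, 0.0) * w for s, w in weights.items())
        rescored.append((doc_id, lam * score / top + (1 - lam) * subject_score))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection with manually assigned subject terms.
docs = [
    {"id": "d1", "tokens": ["heart", "attack", "risk"],
     "subjects": ["Myocardial Infarction"]},
    {"id": "d2", "tokens": ["heart", "failure", "therapy"],
     "subjects": ["Heart Failure"]},
    {"id": "d3", "tokens": ["heart", "attack", "therapy"],
     "subjects": ["Myocardial Infarction", "Therapeutics"]},
]
ranked = [("d1", 3.0), ("d2", 2.0), ("d3", 1.9)]  # initial retrieval scores

result = rerank(ranked, docs, k=1)
print([doc_id for doc_id, _ in result])
```

On this toy data, d3 shares a subject term with the top feedback document and so moves ahead of d2, which illustrates how subject indexing can change an initial ranking. The interpolation weight and the feedback depth would need tuning on real data.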

Authors: Mao Jin (毛进), Li Gang (李纲), Cao Yujie (操玉杰)

Affiliations: Center for Studies of Information Resources, Wuhan University; NetEase (Hangzhou) Network Co., Ltd.

Source: New Technology of Library and Information Service (《现代图书情报技术》), indexed in CSSCI and the Peking University Core Journals list, 2014, No. 7, pp. 48-55 (8 pages)

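Funding: This work is one of the research outputs of the National Social Science Fund of China Major Project "Research on the Construction of an Emergency Decision-Making Intelligence System for Smart Cities" (Grant No. 13&ZD173).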

Keywords: Language model; Information retrieval; Subject heading; Subject indexing; Re-rank results

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
