
An LDA Topic Model Based Collection Selection Method for Distributed Information Retrieval (基于LDA主题模型的分布式信息检索集合选择方法)

Cited by: 22
Abstract: In distributed information retrieval, collections differ in how much they contribute to the final search results. To address this, the paper proposes a collection selection method based on the LDA topic model. First, the method obtains a description of each collection through query-based sampling. Second, it builds an LDA topic model to compute the topic relevance between the query and a document. Third, it combines keyword (term) relevance with topic relevance to estimate the overall relevance between the query and each document in the sample set, and from that estimates the relevance between the query and each collection. Finally, the M collections with the highest relevance are selected for retrieval. The experiments evaluate the collection selection method using Rm, P@n, and MAP. The results show that the method more accurately locates the collections that contain many relevant documents, improving both the recall and the precision of the search results.
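Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2017, Issue 3, pp. 125-133 (9 pages)
Funding: National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" ("核高基", 2010ZX01042-002-003); National Natural Science Foundation of China (60703040, 61332017); Zhejiang Province Major Science and Technology Project (2011C13042, 2013C01046); China Knowledge Centre for Engineering Sciences and Technology (CKCEST-2014-1-5)
Keywords: collection selection; distributed information retrieval; LDA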
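
The four steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formulas, so the smoothed query likelihood used for keyword relevance, the LDA document model used for topic relevance, the linear interpolation weight `lam`, and the averaging of sampled-document scores per collection are all assumptions made here for illustration. The sample documents, the topic-word distributions `phi`, and the document-topic mixtures `theta` are hypothetical inputs that would normally come from query-based sampling and a trained LDA model.

```python
import math
from collections import Counter


def term_score(query, doc_tokens, mu=0.5, vocab_size=1000):
    """Keyword relevance: log query likelihood with additive smoothing.

    The smoothing scheme and its parameters are assumptions, not taken from the paper.
    """
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return sum(math.log((counts[w] + mu) / (n + mu * vocab_size)) for w in query)


def topic_score(query, theta_d, phi, eps=1e-12):
    """Topic relevance: log P(q|d) with P(w|d) = sum_k phi[k][w] * theta_d[k].

    phi is a list of per-topic word distributions (dicts); theta_d is the
    document's topic mixture. eps avoids log(0) for unseen words.
    """
    total = 0.0
    for w in query:
        p = sum(phi_k.get(w, 0.0) * t for phi_k, t in zip(phi, theta_d))
        total += math.log(p + eps)
    return total


def select_collections(query, samples, phi, lam=0.5, m=1):
    """Rank collections by the mean combined score of their sampled documents.

    samples maps a collection name to a list of (tokens, theta) pairs.
    Returns the top-m collection names and the score of every collection.
    """
    scores = {}
    for name, docs in samples.items():
        if not docs:
            scores[name] = 0.0
            continue
        combined = [
            lam * term_score(query, tokens) + (1 - lam) * topic_score(query, theta, phi)
            for tokens, theta in docs
        ]
        # Average in probability space so one strong document can lift a collection.
        scores[name] = sum(math.exp(s) for s in combined) / len(docs)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:m], scores


# Hypothetical toy data: two topics, two collections with one sampled document each.
phi = [{"lda": 0.5, "topic": 0.5}, {"soccer": 0.5, "goal": 0.5}]
samples = {
    "cs": [(["lda", "topic", "model"], [0.9, 0.1])],
    "sports": [(["soccer", "goal"], [0.1, 0.9])],
}
selected, scores = select_collections(["lda", "topic"], samples, phi, lam=0.5, m=1)
print(selected)
```

Setting `lam` to 1 reduces the sketch to keyword-only selection and setting it to 0 to topic-only selection, which makes it easy to see what each component contributes on a given sample set.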
