A query expansion method based on constrained semi-supervised clustering
Abstract: Pseudo-relevance feedback models suffer from low-quality feedback documents and poorly chosen expansion terms, which cause the expanded query to drift from the original query. To address these problems, we propose a query expansion method based on constrained semi-supervised clustering. The method manually labels the top k documents of the initial retrieval result as relevant or irrelevant, then applies a semi-supervised clustering algorithm to the top n documents and extracts those related to the query as feedback documents. By learning from the query relevance of a small number of labeled documents, the method estimates the query relevance of a large number of unlabeled documents more accurately, which improves the quality of the feedback documents and in turn the recall and precision of retrieval. Experimental results show that the proposed method achieves better retrieval performance than traditional pseudo-relevance feedback, pseudo-relevance feedback based on unsupervised clustering, and the query-likelihood language model.
Source: China Sciencepaper (《中国科技论文》), CAS, PKU Core (北大核心), 2013, No. 10, pp. 994-997 (4 pages)
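Funding: National Natural Science Foundation of China (61073041, 61073043); Natural Science Foundation of Heilongjiang Province (F200901); Specialized Research Fund for the Doctoral Program of Higher Education (20112304110011, 20122304110012)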
Keywords: information retrieval; query expansion; constrained clustering; semi-supervised clustering; pseudo-relevance feedback
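
The pipeline outlined in the abstract (label the top k documents, cluster the top n, keep the relevant cluster as feedback documents) can be illustrated with a small sketch. This is not the authors' algorithm: it swaps in a simple seeded two-cluster k-means over term-frequency vectors as a stand-in for the constrained semi-supervised clustering step, and the function names, the whitespace tokenizer, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """L2-normalised bag-of-words term-frequency vector (whitespace tokens)."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def select_feedback_docs(docs, labels, max_iter=20):
    """Pick feedback documents from the top-n list `docs`.

    `labels` maps indices of the manually judged top-k documents to
    True (relevant) or False (irrelevant). Labelled documents act as
    hard constraints: they seed the two clusters and never switch.
    Returns the sorted indices assigned to the relevant cluster.
    """
    if True not in labels.values() or False not in labels.values():
        raise ValueError("need at least one relevant and one irrelevant label")
    vectors = [tf_vector(d) for d in docs]
    assign = dict(labels)  # first centroids come from the labelled seeds only
    for _ in range(max_iter):
        rel = centroid([vectors[i] for i, r in assign.items() if r])
        irr = centroid([vectors[i] for i, r in assign.items() if not r])
        new_assign = dict(labels)
        for i, v in enumerate(vectors):
            if i not in labels:
                # ties (e.g. no term overlap at all) fall to the irrelevant side
                new_assign[i] = cosine(v, rel) > cosine(v, irr)
        if new_assign == assign:
            break
        assign = new_assign
    return sorted(i for i, r in assign.items() if r)


# Toy top-n list; documents 0 and 1 play the role of the labelled top-k.
docs = [
    "semi supervised clustering for query expansion",
    "football match results and league scores",
    "query expansion with pseudo relevance feedback",
    "league football transfer news",
    "clustering documents for relevance feedback",
]
labels = {0: True, 1: False}
print(select_feedback_docs(docs, labels))  # [0, 2, 4]
```

In a full system the returned documents would then supply the expansion terms; that step, and the choice of k and n, are left out here because the abstract does not specify them.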