
A multi-level clustering approach based on noun phrases for search results (cited by: 3)
Abstract: Clustering search results helps users browse the results returned by a search engine quickly. To extract cluster labels that are informative and readable, and to obtain high-quality clustering results, a multi-level clustering approach based on noun phrases (MCNP) is proposed for search results. First, noun phrases are extracted as candidate cluster labels, and basic clusters are generated from the distribution of the candidate labels across the results. Second, a one-pass clustering algorithm with linear time complexity performs multi-level clustering on the basic clusters. Comparative experiments against a named-entity-based method, STC, and Lingo show that the proposed approach outperforms all three in the readability and effectiveness of its cluster labels and in overall clustering performance.
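The abstract names the pipeline's steps but gives no formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' MCNP implementation. It assumes noun phrases have already been extracted for each result (the POS tagging or chunking step is not shown). A basic cluster is taken to be the set of results containing one candidate label. Cluster similarity is assumed to be the Jaccard overlap of result sets. The names `basic_clusters`, `one_pass`, and `multi_level`, the `min_support` filter, and the thresholds `(0.5, 0.2)` are all hypothetical. Each call to `one_pass` scans its input once and compares each item only with the clusters opened so far, so with a bounded number of clusters its cost grows linearly with the number of basic clusters. Lowering the threshold on each pass yields successively coarser levels.

```python
from collections import defaultdict

# Assumed input: one set of noun phrases per search-result snippet.
# Noun-phrase extraction itself (POS tagging/chunking) is not shown.


def basic_clusters(docs_phrases, min_support=2):
    """Map each candidate label to the set of results containing it.

    min_support is a hypothetical filter: labels covering fewer
    results are discarded.
    """
    label_docs = defaultdict(set)
    for doc_id, phrases in enumerate(docs_phrases):
        for phrase in phrases:
            label_docs[phrase].add(doc_id)
    return {label: frozenset(ids)
            for label, ids in label_docs.items()
            if len(ids) >= min_support}


def jaccard(a, b):
    """Overlap of two result sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def one_pass(items, threshold):
    """Scan items once; merge each into its most similar cluster or open a new one."""
    clusters = []  # each entry: [labels, result set]
    for labels, docs in items:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(docs, cluster[1])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0].extend(labels)
            best[1] = best[1] | docs
        else:
            clusters.append([list(labels), docs])
    return [(labels, docs) for labels, docs in clusters]


def multi_level(basic, thresholds=(0.5, 0.2)):
    """Apply one_pass repeatedly with decreasing (hypothetical) thresholds."""
    # Process larger basic clusters first; ties broken alphabetically.
    level = [([label], docs) for label, docs in
             sorted(basic.items(), key=lambda kv: (-len(kv[1]), kv[0]))]
    levels = [level]
    for t in thresholds:
        level = one_pass(level, t)
        levels.append(level)
    return levels


if __name__ == "__main__":
    results = [
        {"jaguar car", "price"},
        {"jaguar car", "dealer"},
        {"jaguar car", "price"},
        {"jaguar animal", "rainforest"},
        {"jaguar animal", "rainforest"},
    ]
    for depth, level in enumerate(multi_level(basic_clusters(results))):
        print(depth, [labels for labels, _ in level])
```

On the five toy "jaguar" results, `min_support=2` drops "dealer". The first pass then merges "price" into the "jaguar car" cluster and "rainforest" into the "jaguar animal" cluster. One-pass results depend on processing order. Sorting basic clusters by descending coverage is one deterministic choice, assumed here.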
Source: Journal of Shandong University (Natural Science), 2010, no. 7, pp. 39-44, 49 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
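Funding: National Natural Science Foundation of China (60673191); Key Natural Science Research Project of Guangdong Higher Education Institutions (06Z012); Natural Science Foundation of Guangdong Province (9151026005000002).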
Keywords: information retrieval; search results clustering; text clustering; multi-level clustering

References (18 in total; 10 shown)

  • [1] CARPINETO C, OSINSKI S, ROMANO G, et al. A survey of Web clustering engines[J]. ACM Computing Surveys (CSUR), 2009, 41(3): 1-38.
  • [2] OSINSKI S, WEISS D. Carrot2: design of a flexible and efficient web information retrieval framework[C]//Proceedings of the 3rd International Atlantic Web Intelligence Conference (AWIC 2005). [S. l.]: [s. n.], 2005: 439-444.
  • [3] FERRAGINA P, GULLI A. A personalized search engine based on Web snippet hierarchical clustering[C]//Special Interest Tracks and Posters of the 14th International Conference on World Wide Web. New York: ACM Press, 2005: 801-810.
  • [4] KOSHMAN S, SPINK A, JANSEN B J. Web searching on the Vivisimo search engine[J]. Journal of the American Society for Information Science and Technology, 2006, 57(14): 1875-1887.
  • [5] TODA H, KATAOKA R. A search result clustering method using informatively named entities[C]//Proceedings of the 7th Annual ACM International Workshop on Web Information and Data Management. New York: ACM Press, 2005: 81-86.
  • [6] GIANNOTTI F, NANNI M, PEDRESCHI D. Webcat: automatic categorization of Web search results[C]//Proceedings of the 11th Italian Symposium on Advanced Database Systems. Italy: Rubbettino Editore, 2003: 507-518.
  • [7] LEUSKI A. Evaluating document clustering for interactive information retrieval[C]//Proceedings of the Tenth International Conference on Information and Knowledge Management. New York: ACM Press, 2001: 33-40.
  • [8] LI H M, DING Z G, ZHOU S S, ZHOU L H. A Web search results clustering algorithm based on concept grouping[J]. Journal of South China University of Technology (Natural Science Edition), 2009, 37(1): 130-134. (in Chinese)
  • [9] ZENG H J, HE Q C, CHEN Z, et al. Learning to cluster Web search results[C]//Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM Press, 2004: 210-217.
  • [10] KUMMAMURU K, LOTLIKAR R, ROY S, et al. A hierarchical monothetic document clustering algorithm for summarization and browsing search results[C]//Proceedings of the 13th International Conference on World Wide Web. New York: ACM Press, 2004: 658-665.


