
Design and Implementation of a Nutch-Based Web Page Clustering System (cited by 3)

English title as published: Design and implementation on Web clustering system based on Nutch
Abstract: This paper designs a search-results clustering system that clusters the results returned by the Nutch search engine in both Chinese and English environments. Built on the k-means algorithm and the suffix tree clustering algorithm, the system consists of four modules: the Nutch search engine, text word segmentation, TF-IDF weight calculation, and text clustering. Experiments compare the k-means algorithm with the suffix tree clustering algorithm.
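The abstract chains TF-IDF weight calculation into k-means clustering. The paper's exact formulas are not reproduced here, so the following is only a minimal sketch under stated assumptions: documents arrive already segmented into tokens (standing in for the word-segmentation module), each weight is raw term count times log(N/df), vectors are L2-normalized, and the hypothetical `kmeans` function assigns documents by cosine similarity, seeding its centroids with the first k documents so runs are reproducible.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one L2-normalized {term: weight} dict per tokenized document.

    Assumed weighting: raw term count * log(N / df); a term found in every
    document gets weight 0.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    vectors = []
    for doc in docs:
        vec = {t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def dot(a, b):
    """Sparse dot product; equals cosine similarity for unit-length vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=20):
    """Cluster unit vectors into k groups; returns a cluster index per vector."""
    # Assumption: seed with the first k documents for reproducible output.
    centroids = [dict(v) for v in vectors[:k]]
    assignment = None
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        new_assignment = [
            max(range(len(centroids)), key=lambda c: dot(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        # New centroid = normalized sum of members (same direction as the mean).
        for c in range(len(centroids)):
            total = defaultdict(float)
            for v, a in zip(vectors, assignment):
                if a == c:
                    for t, w in v.items():
                        total[t] += w
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm:
                centroids[c] = {t: w / norm for t, w in total.items()}
    return assignment


docs = [
    ["nutch", "search", "engine"],
    ["kmeans", "cluster", "algorithm"],
    ["nutch", "search", "crawler"],
    ["cluster", "algorithm", "suffix"],
]
print(kmeans(tf_idf(docs), 2))  # [0, 1, 0, 1]
```

Seeding from the first k documents keeps the example deterministic; a practical system would more likely use random or k-means++ seeding and several restarts.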
Source: Computer Engineering and Applications, 2011, No. 5, pp. 118-122 (5 pages); indexed in CSCD and the Peking University Core Journals list.
Keywords: Nutch; clustering; k-means; suffix tree
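Suffix tree clustering (STC), the second algorithm compared, groups documents by the phrases they share rather than by vector distance. The sketch below is a simplified illustration, not the paper's implementation: instead of building a generalized suffix tree, it enumerates every word n-gram up to a hypothetical `max_len` to obtain candidate base clusters (phrase plus the set of documents containing it), ranks them with an assumed score of document count times phrase length, and then applies the commonly described STC merge rule, which joins two base clusters when their document overlap exceeds 0.5 relative to each of them and takes connected components as final clusters. Input tokens are again assumed to be pre-segmented.

```python
from itertools import combinations


def base_clusters(docs, min_docs=2, max_len=3):
    """Return (phrase, doc_ids) pairs for phrases shared by >= min_docs docs.

    Simplification: word n-grams up to max_len are enumerated directly,
    standing in for the internal nodes of a generalized suffix tree.
    """
    phrase_docs = {}
    for i, doc in enumerate(docs):
        for start in range(len(doc)):
            for end in range(start + 1, min(start + max_len, len(doc)) + 1):
                phrase_docs.setdefault(tuple(doc[start:end]), set()).add(i)
    bases = [(p, d) for p, d in phrase_docs.items() if len(d) >= min_docs]
    # Assumed score: |docs| * phrase length, highest first.
    bases.sort(key=lambda b: (-len(b[1]) * len(b[0]), b[0]))
    return bases


def merge_clusters(bases, threshold=0.5):
    """Merge base clusters whose doc sets overlap by > threshold both ways."""
    parent = list(range(len(bases)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(range(len(bases)), 2):
        da, db = bases[a][1], bases[b][1]
        overlap = len(da & db)
        if overlap / len(da) > threshold and overlap / len(db) > threshold:
            parent[find(a)] = find(b)  # union the two components

    groups = {}
    for i in range(len(bases)):
        groups.setdefault(find(i), []).append(i)
    # Each final cluster: its phrase labels and the union of its documents.
    return [
        ([" ".join(bases[i][0]) for i in members],
         set().union(*(bases[i][1] for i in members)))
        for members in groups.values()
    ]


docs = [
    "nutch search engine".split(),
    "open source nutch search".split(),
    "suffix tree clustering".split(),
    "fast suffix tree".split(),
]
for labels, ids in merge_clusters(base_clusters(docs)):
    print(sorted(ids), labels)
```

Unlike k-means, the number of clusters is not fixed in advance, and each cluster comes with readable phrase labels, which is one reason STC is often used for search-result clustering. A full STC implementation would build the suffix tree in linear time and keep only the top-scoring base clusters before merging.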

References (4)

  • [1] Han J, Kamber M. Data Mining: Concepts and Techniques[M]. Beijing: China Machine Press, 2001.
  • [2] Li Hongmei, Ding Zhenguo, Zhou Shuisheng, Zhou Lihua. Clustering-based browsing techniques in search engines[J]. Journal of Chinese Information Processing, 2008, 22(3): 56-63. (cited by 9)
  • [3] Hotho A, Nürnberger A, Paaß G. A brief survey of text mining[J]. GLDV-Journal for Computational Linguistics and Language Technology, 2005, 20.
  • [4] Ifrim G, Theobald M, Weikum G. Learning word-to-concept mappings for automatic text classification[C]//ICML Workshop on Learning in Web Search, 2005.
