
Application of Distributed K-Means Clustering Algorithm in Micro-Blog Hot Topic Discovery (cited by: 8)
Abstract With the rapid development of the Internet, micro-blogging has become a social media network carrying a large volume of information and complex data, which makes discovering online public opinion a major challenge. This paper improves a parallel K-means partitioning clustering algorithm based on MapReduce. To address the difficulty of choosing initial cluster centers in K-means, the K value obtained by the Iterative Self-Organizing Data Analysis (Isodata) algorithm is used as the initial clustering center of K-means, which improves clustering accuracy. Finally, the improved K-means algorithm is applied to micro-blog hot topic discovery. Comparison with the traditional K-means algorithm shows that the improved algorithm effectively raises clustering accuracy and has a clear advantage when processing massive data.
Authors WANG Lin; XU Jun-meng (College of Automation, Xi'an University of Technology, Xi'an, Shaanxi 710048, China)
Source Computer Simulation (《计算机仿真》), Peking University Core Journal, 2020, Issue 8, pp. 121-125 (5 pages)
Funding Key Research and Development Program of the Shaanxi Provincial Department of Science and Technology (2017ZDCXL-GY-05-03).
Keywords Partition clustering; Hot topic; Parallelization; Improved partition clustering algorithm
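
The abstract describes K-means parallelized with MapReduce but gives no implementation detail. The following is a minimal single-process sketch of how one K-means iteration is commonly split into a map step (assign each point to its nearest center) and a reduce step (average the points per center). It is not the authors' code: the function names (`map_assign`, `combine`, `kmeans_iteration`, `kmeans`) are hypothetical, no Hadoop API is used, and the initial centers are simply passed in, so how they are obtained (for example from an Isodata run, as the abstract proposes) is outside the sketch.

```python
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centers):
    """Map step: emit (index of nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)


def combine(a, b):
    """Merge two (vector sum, count) pairs for the same center."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def kmeans_iteration(points, centers):
    """One iteration: map every point, then reduce per center index."""
    partial = {}
    for p in points:
        key, value = map_assign(p, centers)
        partial[key] = combine(partial[key], value) if key in partial else value
    # Reduce step: new center = vector sum / count.
    # Assumption: a center that received no points keeps its old position.
    return [
        tuple(s / partial[i][1] for s in partial[i][0]) if i in partial else centers[i]
        for i in range(len(centers))
    ]


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Repeat iterations until no center moves more than tol."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(points, centers)
        if all(dist(a, b) <= tol for a, b in zip(centers, new_centers)):
            return new_centers
        centers = new_centers
    return centers
```

For example, `kmeans([(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)], [(1.0, 1.0), (9.0, 1.0)])` groups the two left points and the two right points. In a real MapReduce job the loop body of `kmeans_iteration` would run across many mappers, with `combine` acting as the combiner so that only per-center sums and counts cross the network.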
