Application of a MapReduce-Based Artificial Bee Colony Algorithm to Big Data (cited by: 3)

Application Research of A MapReduce-based Artificial Bee Colony for Large-scale Data Clustering
Abstract As information technology advances, data volumes keep growing. Clustering is a typical data analysis method, and clustering of large-scale data has drawn increasing attention in recent years. Existing sequential clustering algorithms incur high memory and computation-time costs when applied to large-scale data. To address this, the paper proposes a MapReduce-based artificial bee colony clustering algorithm, which uses the MapReduce parallel programming paradigm to compute the fitness of cluster centers quickly and thereby cluster large-scale data efficiently. Clustering quality, scalability, and efficiency are evaluated on two datasets: a synthetic dataset and a real dataset obtained from a disk drive manufacturing process. The experimental results show that the proposed algorithm achieves better clustering quality, stronger scalability, and higher clustering efficiency than the existing PK-Means and parallel K-PSO algorithms.
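The abstract states only that MapReduce is used to compute the fitness of candidate cluster centers; it does not give the fitness function or the job layout. The following is a minimal sketch under stated assumptions, not the authors' implementation: fitness is assumed to be the total squared distance from each point to its nearest center, and the map and reduce phases are simulated in plain Python rather than run on Hadoop. All names (`map_phase`, `reduce_phase`, `fitness`) are hypothetical.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points, centres):
    """Map step (assumed form): for each point, emit the index of its
    nearest centre and the squared distance to that centre."""
    for p in points:
        dists = [sq_dist(p, c) for c in centres]
        k = min(range(len(centres)), key=dists.__getitem__)
        yield k, dists[k]


def reduce_phase(pairs):
    """Reduce step (assumed form): sum the squared distances per centre."""
    totals = defaultdict(float)
    for k, d in pairs:
        totals[k] += d
    return dict(totals)


def fitness(splits, centres):
    """Assumed fitness of one candidate solution (one food source in the
    bee colony): total within-cluster squared distance, lower is better.
    On a real cluster each split would go to a separate mapper; here the
    splits are processed one after another."""
    pairs = (kv for split in splits for kv in map_phase(split, centres))
    return sum(reduce_phase(pairs).values())


# Two data splits and one candidate set of two centres.
splits = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
centres = [(0.0, 0.0), (10.0, 0.0)]
score = fitness(splits, centres)
```

In an artificial bee colony search, each bee would hold one candidate `centres` list and call such a fitness evaluation once per iteration, which is why the abstract singles out this step for parallelization.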
Authors LI Guo; YUAN Xiaokai; XU Aidong; ZHANG Qiankun; ZHANG Fuzheng (Southern Power Grid Institute of Science, Guangzhou 510080)
Source Computer & Digital Engineering (《计算机与数字工程》), 2020, No. 1, pp. 124-129, 146 (7 pages)
Funding Supported by the National Natural Science Foundation of China (Grant No. 61672393)
Keywords large-scale datasets; MapReduce; artificial bee colony; clustering; parallel programming paradigm