Big Data Clustering Problem Based on MapReduce and Parallel Genetic Algorithm (基于MapReduce和并行遗传算法的大数据聚类问题研究)
Abstract Clustering is the process of partitioning a collection of objects into multiple classes, each composed of similar objects, and it is one of the most important data mining techniques. Clustering big data, however, is a complex problem: because the data volume is so large, clustering algorithms consume a great deal of time. Parallelization is a very good way to address insufficient computing power. Accordingly, this paper uses MapReduce on the Hadoop platform to run parallel computation over large data sets, limiting the time complexity of the big data clustering problem to an acceptable range. Finally, the paper evaluates the performance gains of the method in terms of time consumption and clustering accuracy; the method greatly increases running speed while maintaining relatively high accuracy.
Authors GUO Chenchen (郭晨晨), ZHU Hongkang (朱红康) (School of Mathematics and Computer Science, Shanxi Normal University, Linfen 041000, China)
Source Journal of Ludong University (Natural Science Edition) (《鲁东大学学报(自然科学版)》), 2017, No. 1, pp. 31-35 (5 pages)
Funding Natural Science Foundation of Shanxi Province (2015011040)
Keywords big data; MapReduce; data mining; parallel genetic algorithm; clustering
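
The abstract does not spell out how the genetic algorithm is mapped onto MapReduce, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a chromosome encodes a candidate set of cluster centroids and that its fitness is the sum of squared distances from each point to its nearest centroid (lower is better). The map step scores one data split and the reduce step adds the partial scores. All names here (`map_partial_sse`, `reduce_sse`, `fitness`) are hypothetical, and plain Python `map`/`reduce` stands in for Hadoop.

```python
from functools import reduce
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_partial_sse(split: Sequence[Point], centroids: Sequence[Point]) -> float:
    """Map step (assumed design): score one data split against a candidate
    set of centroids by summing each point's squared distance to its
    nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in split)


def reduce_sse(a: float, b: float) -> float:
    """Reduce step: combine the partial scores from two splits."""
    return a + b


def fitness(splits: Sequence[Sequence[Point]], centroids: Sequence[Point]) -> float:
    """Total within-cluster squared error for one chromosome (lower is better).
    On a real cluster each split would be scored on a different node."""
    partials = map(lambda s: map_partial_sse(s, centroids), splits)
    return reduce(reduce_sse, partials, 0.0)


# Toy data: two splits, each holding one tight group of points.
splits = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(10.0, 10.0), (10.0, 11.0)],
]
good = [(0.0, 0.5), (10.0, 10.5)]  # centroids placed on the two groups
bad = [(5.0, 5.0), (6.0, 6.0)]     # centroids placed between the groups

print(fitness(splits, good))
print(fitness(splits, bad))
```

Because each split is scored independently and the partial scores are merged by simple addition, the per-chromosome cost can be spread across worker nodes; a genetic algorithm would then use these scores for selection, which this sketch leaves out.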