Abstract
With the rapid development of science and technology, the processing of massive data has attracted wide attention. Combining the K-means clustering algorithm with the Hadoop platform is a reliable way to handle massive data. This paper briefly introduces Hadoop, the K-means algorithm, and the parallel implementation of K-means clustering based on MapReduce. It then describes several existing optimizations of the K-means algorithm on the Hadoop platform and, finally, discusses future research directions.
Source
Software Guide (《软件导刊》)
2017, Issue 6, pp. 208-211 (4 pages)
Funding
National Natural Science Foundation of China (61472256, 61170277)
Shanghai Municipal Education Commission Key Scientific Research Innovation Project (12zz137)
Hujiang Foundation Project (C14002)