Abstract
In a highly competitive market, analyzing commercial user information in order to win more commercial users requires the analysis of massive data sets. To address the flawed selection of initial cluster centers in the traditional K-means algorithm and the limitations of single-machine serial clustering, this paper proposes an improved K-means clustering algorithm. The improved algorithm is parallelized on Hadoop, a current mainstream open-source cloud computing platform, which overcomes the shortcomings of traditional serial clustering algorithms in processing massive data. Experiments using the daily network storage resource usage of commercial users at a large network storage service enterprise verify the efficiency and feasibility of the algorithm.
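For orientation, one K-means iteration can be written as a map step followed by a reduce step. The sketch below is a single-process illustration only, not the paper's implementation: the abstract does not say how the initial centers are chosen or how the Hadoop job is organized, so the function names (`map_phase`, `reduce_phase`, `kmeans_iteration`) and the rule that an empty cluster keeps its old center are assumptions made for this example.

```python
from collections import defaultdict


def nearest_center(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def map_phase(points, centers):
    """Map step: emit a (cluster index, point) pair for every input point."""
    for point in points:
        yield nearest_center(point, centers), point


def reduce_phase(pairs, centers):
    """Reduce step: average the points assigned to each cluster."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    # Assumption for this sketch: a cluster that received no points keeps its old center.
    new_centers = list(centers)
    for idx, members in groups.items():
        dim = len(members[0])
        new_centers[idx] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return new_centers


def kmeans_iteration(points, centers):
    """Run one map/reduce round and return the updated centers."""
    return reduce_phase(map_phase(points, centers), centers)
```

In a real Hadoop job the map step would run in parallel over splits of the input and the framework would group the emitted pairs by cluster index before the reduce step. The iteration is repeated until the centers stop changing or an iteration limit is reached.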
Source
《河北联合大学学报(自然科学版)》
CAS
2016, No. 1, pp. 67-71 (5 pages)
Journal of Hebei Polytechnic University: Social Science Edition
Funding
Natural Science Foundation of Hebei Province (ZD2014077)