Abstract
The era of big data has quietly arrived, and data quality is attracting increasing attention. An important part of improving data quality is resolving data inconsistency. To address data inconsistency in big data settings, this paper proposes a clustering algorithm based on the Map-Reduce framework. Within Map-Reduce, we improve the K-MEDOIDS clustering algorithm to make it more broadly applicable and more accurate. Finally, simulation experiments on the HADOOP platform verify the parallelism and effectiveness of the algorithm in a big data environment.
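The abstract does not describe the specific improvements made to K-MEDOIDS, so the following is only a rough illustration of the general idea. It is a minimal Python sketch of standard, unimproved K-medoids with each iteration split into a map step and a reduce step. The map step assigns every point to its nearest medoid, and the reduce step picks each cluster's new medoid as the member with the smallest total distance to the rest. The Euclidean distance, the function names, the initial medoids, and the in-memory stand-in for HADOOP's shuffle are all assumptions for illustration, not the paper's method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    # Euclidean distance; assumed here, since the abstract names no metric.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def map_phase(split: Sequence[Point], medoids: Sequence[Point]) -> Iterator[Tuple[int, Point]]:
    # Map: emit (index of nearest medoid, point) for every point in one input split.
    for p in split:
        k = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
        yield k, p


def shuffle(pairs: Iterable[Tuple[int, Point]]) -> Dict[int, List[Point]]:
    # In-memory stand-in for the framework's shuffle: group points by cluster key.
    groups: Dict[int, List[Point]] = defaultdict(list)
    for k, p in pairs:
        groups[k].append(p)
    return groups


def reduce_phase(members: List[Point]) -> Point:
    # Reduce: the new medoid is the member minimizing total distance to its cluster.
    return min(members, key=lambda c: sum(dist(c, q) for q in members))


def kmedoids_mapreduce(splits: List[List[Point]], init_medoids: List[Point],
                       max_iter: int = 20) -> List[Point]:
    medoids = list(init_medoids)
    for _ in range(max_iter):
        # Each split could be mapped by a separate worker; here they run in sequence.
        pairs = (kv for split in splits for kv in map_phase(split, medoids))
        groups = shuffle(pairs)
        new = list(medoids)  # a medoid with no assigned points is kept unchanged
        for k, members in groups.items():
            new[k] = reduce_phase(members)
        if new == medoids:  # converged: no medoid moved
            break
        medoids = new
    return medoids
```

For example, with two input splits holding two well-separated groups of points and deliberately poor starting medoids, the medoids move to one representative point per group within two iterations:

```python
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(kmedoids_mapreduce([pts[:3], pts[3:]], [(0.0, 0.0), (0.0, 1.0)]))
```

Because medoids must be actual data points, the reduce step compares every pair of points within a cluster. This quadratic per-cluster cost is one reason K-medoids benefits from spreading the work across many reducers.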
Source
Microcomputer & Its Applications (《微型机与应用》)
2015, No. 15, pp. 18-21, 25 (5 pages)