Abstract
DBSCAN is a fast density-based clustering algorithm. Although it can identify noise points when processing large-scale data, its clustering efficiency is low, its input/output overhead is high, and the accuracy of its clustering results is low. Working in the Hadoop cloud computing environment, this paper introduces the high parallelism of the MapReduce programming model into DBSCAN and designs a parallel DBSCAN algorithm that improves on the execution efficiency of the traditional algorithm. Comparative experiments demonstrate the accuracy and timeliness of the proposed algorithm's clustering.
Source
《山西电子技术》
2017, No. 6, pp. 87-90 (4 pages)
Shanxi Electronic Technology