Funding: National Natural Science Foundation of China (Nos. 61070101, 60875029, 61175048)
Abstract: DBSCAN (density-based algorithm for discovering clusters in large spatial databases with noise) is a classic density-based spatial clustering algorithm. It is widely applied because it captures clusters of arbitrary shape and detects outliers well. In practice, however, datasets are often too large for the serial DBSCAN to handle. To address this problem, a new parallel algorithm, Parallel DBSCAN (PDBSCAN), is proposed. The algorithm is based on the MapReduce mechanism, and its parallelism is concentrated on region queries and candidate queue processing, the two steps that require substantial computational resources. As a result, PDBSCAN scales to large datasets and is well suited to e-commerce applications, especially recommendation.
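The two steps the abstract singles out, region query and candidate queue processing, can be illustrated with a minimal single-machine sketch. This is not the authors' PDBSCAN and does not use MapReduce: it assumes a thread pool as a stand-in for the map phase, a brute-force Euclidean region query, and hypothetical names (`region_query`, `dbscan`, `workers`) chosen only for illustration.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from math import dist

NOISE = -1


def region_query(points, eps, i):
    """Return the indices of all points within eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts, workers=4):
    # Map-like step: each region query is independent of the others,
    # so all of them can be computed in parallel (a thread pool here,
    # standing in for a real MapReduce job).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        query = partial(region_query, points, eps)
        neighbours = list(pool.map(query, range(len(points))))

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        # Candidate queue: points still to be examined for this cluster.
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
        cluster += 1
    return labels
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, min_pts=3)` returns `[0, 0, 0, 1, 1, 1, -1]`: two clusters and one noise point. In this sketch only the region queries run in parallel; the queue expansion stays serial, whereas the abstract states that PDBSCAN parallelizes candidate queue processing as well.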