Abstract
Building on the classic stream clustering framework CluStream and the density-based clustering algorithm DBSCAN, this paper proposes DBS-Stream, a distributed real-time density clustering algorithm for data streams, and designs its implementation on the Storm stream processing platform. Each local node follows CluStream's classic two-phase framework, with DBSCAN replacing K-means to initialize the data in the online micro-clustering phase; the central node then runs DBSCAN again to perform global clustering. The algorithm can handle clusters of arbitrary shape and lets local nodes update their data quickly. In experiments comparing DBS-Stream with CluStream, DBS-Stream outperforms CluStream in both clustering quality and communication cost.
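The pipeline described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' Storm implementation: every name here (`dbscan`, `MicroCluster`, `init_local`, `update_local`, `global_cluster`) and every parameter value is hypothetical. It also assumes that a micro-cluster keeps only a count and per-dimension sums (CluStream's timestamp statistics and snapshots are omitted), and that a local node ships micro-cluster centers to the central node.

```python
import math
from dataclasses import dataclass


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 means noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:   # j is a core point, keep expanding
                queue.extend(nb)
    return labels


@dataclass
class MicroCluster:
    """Simplified summary: count, linear sum, squared sum (no timestamps)."""
    n: int
    ls: list
    ss: list

    @classmethod
    def from_points(cls, pts):
        dim = len(pts[0])
        return cls(len(pts),
                   [sum(p[d] for p in pts) for d in range(dim)],
                   [sum(p[d] ** 2 for p in pts) for d in range(dim)])

    def center(self):
        return tuple(s / self.n for s in self.ls)

    def absorb(self, p):
        self.n += 1
        for d, x in enumerate(p):
            self.ls[d] += x
            self.ss[d] += x * x


def init_local(points, eps, min_pts):
    """Local node, initialization: DBSCAN (instead of K-means) seeds micro-clusters."""
    groups = {}
    for p, lab in zip(points, dbscan(points, eps, min_pts)):
        if lab != -1:
            groups.setdefault(lab, []).append(p)
    return [MicroCluster.from_points(g) for g in groups.values()]


def update_local(mcs, p, radius):
    """Local node, online phase: absorb p into the nearest micro-cluster or open a new one."""
    best = min(mcs, key=lambda m: math.dist(m.center(), p), default=None)
    if best is not None and math.dist(best.center(), p) <= radius:
        best.absorb(p)
    else:
        mcs.append(MicroCluster.from_points([p]))


def global_cluster(all_mcs, eps, min_pts):
    """Central node: DBSCAN over the micro-cluster centers received from local nodes."""
    return dbscan([m.center() for m in all_mcs], eps, min_pts)


# Two hypothetical local nodes, one streamed update, then global clustering.
node_a = init_local([(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)], eps=0.3, min_pts=2)
node_b = init_local([(0.2, 0.2), (0.3, 0.2), (0.2, 0.3)], eps=0.3, min_pts=2)
update_local(node_a, (0.05, 0.05), radius=0.5)
labels = global_cluster(node_a + node_b, eps=0.5, min_pts=1)
print(labels)
```

Because only compact summaries leave each local node, the amount of data sent to the central node in this sketch depends on the number of micro-clusters rather than on the number of raw points, which is one plausible reading of the communication-cost advantage the abstract reports.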
Authors
牛丽媛
张桂芸
NIU Liyuan; ZHANG Guiyun (College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China)
Source
《天津师范大学学报(自然科学版)》
CAS
PKU Core (Peking University Core Journals)
2018, No. 3, pp. 72-76 (5 pages)
Journal of Tianjin Normal University: Natural Science Edition
Funding
National Natural Science Foundation of China (61572358)
Natural Science Foundation of Tianjin (16JCYBJC23600)