期刊文献+

面向位置大数据的快速密度聚类算法 被引量:32

Fast Density-Based Clustering Algorithm for Location Big Data
下载PDF
导出
摘要 面向位置大数据聚类,提出了一种简单但高效的快速密度聚类算法CBSCAN,以快速发现位置大数据中任意形状的聚类簇模式和噪声.首先,定义了Cell网格概念,并提出了基于Cell的距离分析理论,利用该距离分析,无需距离计算,可快速确定高密度区域的核心点和密度相连关系;其次,给出了网格簇定义,将基于位置点的密度簇映射成基于网格的密度簇,利用排他网格与相邻网格的密度关系,可快速确定网格簇的包含网格;第三,利用基于Cell的距离分析理论和网格簇概念,实现了一个快速密度聚类算法,将DBSCAN基于数据点的密度扩展聚类转换成基于Cell的密度扩展聚类,极大地减少高密度区域的距离计算,利用位置数据的内在特性提高了聚类效率;最后,在基准测试数据上验证了所提算法的聚类效果,在位置大数据上的实验结果统计显示,与DBSCAN、PR-Tree索引和Grid索引优化的DBSCAN相比,CBSCAN分别平均提升了525倍、30倍和11倍效率. This paper proposes a simple but efficient density-based clustering, named CBSCAN, to fast discover cluster patterns with arbitrary shapes and noises from location big data effectively. Firstly, the notion of Cell is defined and a distance analysis principle based on Cell is proposed to quickly find core points in high density areas and density relationships with other points without distance computing. Secondly, a Cell-based cluster that maps point-based density cluster to grid-based density cluster is presented. By leveraging exclusion grids and relationships with their adjacent grids, all inclusion grids of Cell-based cluster can be rapidly determined. Furthermore a fast density-based algorithm based on the distance analysis principle and Cell-base cluster is implemented to transform DBSCAN of point-based expansion to Cell-based expansion clustering. The proposed algorithm improves clustering efficiency significantly by using inherent property of location data to reduce huge number of distance calculations. Finally, comprehensive experiments on benchmark datasets demonstrate the clustering effectiveness of the proposed algorithm. Experimental results on massive-scale real and synthetic location datasets show that CBSCAN improves 525 fold, 30 fold and 11 fold of efficiency compared with DBSCAN, DBSCAN with PR-Tree and Grid index optimization respectively.
作者 于彦伟 贾召飞 曹磊 赵金东 刘兆伟 刘惊雷 YU Yan-Wei;JIA Zhao-Fei;CAO Lei;ZHAO Jin-Dong;LIU Zhao-Wei;LIU Jing-Lei(School of Computer and Control Engineering,Yantai University,Yantai 264005,China;Computer Science and Artificial Intelligence Laboratory,Massachusetts institute of Technology,Cambridge 02139,USA)
出处 《软件学报》 EI CSCD 北大核心 2018年第8期2470-2484,共15页 Journal of Software
基金 国家自然科学基金(61403328 61773331 61572419 61502410) 山东省重点研发计划(2015GSF115009) 山东省自然科学基金(ZR2013FM011 ZR2013FQ023 ZR2014FQ016)~~
关键词 聚类分析 密度聚类 位置大数据 Cell网格 网格簇 clustering analysis density-based clustering location big data Cell grid cell-based cluster
  • 相关文献

参考文献5

二级参考文献55

  • 1刘经南.泛在测绘与泛在定位的概念与发展[J].数字通信世界,2011(S1):28-30. 被引量:31
  • 2吴恩华.图形处理器用于通用计算的技术、现状及其挑战[J].软件学报,2004,15(10):1493-1504. 被引量:141
  • 3陈卓,孟庆春,魏振钢,任丽婕,窦金凤.一种基于网格和密度凝聚点的快速聚类算法[J].哈尔滨工业大学学报,2005,37(12):1654-1657. 被引量:14
  • 4朱蔚恒,印鉴,谢益煌.基于数据流的任意形状聚类算法[J].软件学报,2006,17(3):379-387. 被引量:50
  • 5陆锋 段滢滢 袁文.LBS的数据处理技术[J].中国计算机学会通讯,2010,.
  • 6Guha S, Meyerson A, Mishra N, Motwani R, O'Callaghan L. Clustering data streams: theory and practice. IEEE Trans-actions on Knowledge and Data Engineering, 2003, 15(3): 515-528.
  • 7Han J W, Kamber M. Data Mining Concepts and Tech- niques. Beijing: China Machine Press, 2006. 196-211.
  • 8Ester M, Kriegel H P, Sander J, Xu X W. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Confer- ence on Knowledge Discovery and Data Mining. Portland, USA: AAAI Press, 1996. 226-231.
  • 9Sander J, Ester M, Kriegel H P, Xu X W. Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Mining and Knowledge Discov- ery, 1998, 2(2): 169-194.
  • 10Hinneburg A, Keim D A. An efficient approach to clustering in large multimedia databases with noise. In: Proceedings of the 4th International Conference on Knowledge Discov- ery and Data Mining. New York, USA: AAAI Press, 1998. 58-65.

共引文献238

同被引文献277

引证文献32

二级引证文献106

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部