Abstract
The density-based DBSCAN clustering algorithm can identify clusters of arbitrary shape, but it has two drawbacks: the global parameters Eps and MinPts must be chosen manually, and its region query procedure is complex and prone to missing objects. To address these problems, an improved density clustering algorithm with adaptive parameters and fast region queries is proposed. The optimal global parameters Eps and MinPts are computed adaptively from the KNN distribution and mathematical statistical analysis, which removes manual intervention and makes the clustering process fully automatic. Region queries are performed with an improved method of selecting seed representative objects, so no separate check for missed objects is needed and clustering efficiency improves. Density clustering experiments on four typical data sets show that the proposed algorithm improves clustering accuracy by 8.825% and reduces the average clustering time by 0.92 s.
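The abstract does not give the formulas used to derive Eps and MinPts, so the following is only a minimal sketch of the general idea of estimating both parameters from k-nearest-neighbor distances. The function names, the default `k=4`, and the specific statistics (mean k-distance for Eps, mean neighborhood size for MinPts) are assumptions made for illustration, not the authors' method.

```python
import math
from statistics import mean


def kth_neighbor_distances(points, k):
    """Return, for each point, the distance to its k-th nearest neighbor.

    Brute-force O(n^2 log n) search; fine for a small illustration.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def estimate_parameters(points, k=4):
    """Hypothetical heuristic, NOT the paper's formulas.

    Eps    = mean of the k-th nearest neighbor distances.
    MinPts = rounded mean number of points inside each Eps-neighborhood
             (each point counts itself, as in the usual DBSCAN definition).
    """
    if len(points) <= k:
        raise ValueError("need more than k points")
    eps = mean(kth_neighbor_distances(points, k))
    counts = [sum(1 for q in points if math.dist(p, q) <= eps) for p in points]
    min_pts = max(1, round(mean(counts)))
    return eps, min_pts


if __name__ == "__main__":
    # Five evenly spaced points on a line, using the nearest neighbor (k=1).
    line = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
    print(estimate_parameters(line, k=1))
```

The two returned values could then be passed to any standard DBSCAN implementation in place of hand-tuned parameters. Real data sets with uneven density would likely need a more robust statistic than the plain mean used here.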
Source
《智能系统学报》
CSCD
Peking University Core Journals
2016, No. 1, pp. 93-98 (6 pages)
CAAI Transactions on Intelligent Systems
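Funding
National Natural Science Foundation of China (61373126)
Jiangsu Province Industry-University-Research Joint Innovation Fund, Prospective Joint Research Project (BY2013015-33)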
Keywords
density clustering
DBSCAN
region query
global parameters
KNN distribution
mathematical statistical analysis