Abstract
To address the need for manually set parameters and the high time complexity of the density peak clustering algorithm, an outlier detection algorithm based on a fast density peak clustering outlier factor is proposed. First, the k-nearest-neighbor algorithm replaces the density estimation used in density peak clustering, and a KD-Tree index structure is used to compute the k nearest neighbors of each data object. Then, cluster centers are selected automatically using the product of density and distance. In addition, the centripetal relative distance and the fast density peak clustering outlier factor are defined to characterize the degree to which a data object is an outlier. The proposed algorithm is evaluated on synthetic and real data sets and compared with several classical and recent algorithms; the results verify its effectiveness in terms of accuracy and time efficiency.
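The first two steps of the abstract (a k-nearest-neighbor density estimate, then ranking by the product of density and distance) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact density formula, so the inverse mean k-nearest-neighbor distance is assumed here; the names `knn_density_peak_scores` and `top_centers` are hypothetical; and a brute-force neighbor search stands in for the KD-Tree index to keep the example dependency-free. The centripetal relative distance and the outlier factor are not reproduced, because their definitions do not appear in the abstract.

```python
import math


def knn_density_peak_scores(points, k):
    """Return (rho, delta, gamma) for a list of coordinate tuples.

    rho   : ASSUMED density, the inverse mean distance to the k nearest neighbors
    delta : distance to the nearest point of higher density (standard density
            peak clustering); the densest point gets its maximum distance
    gamma : rho * delta, the product used to rank candidate cluster centers
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Brute-force pairwise distances. A KD-Tree index, as named in the
    # abstract, would replace this step on larger data sets.
    dist = [[math.dist(p, q) for q in points] for p in points]

    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(1.0 / (sum(nearest) / k + 1e-12))  # epsilon guards duplicates

    # Visit points from densest to sparsest so that "higher density" is
    # simply "earlier in the order".
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(dist[i][j] for j in order[:rank])

    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma


def top_centers(gamma, n_centers):
    """Indices of the n_centers points with the largest rho * delta."""
    return sorted(range(len(gamma)), key=lambda i: -gamma[i])[:n_centers]


# Two small square clusters (each with a middle point) and one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
    (30, 30),
]
rho, delta, gamma = knn_density_peak_scores(points, k=3)
centers = top_centers(gamma, 2)
```

In this toy data set the two middle points rank highest by `gamma`, while the isolated point combines the lowest `rho` with a large `delta`. An outlier factor would need to capture that pattern, but how the paper scores it is not stated in the abstract.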
Authors
张忠平
李森
刘伟雄
刘书霞
ZHANG Zhongping; LI Sen; LIU Weixiong; LIU Shuxia (College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China; The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China; The Key Laboratory of Software Engineering of Hebei Province, Qinhuangdao 066004, China; Hebei Normal University of Science and Technology, Qinhuangdao 066004, China)
Source
《通信学报》
EI
CSCD
PKU Core (Peking University core journal list)
2022, Issue 10, pp. 186-195 (10 pages)
Journal on Communications
Funding
National Natural Science Foundation of China (No. 61972334)
National Social Science Fund of China (No. 20BJ122)
Hebei Province Innovation Capability Improvement Plan Project (No. 20557640D)
Sida Railway Intelligent Image Workpiece Recognition Fund Project (No. x2021134)
Keywords
data mining
density peak clustering
outlier
k-nearest neighbor
centripetal relative distance