Abstract
Most clustering-based outlier detection algorithms require manually specified parameters, and it is difficult to choose suitable values for different datasets. To address this problem, an outlier detection algorithm based on the cluster outlier factor and mutual density is proposed. It combines the natural neighbor search of the parameter-free, natural-neighbor-based outlier detection algorithm (NOF) with the density peaks clustering (DPC) algorithm. The algorithm uses mutual density and γ density to construct a decision graph, and takes the sample points whose γ density is anomalously large as cluster centers for clustering. Finally, the Cluster Outlier Factor (COF) is used to find the boundary of the outlier clusters and thereby detect the outliers. The algorithm requires no manually supplied parameters. Experiments on synthetic and real datasets show that the proposed algorithm performs well in both clustering and outlier mining.
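To make the decision-graph step concrete, the following is a minimal sketch of how γ values are computed in classic DPC (γ = ρ · δ). It is not the authors' method: the abstract does not define mutual density, so the density used here (inverse mean distance to the k nearest neighbors) is a stand-in, and the function name `decision_values` and the parameter `k` are hypothetical. The proposed algorithm is described as parameter-free, which this sketch is not.

```python
import math


def decision_values(points, k=2):
    """Return (rho, delta, gamma) lists for a DPC-style decision graph.

    Assumption: rho is a stand-in density (inverse mean distance to the
    k nearest neighbors), not the paper's mutual density.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Stand-in local density: larger when the k nearest neighbors are close.
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(len(nearest) / (sum(nearest) + 1e-12))

    # delta: distance to the nearest point of strictly higher density;
    # a point with no denser point gets its largest distance instead.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))

    # gamma combines both; unusually large values suggest cluster centers.
    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma


# Two small clusters (centers at index 3 and 7) plus one distant point.
points = [
    (0, 0), (1, 0), (0, 1), (0.5, 0.5),
    (10, 10), (11, 10), (10, 11), (10.5, 10.5),
    (30, 30),
]
rho, delta, gamma = decision_values(points)
top_two = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2]
print(sorted(top_two))
```

In this toy layout the two points with the largest γ are the ones placed at the middle of each cluster. How the paper picks "anomalously large" γ values automatically, and how COF then separates outlier clusters, is not specified in the abstract and is left out here.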
Authors
张忠平
邱敬仰
刘丛
朱梦凡
章德斌
ZHANG Zhongping; QIU Jingyang; LIU Cong; ZHU Mengfan; ZHANG Debin (School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China; The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China; Hebei Education Examinations Authority, Shijiazhuang 050000, China)
Source
《计算机集成制造系统》
EI
CSCD
Peking University Core Journals (北大核心)
2019, No. 9, pp. 2314-2323 (10 pages)
Computer Integrated Manufacturing Systems