Abstract
Proximity-based outlier detection methods spend a large amount of time filtering normal points and have difficulty detecting local outliers while detecting global outliers. To address these problems, an outlier detection algorithm based on the Mapping Distance Ratio Outlier Factor (MDROF) is proposed. First, to reduce the time spent on normal points during detection, the concept of difference similarity is introduced, and a difference similarity pruning factor is defined to filter out most of the normal points in the data set. Second, the mapping k distance is defined; the ratio of the mapping distance to the reachability distance characterizes the local outlier degree of a data object, and the reachability density characterizes its global outlier degree. Finally, the mapping distance ratio outlier factor is defined by incorporating the average rank of each data object's mutual nearest neighbors, and it is used to detect outliers. The algorithm is compared with other classical outlier detection algorithms on synthetic and real data sets in terms of precision, AUC value, and outlier discovery curve. The experimental results show that MDROF is clearly superior to the compared algorithms in both the accuracy and the stability of outlier detection.
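The abstract names several proximity quantities (k distance, reachability distance, reachability density, mutual nearest neighbors) without giving their formulas. The sketch below illustrates these using the standard Local Outlier Factor (LOF) definitions, which are assumed here purely for illustration. It is not the MDROF algorithm: the difference similarity pruning factor and the mapping k distance are not specified in this record and are not reproduced. All function names are hypothetical, and the code assumes the data set contains no duplicate points.

```python
import math

def k_nearest(points, i, k):
    """Return the k nearest neighbours of points[i] as sorted (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]

def k_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour."""
    return k_nearest(points, i, k)[-1][0]

def reach_dist(points, i, j, k):
    """Standard LOF reachability distance of i from j: max(k-distance(j), d(i, j))."""
    return max(k_distance(points, j, k), math.dist(points[i], points[j]))

def local_reach_density(points, i, k):
    """Inverse of the mean reachability distance from i to its k nearest neighbours."""
    neigh = k_nearest(points, i, k)
    mean_reach = sum(reach_dist(points, i, j, k) for _, j in neigh) / len(neigh)
    return 1.0 / mean_reach  # assumes no duplicate points, so mean_reach > 0

def lof(points, i, k):
    """Standard LOF score: mean neighbour density divided by the point's own density."""
    neigh = k_nearest(points, i, k)
    lrd_i = local_reach_density(points, i, k)
    lrd_sum = sum(local_reach_density(points, j, k) for _, j in neigh)
    return lrd_sum / (len(neigh) * lrd_i)

def mutual_neighbours(points, i, k):
    """Indices j such that j is among i's k nearest neighbours and i is among j's."""
    mine = {j for _, j in k_nearest(points, i, k)}
    return {j for j in mine if i in {m for _, m in k_nearest(points, j, k)}}

# Toy data (hypothetical): a 3 x 3 grid of normal points plus one distant point.
points = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
scores = [lof(points, i, 3) for i in range(len(points))]
print("highest-scoring index:", scores.index(max(scores)))
print("mutual neighbours of the distant point:", mutual_neighbours(points, 9, 3))
```

On this toy data the distant point receives the largest score, and it has no mutual nearest neighbors because no grid point counts it among its own three nearest neighbors. This is the kind of neighborhood asymmetry that a rank-based term over mutual neighbors can exploit.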
Authors
张忠平
姚春辰
孙光旭
刘硕
张睿博
魏永辉
ZHANG Zhongping; YAO Chunchen; SUN Guangxu; LIU Shuo; ZHANG Ruibo; WEI Yonghui (College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China; Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China; School of International Education, Wuhan University of Technology, Wuhan 430070, China; Liren College, Yanshan University, Qinhuangdao 066004, China; School of Information and Communication Technology, Mongolian University of Science and Technology, Ulan Bator 627153, Mongolia)
Source
《计算机集成制造系统》
EI
CSCD
Peking University Core Journals (北大核心)
2024, Issue 5, pp. 1719-1732 (14 pages)
Computer Integrated Manufacturing Systems
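Funding
National Natural Science Foundation of China (61972334)
Hebei Province Innovation Capability Improvement Plan (222567626H)
Central Government Guided Local Science and Technology Development Fund (226Z1707G)
Sida Railway Intelligent Image Workpiece Recognition Fund (x2021134)
Qinhuangdao Chengfa Health Industry Development Co., Ltd. Performance Appraisal Management System Project (x2022247)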
Keywords
data mining
outlier detection
difference similarity pruning
mapping k distance
mapping distance ratio