Outlier detection algorithm based on mapping distance ratio outlier factor
Abstract Proximity-based outlier detection methods spend a great deal of time filtering normal points and have difficulty detecting local outliers while detecting global outliers. To address these problems, an outlier detection algorithm based on the Mapping Distance Ratio Outlier Factor (MDROF) is proposed. First, to reduce the time spent on normal points during detection, the concept of difference similarity is introduced, and a difference similarity pruning factor is defined to filter out most of the normal points in the data set. Second, the mapping k distance is defined; the ratio of the mapping distance to the reachable distance describes the local outlier degree of a data object, and the reachable density describes its global outlier degree. Finally, the mapping distance ratio outlier factor is defined by incorporating the average rank of each data object's mutual nearest neighbors, and it is used to detect outliers. The algorithm is compared with other classical outlier detection algorithms on artificial and real data sets in terms of precision, AUC value, and outlier discovery curve. The experimental results show that MDROF is clearly superior to the comparison algorithms in the accuracy and stability of outlier detection.
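The abstract builds on k distance, reachable distance, and reachable density, which are quantities familiar from classical density-based outlier detection. The sketch below is not the MDROF definition, which the abstract does not give; it only illustrates the standard LOF-style versions of these quantities. It assumes Euclidean distance, exactly k neighbors per point (ties are ignored), and no duplicate points. The function names are hypothetical.

```python
import numpy as np


def knn_quantities(X, k):
    """Brute-force pairwise distances, k-distances, and k-nearest-neighbor indices."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # k-distance: distance from each point to its k-th nearest neighbor
    k_dist = dist[np.arange(len(X)), neighbors[:, -1]]
    return dist, k_dist, neighbors


def local_reachability_density(X, k):
    """Standard LOF reachability density (an assumption, not the paper's formula)."""
    dist, k_dist, neighbors = knn_quantities(X, k)
    # reach_dist(p, o) = max(k_dist(o), d(p, o)) for each neighbor o of p
    d_to_neighbors = np.take_along_axis(dist, neighbors, axis=1)
    reach = np.maximum(k_dist[neighbors], d_to_neighbors)
    return 1.0 / reach.mean(axis=1), neighbors


def lof_scores(X, k):
    """Classical LOF: mean density of the neighbors divided by the point's own density."""
    lrd, neighbors = local_reachability_density(X, k)
    return lrd[neighbors].mean(axis=1) / lrd
```

Scores close to 1 indicate a point whose density matches that of its neighbors, and larger scores indicate more isolated points. According to the abstract, MDROF replaces this density ratio with a ratio of mapping distance to reachable distance and adds a pruning step and a mutual-neighbor rank term, none of which are reproduced here.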
Authors ZHANG Zhongping; YAO Chunchen; SUN Guangxu; LIU Shuo; ZHANG Ruibo; WEI Yonghui (College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China; Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China; School of International Education, Wuhan University of Technology, Wuhan 430070, China; Liren College, Yanshan University, Qinhuangdao 066004, China; School of Information and Communication Technology, Mongolian University of Science and Technology, Ulan Bator 627153, Mongolia)
Source Computer Integrated Manufacturing Systems (《计算机集成制造系统》), indexed in EI, CSCD, and the Peking University Core Journals list, 2024, No. 5, pp. 1719-1732 (14 pages)
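Funding National Natural Science Foundation of China (61972334); Hebei Province Innovation Capability Improvement Plan (222567626H); Central Government Guided Local Science and Technology Development Fund (226Z1707G); Sida Railway Intelligent Image Workpiece Recognition Fund (x2021134); Qinhuangdao Chengfa Health Industry Development Co., Ltd. Performance Appraisal Management System Project (x2022247).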
Keywords data mining; outlier detection; difference similarity pruning; mapping k distance; mapping distance ratio
