Outlier Detection Algorithm Based on Approximate Outlier Factor (基于相似孤立系数的孤立点检测算法)

Cited by: 4

Abstract: Clustering-based outlier detection algorithms produce coarse, insufficiently accurate results. To address this problem, the paper proposes an outlier detection algorithm based on the Approximate Outlier Factor (AOF). It defines a similarity distance and a similarity-based outlier coefficient, and gives a pruning strategy based on the similarity distance that shrinks the set of suspected outlier candidates and reduces the computational complexity of detection. Experiments on the public datasets Iris, Labor, and Segment-test show that the algorithm is more effective than classical outlier detection algorithms at finding outliers and at reducing the candidate set.
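The abstract does not give the formulas for the similarity distance or the AOF, so the following is only a minimal sketch of the general prune-then-score idea it describes, not the paper's method. As stand-in assumptions, it uses Euclidean distance to the global centroid for pruning and the mean distance to the k nearest neighbours as the outlier score. The function names and the parameters `keep_fraction`, `k`, and `top_n` are hypothetical.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance (assumed stand-in for the paper's similarity distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prune_candidates(points, keep_fraction=0.2):
    """Return indices of the points farthest from the global centroid.

    Only these candidates are scored later, which is what reduces the cost
    compared with scoring every point.
    """
    dim = len(points[0])
    centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    ranked = sorted(
        range(len(points)),
        key=lambda i: euclidean(points[i], centroid),
        reverse=True,
    )
    n_keep = max(1, math.ceil(keep_fraction * len(points)))
    return ranked[:n_keep]


def knn_score(points, idx, k=3):
    """Mean distance to the k nearest neighbours (assumed stand-in for the AOF)."""
    dists = sorted(euclidean(points[idx], q) for j, q in enumerate(points) if j != idx)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def detect_outliers(points, keep_fraction=0.2, k=3, top_n=1):
    """Prune to a candidate set, then return the top_n highest-scoring candidates."""
    candidates = prune_candidates(points, keep_fraction)
    scored = sorted(candidates, key=lambda i: knn_score(points, i, k), reverse=True)
    return scored[:top_n]


# Toy data: nine points near (1, 1) and one far away at (8, 8).
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8),
        (0.8, 1.0), (1.1, 1.2), (0.9, 0.9), (1.0, 1.1), (8.0, 8.0)]
print(detect_outliers(data))
```

With `keep_fraction=0.2`, only two of the ten toy points are scored against the full dataset; the rest are discarded by the cheap centroid check.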

Authors: Xie Yueshan (谢岳山), Fan Xiaoping (樊晓平), Liao Zhifang (廖志芳), Zhou Guoen (周国恩), Liu Shijie (刘世杰)

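Affiliations: School of Information Science and Engineering, Central South University; School of Software, Central South University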

Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2013, No. 11, pp. 200-204 (5 pages)

Funding: National Key Technology R&D Program of China (2012BAH08B01); Natural Science Foundation of Hunan Province (12JJ3074)

Keywords: clustering; outlier; outlier detection; Approximate Outlier Factor (AOF); pruning strategy; outlier candidate set

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
