
An Outlier Detection Algorithm Based on Proximity and Cliques
Abstract: As data science research deepens, outliers interfere more and more with data analysis, and detecting them effectively has become one of the key problems in data research. Traditional distance-based methods consider only the abnormality of individual objects and do not analyze how normal objects group together. To address this, the paper proposes PCOD (Proximity Cliques Outlier Detection), an outlier detection algorithm based on proximity and cliques. The algorithm borrows the concept of a clique from graph theory, uses cliques to explain the connections among normal objects, and decides whether a data point is an outlier from the connectivity between data objects. PCOD has two main steps. First, based on the proximity between data objects, the data set is represented as an undirected graph in which objects are joined by edges. Second, the graph is searched recursively to obtain the set of all cliques, and the cliques are analyzed to detect the outliers, that is, the points that do not belong to any clique. Finally, experiments on UCI data sets including Arrhythmia, Pima, and Vowel show that PCOD outperforms comparable outlier detection algorithms in precision.
Authors: XIE Feng; CAI Jianghui; YANG Haifeng; XUN Yaling (School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2021, No. 5, pp. 971-976 (6 pages)
Funding: Supported by the National Youth Science Fund Project (Grant No. 61602335).
Keywords: outlier detection; proximity; sparse graph; clique search
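
The two steps described in the abstract (build an undirected proximity graph, then search it recursively for cliques and flag points that belong to none) can be illustrated with a minimal Python sketch. This is not the authors' PCOD implementation: the abstract does not say how proximity is measured or which cliques count, so the Euclidean `eps` threshold, the `min_clique_size` parameter, the use of basic Bron-Kerbosch enumeration, and all function names are assumptions made for illustration only.

```python
from itertools import combinations
from math import dist


def build_proximity_graph(points, eps):
    """Connect two points with an undirected edge when they lie within eps.

    The eps threshold is an assumption of this sketch; the abstract only says
    that edges are derived from the proximity between objects.
    """
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques recursively (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it, x: already tried
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def find_outliers(points, eps, min_clique_size=3):
    """Flag points that belong to no clique of at least min_clique_size.

    min_clique_size is a hypothetical parameter, not taken from the paper.
    """
    adj = build_proximity_graph(points, eps)
    covered = set()
    for clique in maximal_cliques(adj):
        if len(clique) >= min_clique_size:
            covered |= clique
    return sorted(set(range(len(points))) - covered)


if __name__ == "__main__":
    # Four tightly grouped points and one isolated point.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    print(find_outliers(data, eps=0.2))
```

With this toy input the four nearby points form a single clique and only the isolated point at index 4 is flagged. Enumerating all maximal cliques is exponential in the worst case, so a sketch like this is practical only on sparse graphs or small data sets.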
