Abstract
As data science research deepens, outliers interfere more and more with data analysis, and detecting them effectively has become one of the key problems in data research. Traditional distance-based methods consider only the outlierness of individual objects and do not analyze how normal objects group together. To address this, the paper proposes PCOD (Proximity Cliques Outlier Detection), an outlier detection algorithm based on proximity and cliques. The algorithm borrows the concept of a clique from graph theory, uses cliques to explain the connections among normal objects, and decides whether a data point is an outlier from the connectivity between data objects. PCOD consists of two main steps. First, according to the proximity between data objects, the data set is represented as an undirected graph whose edges link neighboring objects. Second, the graph is searched recursively to obtain the set of all cliques, the cliques are analyzed, and objects that do not group into any clique are reported as outliers. Experiments on UCI datasets including Arrhythmia, Pima, and Vowel show that PCOD outperforms comparable outlier detection algorithms in terms of precision.
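The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how proximity is defined, how cliques are searched, or how they are analyzed. The sketch assumes a Euclidean distance threshold `eps` for edges, the basic recursive Bron-Kerbosch procedure for enumerating maximal cliques, and a minimum clique size `min_size` below which a group does not count. All function and parameter names are hypothetical.

```python
from itertools import combinations
from math import dist


def build_proximity_graph(points, eps):
    """Step 1 (assumed rule): link i and j when their Euclidean distance is <= eps."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Step 2 (assumed search): basic recursive Bron-Kerbosch enumeration."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it, x: already explored
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def detect_outliers(points, eps, min_size=3):
    """Return indices of points that belong to no clique of at least min_size (assumed rule)."""
    adj = build_proximity_graph(points, eps)
    grouped = set()
    for clique in maximal_cliques(adj):
        if len(clique) >= min_size:
            grouped |= clique
    return sorted(set(adj) - grouped)


if __name__ == "__main__":
    # Four points on a unit square form one clique; the fifth point is isolated.
    sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(detect_outliers(sample, eps=1.5))
```

Enumerating all maximal cliques is exponential in the worst case, so a sketch like this is only practical when the proximity graph is sparse, which is consistent with the "sparse graph" keyword of the paper.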
Authors
解峰
蔡江辉
杨海峰
荀亚玲
XIE Feng; CAI Jianghui; YANG Haifeng; XUN Yaling (School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024)
Source
《计算机与数字工程》
2021, No. 5, pp. 971-976 (6 pages)
Computer & Digital Engineering
Funding
Supported by the National Youth Science Fund of China (Grant No. 61602335).
Keywords
outlier detection
proximity
sparse graph
clique search