Abstract
To address the limitations of traditional local outlier detection algorithms, this paper proposes a new outlier mining algorithm. To find the neighborhood of each data point, the algorithm adopts the concept of influenced space from the influenced outlierness (INFLO) algorithm. To compute the outlier factor of each point, it builds on the idea of chaining distance from the connectivity-based outlier factor (COF) and proposes a calculation method based on the similar k_distance neighbor series (SKDNS). A comparison with other classic local outlier detection algorithms under different data distributions shows that the proposed algorithm detects outliers more accurately than LOF, INFLO, and COF, overcomes the shortcomings of LOF, and improves the accuracy and diversity of local outlier mining.
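The neighborhood step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual INFLO definition, in which a point's influenced space is the union of its k nearest neighbors and its reverse k nearest neighbors. The function names `k_nearest` and `influence_space` are hypothetical, ties are broken by index rather than by keeping every point at the k-distance, and the SKDNS outlier factor itself is not shown because the abstract does not define it. This is not the authors' implementation.

```python
from math import dist


def k_nearest(points, i, k):
    """Return the indices of the k points closest to points[i], excluding i.

    Simplification (assumption): ties are broken by index, so exactly k
    neighbors are returned instead of every point within the k-distance.
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (dist(points[i], points[j]), j))
    return set(others[:k])


def influence_space(points, k):
    """For each point, return kNN(p) united with the reverse kNN of p."""
    n = len(points)
    knn = [k_nearest(points, i, k) for i in range(n)]
    spaces = []
    for i in range(n):
        # Reverse neighbors: points that count i among their own k nearest.
        rnn = {j for j in range(n) if i in knn[j]}
        spaces.append(knn[i] | rnn)
    return spaces


# Toy data: a unit square plus one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 4.0)]
spaces = influence_space(points, k=2)
for i, space in enumerate(spaces):
    print(i, sorted(space))
```

In this toy data the isolated point has no reverse neighbors, so its influenced space contains only its own two nearest neighbors. Points on the square that the isolated point selects as neighbors gain it as a reverse neighbor.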
Source
Application Research of Computers (《计算机应用研究》)
CSCD
北大核心 (Peking University Core Journals)
2014, No. 6, pp. 1693-1696, 1701 (5 pages)
Keywords
outlier detection
influenced space
chaining distance
similar k_distance neighbor series
outlier factor