Abstract
Most existing outlier detection algorithms ignore the local information in a data set, which lowers their detection accuracy. To address this, we propose a new two-phase outlier detection algorithm based on k-means clustering and local information. A newly defined local outlier factor serves as the criterion for judging whether a data object is an outlier, which also improves on the procedure of traditional outlier detection algorithms. Experimental results show that the algorithm retains linear time complexity while finding the outliers in a data set more accurately and effectively.
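The two-phase idea can be illustrated with a minimal sketch. The abstract does not give the paper's definition of the local factor, so the score used here (a point's distance to its own cluster centroid divided by the mean such distance within that cluster) is only an assumed stand-in, and the names `kmeans`, `local_scores`, `detect_outliers`, and `threshold` are hypothetical. Each k-means pass costs time linear in the number of points, and the scoring phase is a single linear pass.

```python
import math


def assign(points, centroids):
    """Return the index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=20):
    """Phase 1: plain Lloyd's k-means; returns (centroids, labels)."""
    # Assumption: the first k points seed the centroids so the run is
    # reproducible; a real run would more likely use random or k-means++ seeding.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return centroids, assign(points, centroids)


def local_scores(points, centroids, labels):
    """Phase 2: assumed local score, NOT the paper's factor.

    score = distance to own centroid / mean distance within that cluster.
    """
    dists = [math.dist(p, centroids[l]) for p, l in zip(points, labels)]
    totals, counts = {}, {}
    for d, l in zip(dists, labels):
        totals[l] = totals.get(l, 0.0) + d
        counts[l] = counts.get(l, 0) + 1
    scores = []
    for d, l in zip(dists, labels):
        mean = totals[l] / counts[l]
        scores.append(d / mean if mean > 0 else 0.0)
    return scores


def detect_outliers(points, k, threshold=2.0):
    """Return indices of points whose local score exceeds the threshold."""
    centroids, labels = kmeans(points, k)
    scores = local_scores(points, centroids, labels)
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    data = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (10, 11), (11, 10), (11, 11), (10.5, 10.5), (4, 4)]
    print(detect_outliers(data, k=2))
```

Because the score is normalized per cluster, a point is judged against its own neighbourhood rather than against the whole data set, which is the sense in which the sketch uses local information. The threshold of 2.0 is an arbitrary illustrative value.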
Source
Journal of Jilin University (Science Edition) (《吉林大学学报(理学版)》)
CAS
CSCD
PKU Core (北大核心)
2012, No. 6, pp. 1214-1217 (4 pages)
Funding
Key Project of the Jilin Province Science and Technology Development Plan (Grant No. 20090304)