Abstract
This paper presents a fast density-based algorithm for detecting outliers, called outlier detecting based on square neighborhood (ODBSN). The algorithm replaces the ε-neighborhood of DBSCAN with a square neighborhood and, borrowing an idea from grid-based algorithms, uses dense square neighborhoods to quickly rule out non-outliers. Because it expands neighborhoods rather than partitioning the space into grids, it avoids the "curse of dimensionality" that affects grid-based algorithms. A local deviation factor indicates how far each outlier deviates, so outliers are identified with high precision and their degree of deviation is measurable. Theoretical analysis shows that the algorithm is more efficient than the well-known density-based algorithms DBSCAN and LOF. Experimental results show that ODBSN effectively identifies outliers in data containing clusters of different shapes and varied densities, and that it is several times faster than the original DBSCAN and LOF algorithms.
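The pruning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a square neighborhood is the axis-aligned hypercube of half-side `half_side` around a point, and that a neighborhood is "dense" when it holds at least `min_pts` other points. The function names and both parameters are hypothetical. The local deviation factor is not defined in the abstract, so it is not reproduced here.

```python
from typing import List, Sequence

Point = Sequence[float]


def in_square(p: Point, q: Point, half_side: float) -> bool:
    """True if q lies in the axis-aligned square (hypercube) centred on p.

    Only per-coordinate comparisons are needed, so no Euclidean
    distance has to be computed (assumed reading of "square neighborhood").
    """
    return all(abs(a - b) <= half_side for a, b in zip(p, q))


def square_neighbors(points: Sequence[Point], i: int, half_side: float) -> List[int]:
    """Indices of all points other than i inside the square around points[i]."""
    return [j for j, q in enumerate(points)
            if j != i and in_square(points[i], q, half_side)]


def candidate_outliers(points: Sequence[Point], half_side: float, min_pts: int) -> List[int]:
    """Rule out non-outliers with dense square neighborhoods.

    Hypothetical rule: if the square around a point holds at least
    min_pts other points, the point and all of those neighbors are
    cleared. Points never cleared remain outlier candidates.
    """
    cleared = [False] * len(points)
    for i in range(len(points)):
        if cleared[i]:
            continue  # already ruled out, so skip its neighborhood query
        nbrs = square_neighbors(points, i, half_side)
        if len(nbrs) >= min_pts:
            cleared[i] = True
            for j in nbrs:
                cleared[j] = True
    return [i for i, done in enumerate(cleared) if not done]


# Usage: one tight cluster of five points plus one isolated point.
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
print(candidate_outliers(sample, half_side=0.5, min_pts=3))  # prints [5]
```

In this sketch the first point's square already covers the whole cluster, so the other four cluster points are cleared without a neighborhood query of their own. Only the isolated point remains as a candidate, which a deviation measure would then have to score.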
Source
《控制与决策》
EI
CSCD
PKU Core (北大核心)
2006, Issue 5, pp. 541-545, 554 (6 pages)
Control and Decision
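Funding
National Natural Science Foundation of China (49971063)
National "863" Program, Marine Monitoring Theme sub-project (2001AA633010-04)
Natural Science Foundation of Jiangsu Province (BK2001045)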
Keywords
Data mining
Outliers
Square neighborhood