Abstract
Existing outlier detection algorithms achieve low detection accuracy on irregularly shaped data sets and on multidimensional data sets with complex distributions. To address this problem, an outlier detection algorithm based on similarity pruning is proposed. First, a similarity matrix is constructed to compute the similarity between sample points, and the degree matrix is used to select the samples with low similarity to the other samples as the outlier candidate set, which prunes the non-outliers. Then, the LOF algorithm is used to compute the local outlier factor of every object in the candidate set, and the final outliers are determined from the magnitude of the local outlier factor. Experimental results show that the proposed algorithm achieves high outlier detection accuracy.
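The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity function or the pruning rule, so the Gaussian kernel, the parameters `sigma`, `keep_ratio` and `k`, and the function names `prune_by_similarity`, `lof_scores` and `detect_outliers` are all assumptions made for illustration. The sketch also assumes that the neighbours used by LOF are drawn from the full data set, not only from the candidates.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prune_by_similarity(points, sigma=1.0, keep_ratio=0.3):
    """Return indices of the keep_ratio fraction of points with the lowest degree.

    Assumption: Gaussian similarity s_ij = exp(-||xi - xj||^2 / (2 sigma^2)).
    The degree of point i is the sum of its similarities to all other points.
    """
    n = len(points)
    degree = [
        sum(math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
            for j in range(n) if j != i)
        for i in range(n)
    ]
    n_keep = max(1, math.ceil(keep_ratio * n))
    # Low degree means low similarity to the rest, so these become candidates.
    return sorted(range(n), key=lambda i: degree[i])[:n_keep]


def lof_scores(points, targets, k=3):
    """Local outlier factor of each index in targets, with neighbours from all points.

    Simplifications: distance ties are broken by index, len(points) must exceed k,
    and the data is assumed to contain no duplicate points.
    """
    n = len(points)
    dist = [[math.sqrt(sq_dist(p, q)) for q in points] for p in points]
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        return len(reach) / sum(reach)

    return {i: sum(lrd(j) for j in knn[i]) / (len(knn[i]) * lrd(i))
            for i in targets}


def detect_outliers(points, n_outliers, sigma=1.0, keep_ratio=0.3, k=3):
    """Prune by similarity, then rank the remaining candidates by LOF."""
    candidates = prune_by_similarity(points, sigma, keep_ratio)
    scores = lof_scores(points, candidates, k)
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:n_outliers]
```

Calling `detect_outliers(points, n_outliers=1)` on a list of coordinate tuples returns the index of the candidate with the largest local outlier factor. Because LOF is evaluated only for the pruned candidate set, the second stage scores a fraction of the data, although this sketch still computes all pairwise distances.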
Authors
丁天一
张旻
方胜良
DING Tian-yi; ZHANG Min; FANG Sheng-liang (Electronic Engineering Institute, Hefei 230037, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2018, Issue 8, pp. 1680-1684 (5 pages)
Journal of Chinese Computer Systems
Funding
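Supported by the National Natural Science Foundation of China (61171170)
Supported by the Natural Science Foundation of Anhui Province (1408085QF115)
Supported by the National Defense Science and Technology Key Laboratory Fund (9140C130502140C13068)
Supported by the General Armament Department Pre-Research Fund (9140A22020315JB39001)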
Keywords
outlier detection
local outlier factor
similarity matrix
pruning