Abstract
Outlier detection algorithms based on nearest neighbors must choose a value for the parameter k, and a k that is too large or too small can substantially degrade detection performance. How to select k is therefore a key issue for nearest-neighbor methods. This paper proposes a personalized k-nearest neighbor (PKNN) outlier detection method. In PKNN, the algorithm determines the number of neighbors of each sample automatically instead of requiring it to be set manually. Samples in dense regions receive more neighbors, and samples in sparse regions receive fewer. As a result, the personalized neighbor parameters chosen by PKNN match the intuitive distribution of the data set more closely. Experimental results show that PKNN achieves good outlier detection performance compared with several existing state-of-the-art algorithms.
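The abstract does not say how PKNN chooses each point's neighbor count. The Python sketch below illustrates the general idea of per-point k, in which dense points get more neighbors and isolated points get fewer. It is not the paper's algorithm. In its place, it uses a natural-neighbor-style search: a shared search round r grows until the number of points that no other point has reached stops changing, and each point's reverse-neighbor count then serves as its personal k. The function names `neighbor_order`, `natural_neighbor_counts` and `personalized_outlier_scores` are hypothetical, and so is the outlier degree (mean distance to a point's own k nearest neighbors).

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def neighbor_order(data: Sequence[Point]) -> List[List[int]]:
    """For each point, indices of all other points sorted by Euclidean distance."""
    n = len(data)
    return [
        sorted((j for j in range(n) if j != i), key=lambda j: math.dist(data[i], data[j]))
        for i in range(n)
    ]


def natural_neighbor_counts(order: List[List[int]]) -> Tuple[List[int], int]:
    """Hypothetical stand-in for PKNN's k selection (natural-neighbor-style search).

    In round r every point "reaches" its r-th nearest neighbor. The search stops
    when every point has been reached or the number of unreached points stops
    changing. Returns each point's reverse-neighbor count and the final r.
    """
    n = len(order)
    reverse = [0] * n
    prev_orphans: Optional[int] = None
    r = 0
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse[order[i][r - 1]] += 1  # point i reaches its r-th neighbor
        orphans = reverse.count(0)
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    return reverse, r


def personalized_outlier_scores(data: Sequence[Point]) -> Tuple[List[float], List[int]]:
    """Per-point k from reverse-neighbor counts; score = mean distance to own k-NN."""
    order = neighbor_order(data)
    reverse, _ = natural_neighbor_counts(order)
    ks = [max(1, c) for c in reverse]  # keep at least one neighbor per point
    scores = []
    for i, k in enumerate(ks):
        nearest = order[i][:k]
        scores.append(sum(math.dist(data[i], data[j]) for j in nearest) / k)
    return scores, ks


if __name__ == "__main__":
    grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
    data = grid + [(10.0, 10.0)]  # a 3x3 cluster plus one isolated point
    scores, ks = personalized_outlier_scores(data)
    print(ks)  # → [2, 3, 2, 3, 8, 4, 2, 3, 3, 1]
    print(max(range(len(scores)), key=scores.__getitem__))  # → 9
```

On this toy data the center of the grid gets 8 neighbors and the edge and corner points get 2 to 4. The isolated point is never reached by any other point, so its count is clamped to 1, and its outlier degree (about 11.3) is far larger than that of any grid point (at most about 1.2). The sketch still contains an implicit stopping rule instead of a user-set k, which is one way to avoid choosing k by hand. Whether PKNN uses a comparable rule cannot be determined from the abstract.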
Authors
FAN Rui-xuan (樊瑞宣)
JIANG Gao-xia (姜高霞)
WANG Wen-jian (王文剑)
School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing, Ministry of Education, Shanxi University, Taiyuan 030006, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2020, No. 4, pp. 752-757 (6 pages)
Funding
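Supported by the National Natural Science Foundation of China (61673249, U1805263)
Supported by the Shanxi Province Research Fund for Returned Overseas Scholars (2016-004)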
Keywords
outlier detection
personalized k-nearest neighbor
parameter selection
degree of outlier