Abstract
This paper proposes an unsupervised outlier mining algorithm that combines reverse k-nearest-neighbor counts with the mean k-nearest-neighbor distance. The algorithm takes each object's k-nearest-neighbor set and the corresponding k-nearest-neighbor distances as preconditions. First, it computes the reverse k-nearest-neighbor count of every object in the dataset and derives each object's antihub score. Second, from the k-nearest-neighbor distances, it obtains the antihub score and weight of each object's k nearest neighbors (KNN), and stores the objects whose weight is greater than or equal to 1 in the outlier candidate set List. Third, it redefines the outlier score formula in terms of the antihub score and the mean k-nearest-neighbor distance, and selects the objects with the highest outlier scores as outliers. Finally, experiments on synthetic datasets and UCI benchmark datasets validate the effectiveness of the algorithm.
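The two quantities the abstract builds on, the reverse k-nearest-neighbor count and the mean k-nearest-neighbor distance, can be illustrated with a minimal brute-force sketch. The abstract does not state the weight formula or the redefined outlier score, so the weight-pruning step is omitted here, and the combination in `illustrative_scores` (mean distance divided by one plus the reverse count) is only an assumed stand-in, not the authors' formula. All function names are hypothetical.

```python
import math


def knn_table(points, k):
    """For each point, list its k nearest neighbors as (distance, index) pairs."""
    table = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        table.append(dists[:k])
    return table


def reverse_knn_counts(table):
    """Count how many points include each point among their k nearest neighbors."""
    counts = [0] * len(table)
    for neighbors in table:
        for _, j in neighbors:
            counts[j] += 1
    return counts


def mean_knn_distances(table):
    """Mean distance from each point to its k nearest neighbors."""
    return [sum(d for d, _ in nbrs) / len(nbrs) for nbrs in table]


def illustrative_scores(points, k):
    """ASSUMED score, not the paper's formula: a large mean k-NN distance
    and a small reverse k-NN count both push the score up."""
    table = knn_table(points, k)
    counts = reverse_knn_counts(table)
    means = mean_knn_distances(table)
    return [m / (1 + c) for m, c in zip(means, counts)]


def top_outliers(points, k, n):
    """Indices of the n points with the highest illustrative score."""
    scores = illustrative_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_outliers(data, k=2, n=1))
```

In this toy dataset the isolated point appears in no other point's neighbor list, so its reverse count is zero, while its mean neighbor distance is large. The brute-force neighbor search is quadratic in the number of points and is meant only to make the definitions concrete.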
Authors
ZHU Yun-li (朱云丽)
ZHANG Ji-fu (张继福)
School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
PKU Core Journal (北大核心)
2019, Issue 8, pp. 1627-1632 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (61572343)
Keywords
outlier mining
reverse k nearest neighbor
k nearest neighbor distance
weight pruning
antihub score