
Outlier Mining Algorithm Based on Reverse K Nearest Neighbor Counting and Weight Pruning

Cited by: 10
Abstract: This paper proposes an unsupervised outlier mining algorithm that combines reverse k nearest neighbor counting with the mean k nearest neighbor distance. Taking each object's set of k nearest neighbors and the distances to those neighbors as preconditions, the algorithm first computes the reverse k nearest neighbor count of every object in the dataset and derives each object's antihub score. Second, from the k nearest neighbor distances, the antihub score and weight of each object's k nearest neighbors (KNN) are obtained, and objects whose weight is greater than or equal to 1 are saved in the outlier candidate set List. Third, the outlier score formula is redefined in terms of the antihub score and the mean k nearest neighbor distance, and the objects with the highest outlier scores are selected as outliers. Finally, experiments on synthetic datasets and UCI benchmark datasets validate the effectiveness of the algorithm.
Authors: ZHU Yun-li (朱云丽); ZHANG Ji-fu (张继福) (School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 8, pp. 1627-1632 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (61572343)
Keywords: outlier mining; reverse k nearest neighbor; k nearest neighbor distance; weight pruning; antihub score
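The pipeline outlined in the abstract (k nearest neighbors, reverse k nearest neighbor counts, a score built from the antihub score and the mean k nearest neighbor distance, then top-n selection) can be illustrated with a minimal Python sketch. The abstract does not state the paper's formulas, so the antihub score `1 / (1 + count)` and the product used as the outlier score are assumptions chosen for illustration only, and the function names are hypothetical. The weight-based pruning step is left out because its weight formula is not given.

```python
import math


def k_nearest(points, k):
    """Return, for every point, its k nearest neighbor indices and the mean distance to them."""
    neighbors, mean_dists = [], []
    for i, p in enumerate(points):
        # Brute-force distances from point i to every other point, nearest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        neighbors.append([j for _, j in nearest])
        mean_dists.append(sum(d for d, _ in nearest) / k)
    return neighbors, mean_dists


def reverse_knn_counts(neighbors):
    """Count how many times each point appears in other points' k nearest neighbor lists."""
    counts = [0] * len(neighbors)
    for nbrs in neighbors:
        for j in nbrs:
            counts[j] += 1
    return counts


def outlier_scores(points, k):
    """Combine an antihub-style score with the mean k nearest neighbor distance."""
    neighbors, mean_dists = k_nearest(points, k)
    counts = reverse_knn_counts(neighbors)
    # ASSUMPTION: antihub score = 1 / (1 + reverse count), combined by multiplication.
    # The paper's actual score formula is not reproduced here.
    return [mean_dists[i] / (1 + counts[i]) for i in range(len(points))]


def top_outliers(points, k, n):
    """Return the indices of the n points with the highest outlier score."""
    scores = outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_outliers(data, k=2, n=1))
```

In this toy dataset the isolated point `(10, 10)` appears in no other point's neighbor list and has a large mean neighbor distance, so both factors push its score up. The brute-force neighbor search is quadratic in the number of points; a spatial index would be the usual replacement for larger data.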
