Abstract
Anomaly detection is one of the key research topics in data mining. Sample data collected in practice are affected by many factors and therefore contain outliers. To address this, this paper proposes a new anomaly detection algorithm based on the distribution characteristics of neighboring samples. By introducing the concept of a neighborhood, the algorithm can handle data sets with mixed attributes. Because the size or the density of a sample's neighborhood alone cannot reasonably reflect how anomalous that sample is, the algorithm takes all samples in the neighborhood into account. Simulation results show that, on mixed data, the proposed algorithm has a clear advantage over other anomaly detection algorithms.
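The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a heterogeneous distance (range-normalized difference for numeric attributes, 0/1 mismatch for categorical ones), a fixed neighborhood radius `delta`, and a score that compares a sample's neighborhood size with the average neighborhood size of all of its neighbors. The names `mixed_distance`, `neighborhoods`, `outlier_scores`, and `delta` are hypothetical.

```python
from typing import List, Sequence, Union

Value = Union[float, str]  # numeric attributes as float, categorical as str


def attribute_ranges(data: Sequence[Sequence[Value]]) -> List[float]:
    """Range (max - min) of each numeric column; 0.0 placeholder for categorical."""
    ranges = []
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        if isinstance(col[0], str):
            ranges.append(0.0)  # not used for categorical attributes
        else:
            ranges.append(max(col) - min(col))
    return ranges


def mixed_distance(a: Sequence[Value], b: Sequence[Value],
                   ranges: Sequence[float]) -> float:
    """Assumed mixed-attribute distance, averaged over attributes (in [0, 1])."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0        # categorical: overlap metric
        else:
            total += abs(x - y) / r if r > 0 else 0.0  # numeric: normalized gap
    return total / len(a)


def neighborhoods(data: Sequence[Sequence[Value]], delta: float) -> List[List[int]]:
    """Indices of all other samples within distance delta of each sample."""
    ranges = attribute_ranges(data)
    n = len(data)
    return [
        [j for j in range(n)
         if j != i and mixed_distance(data[i], data[j], ranges) <= delta]
        for i in range(n)
    ]


def outlier_scores(data: Sequence[Sequence[Value]], delta: float) -> List[float]:
    """Assumed score: mean neighborhood size of the neighbors / own size.

    Values near 1 mean the sample sits in a region as populated as its
    neighbors' regions; larger values mean it is comparatively isolated.
    A sample with no neighbors at all gets an infinite score.
    """
    nbrs = neighborhoods(data, delta)
    sizes = [len(nb) + 1 for nb in nbrs]  # +1 counts the sample itself
    scores = []
    for i, nb in enumerate(nbrs):
        if not nb:
            scores.append(float("inf"))
            continue
        mean_neighbor_size = sum(sizes[j] for j in nb) / len(nb)
        scores.append(mean_neighbor_size / sizes[i])
    return scores
```

Ranking samples by `outlier_scores(data, delta)` in descending order puts the most isolated samples first. The choice of `delta` and of the distance are free parameters in this sketch and would need to follow the paper's definitions in any faithful reimplementation.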
Authors
Zhang Jun (张军)
Liu Wenjie (刘文杰)
Zhang Jun; Liu Wenjie (Jiangsu Maritime Institute, Nanjing 211170, China; School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China)
Source
Bulletin of Science and Technology (《科技通报》)
Peking University Core Journals list (北大核心)
2017, No. 1, pp. 86-88, 141 (4 pages)
Funding
2015 Jiangsu Province Key Research Project in Modern Educational Technology (Project No. 2015-R-42639)
Keywords
mixed data (混合数据)
anomaly detection (异常检测)
outlier (异常值)
neighbor sample (邻居样本)