Abstract
To improve the accuracy and timeliness of outlier detection on big data, the characteristics of clustering algorithms are studied and analyzed, and an improved outlier detection clustering algorithm based on the grid local outlier factor (LOF) algorithm and an adaptive K-means algorithm is proposed. First, the big data is preprocessed with the grid LOF algorithm, which filters out the isolated outliers. Then, the adaptive K-means algorithm is applied to detect the remaining outliers accurately. Experimental results show that, compared with similar outlier detection algorithms, the proposed algorithm reduces detection run time, improves detection accuracy, and also performs well on large data sets and high-dimensional data.
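The abstract describes a two-stage pipeline but gives no parameters or pseudocode, so the Python sketch below only illustrates the general idea under stated assumptions; it is not the authors' implementation. Stage 1 buckets points into a uniform grid, computes a simplified LOF only for points in sparse cells, and drops those with a high score. Stage 2 runs K-means on the remaining points, choosing k with a hypothetical SSE-reduction rule, and flags points farther than mean + t·σ from their cluster centroid. The cell size, `min_count`, neighbourhood size `k`, LOF threshold, `gain`, `t`, the maximin initialisation, and all function names are illustrative assumptions.

```python
import math
import statistics
from collections import defaultdict
from functools import lru_cache


def make_lof(points, k):
    """Return i -> LOF of points[i]. Simplified LOF: exactly k neighbours, ties ignored.

    Neighbourhoods are computed lazily by brute force, so only the scored points
    and their neighbours ever get a k-NN search. Requires len(points) > k.
    """

    @lru_cache(maxsize=None)
    def knn(i):
        # (indices of the k nearest neighbours of point i, its k-distance)
        ds = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)[:k]
        return tuple(j for _, j in ds), ds[-1][0]

    @lru_cache(maxsize=None)
    def lrd(i):
        # local reachability density = 1 / mean reachability distance to the neighbours
        nbrs, _ = knn(i)
        reach = [max(knn(j)[1], math.dist(points[i], points[j])) for j in nbrs]
        return 1.0 / max(sum(reach) / k, 1e-12)

    def lof(i):
        # mean ratio of the neighbours' density to point i's own density
        nbrs, _ = knn(i)
        return sum(lrd(j) for j in nbrs) / (k * lrd(i))

    return lof


def grid_lof_filter(points, cell=1.0, min_count=3, k=5, threshold=1.5):
    """Stage 1 (assumed rule): LOF-score only points in cells holding < min_count points."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(x / cell) for x in p)].append(i)
    lof = make_lof(points, k)
    return {i for members in cells.values() if len(members) < min_count
            for i in members if lof(i) > threshold}


def assign(points, centers):
    # index of the nearest centre for every point
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]


def kmeans(points, k, iters=100):
    # deterministic maximin initialisation: first point, then repeatedly the farthest point
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(map(statistics.fmean, zip(*members))) if members else centers[c])
        if new == centers:
            break
        centers = new
    labels = assign(points, centers)
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, sse


def adaptive_kmeans(points, k_max=8, gain=0.5):
    """Hypothetical rule: keep increasing k while each step cuts the SSE by at least `gain`."""
    best = kmeans(points, 1)
    for k in range(2, min(k_max, len(points)) + 1):
        trial = kmeans(points, k)
        if trial[2] > (1 - gain) * best[2]:
            break
        best = trial
    return best


def detect_outliers(points, t=3.0):
    """Stage 1 (grid LOF) + stage 2 (adaptive K-means); returns sorted indices into points."""
    isolated = grid_lof_filter(points)
    kept = [i for i in range(len(points)) if i not in isolated]
    centers, labels, _ = adaptive_kmeans([points[i] for i in kept])
    flagged = set(isolated)
    for c in set(labels):
        idx = [i for i, lab in zip(kept, labels) if lab == c]
        d = [math.dist(points[i], centers[c]) for i in idx]
        cut = statistics.fmean(d) + t * statistics.pstdev(d)  # assumed mean + t*sigma cut-off
        flagged.update(i for i, di in zip(idx, d) if di > cut)
    return sorted(flagged)


if __name__ == "__main__":
    grid = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(5)]
    data = grid + [(5 + px, 5 + py) for px, py in grid] + [(0.9, 0.9), (10.0, 0.0)]
    print(detect_outliers(data))  # [50, 51]
```

On this toy data the far point (10, 0) sits alone in its grid cell and is removed by the LOF stage, while (0.9, 0.9) shares a dense cell with the first cluster, escapes stage 1, and is caught by the centroid-distance test in stage 2. Restricting LOF scoring to sparse cells is where a grid pre-filter would save time on large inputs; the sketch keeps a brute-force neighbour search for brevity.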
Authors
ZHANG Shuo; JIN Xin; LI Zhaofeng; GAO Jian (The 28th Research Institute of China Electronics Technology Group Corporation, Nanjing 210007, China)
Source
Command Information System and Technology (《指挥信息系统与技术》), 2019, No. 1, pp. 90-94 (5 pages)