
Optimization Algorithm for Missing Data Filling Based on kNN-DBSCAN

Cited by: 4
Abstract: The traditional kNN (k-nearest neighbor) imputation algorithm is strongly affected by noise among the k nearest neighbors of the record being filled. To address this, the paper proposes an optimized missing-data imputation algorithm based on kNN-DBSCAN (k-Nearest Neighbor, Density-Based Spatial Clustering of Applications with Noise), which embeds the density-based DBSCAN clustering algorithm in kNN imputation. The kNN algorithm first retrieves the original k-nearest-neighbor set of the target record; DBSCAN then detects and removes noisy records from that set, yielding the current k-nearest-neighbor set; finally, this cleaned set is fed back into the kNN calculation to fill the target missing value. Because DBSCAN is sensitive to its parameter settings, the parameters are determined by analyzing the statistical characteristics of the data set rather than by manual, experience-based judgment. Validation on real data shows that the algorithm fills the target missing data more accurately than the traditional kNN algorithm.
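The three steps in the abstract (retrieve the k nearest neighbors, filter them with DBSCAN, then fill from what remains) can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the parameter formulas, so the `eps` rule (mean pairwise distance within the neighbor set), the `min_pts` default, clustering on the neighbors' full records, and the unweighted mean used for filling are all placeholders, and the function names and sample data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def distance(a, b, dims):
    """Euclidean distance between records a and b over the given attribute indices."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))


def noise_flags(points, eps, min_pts):
    """DBSCAN noise test: a point is noise if it is not a core point
    and does not lie within eps of any core point."""
    dims = range(len(points[0]))
    # Each eps-neighbourhood includes the point itself.
    near = [[j for j, q in enumerate(points) if distance(p, q, dims) <= eps]
            for p in points]
    core = [len(n) >= min_pts for n in near]
    return [not core[i] and not any(core[j] for j in near[i])
            for i in range(len(points))]


def knn_dbscan_impute(target, complete, k=5, min_pts=3, denoise=True):
    """Fill the None entries of `target` from its k nearest complete records."""
    missing = [d for d, v in enumerate(target) if v is None]
    observed = [d for d, v in enumerate(target) if v is not None]

    # Step 1: original k-nearest-neighbour set, measured on observed attributes.
    neighbours = sorted(complete, key=lambda r: distance(target, r, observed))[:k]

    # Step 2: drop neighbours that DBSCAN flags as noise.
    kept = neighbours
    if denoise and len(neighbours) > 1:
        all_dims = range(len(target))
        # ASSUMPTION: eps is the mean pairwise distance inside the neighbour set.
        eps = mean(distance(a, b, all_dims)
                   for a, b in combinations(neighbours, 2))
        flags = noise_flags(neighbours, eps, min_pts)
        # Fall back to the unfiltered set if every neighbour was flagged.
        kept = [r for r, noisy in zip(neighbours, flags) if not noisy] or neighbours

    # Step 3: fill each missing attribute with the mean over the kept neighbours.
    filled = list(target)
    for d in missing:
        filled[d] = mean(r[d] for r in kept)
    return filled


# Hypothetical data: the fifth record is close to the target on the first two
# attributes but carries an outlying value in the attribute to be filled.
records = [
    (1.0, 1.0, 10.0), (1.1, 0.9, 10.2), (0.9, 1.1, 9.8), (1.0, 1.2, 10.1),
    (1.05, 1.0, 50.0), (8.0, 8.0, 30.0), (9.0, 9.0, 31.0),
]
print(knn_dbscan_impute((1.0, 1.0, None), records, k=5))
print(knn_dbscan_impute((1.0, 1.0, None), records, k=5, denoise=False))
```

On this toy data the filtered version fills the missing attribute with about 10.025, while plain kNN (`denoise=False`) is pulled to about 18.02 by the noisy neighbor, which is the kind of interference the abstract describes.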
Source: Industrial Control Computer (《工业控制计算机》), 2020, No. 4, pp. 58-60, 63 (4 pages)
Funding: Fundamental Research Funds for the Central Universities (ZXH2012D012).
Keywords: kNN filling; missing data; noise detection; DBSCAN clustering; Euclidean distance
