Abstract
Traditional k-nearest-neighbor (kNN) imputation is easily disturbed by noise among the k nearest neighbors of the record being filled. To address this problem, the paper proposes an optimized missing-data imputation algorithm based on kNN-DBSCAN (k-Nearest Neighbor combined with Density-Based Spatial Clustering of Applications with Noise). The density-based DBSCAN clustering algorithm is embedded in kNN imputation: kNN first produces the original set of k nearest neighbors for the target record; DBSCAN then detects and removes the noisy records in that set, yielding the current k-nearest-neighbor set; finally, this cleaned set is fed back into the kNN calculation to fill the target missing value. Because DBSCAN is sensitive to its parameter settings, the parameters are determined by analyzing the statistical characteristics of the data set rather than by manual, experience-based judgment. Validation on real data shows that the algorithm fills the target missing data more accurately than the traditional kNN algorithm.
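The three steps named in the abstract (find the k nearest neighbors, drop the neighbors DBSCAN marks as noise, impute from the rest) can be sketched as follows. This is a minimal illustration, not the paper's implementation. The abstract does not say how the DBSCAN parameters are derived from the data, so the sketch assumes eps is the mean pairwise distance among the k neighbors and min_pts is a fixed value. It also assumes that clustering runs on the full neighbor records and that the imputed value is the plain mean of the retained neighbors. The function names dbscan_noise and knn_dbscan_impute are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan_noise(points, eps, min_pts):
    """Return the indices of the points that DBSCAN would label as noise.

    A point is a core point if at least min_pts points (itself included)
    lie within eps of it. Noise points are the non-core points that are
    not within eps of any core point.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if euclidean(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]
    return {
        i for i in range(n)
        if not core[i] and not any(core[j] for j in neighbors[i])
    }


def knn_dbscan_impute(complete, query, k=5, min_pts=3):
    """Fill the single None entry of query from the rows in complete.

    complete: list of fully observed numeric rows.
    query:    one row with exactly one None (the missing attribute).
    """
    if k < 2:
        raise ValueError("k must be at least 2 to estimate eps")
    miss = query.index(None)
    observed = [j for j in range(len(query)) if j != miss]

    def project(row):
        return [row[j] for j in observed]

    # Step 1: original k nearest neighbors, measured on observed attributes.
    q = project(query)
    knn = sorted(complete, key=lambda r: euclidean(project(r), q))[:k]

    # Step 2: choose eps from the neighbors' own statistics
    # (ASSUMPTION: mean pairwise distance; the source gives no formula).
    dists = [euclidean(a, b) for i, a in enumerate(knn) for b in knn[i + 1:]]
    eps = sum(dists) / len(dists)
    noise = dbscan_noise(knn, eps, min_pts)

    # Fall back to the original neighbors if everything was flagged as noise.
    kept = [r for i, r in enumerate(knn) if i not in noise] or knn

    # Step 3: impute with the mean of the missing attribute over kept rows.
    return sum(r[miss] for r in kept) / len(kept)
```

With a neighbor set that contains one record whose missing-attribute value is far from the others, the sketch discards that record before averaging, whereas plain kNN averaging would be pulled toward it.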
Source
Industrial Control Computer (《工业控制计算机》)
2020, No. 4, pp. 58-60, 63 (4 pages)
Funding
Fundamental Research Funds for the Central Universities (ZXH2012D012).