Abstract
The Euclidean distance used in the k-Nearest Neighbor (kNN) algorithm is sensitive to density correlation in the data. To address this shortcoming, the authors propose a new kNN algorithm for missing-data imputation that replaces the Euclidean distance with a combination of Mahalanobis distance and gray analysis. The Mahalanobis distance handles datasets in which density correlation is pronounced, while gray analysis handles datasets in which it is not. The proposed algorithm can therefore handle datasets of either kind, and the experimental results show that its imputation results are clearly better than those of existing algorithms.
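The idea in the abstract can be illustrated with a minimal Python sketch of kNN imputation in which the neighbor measure is swappable. This is not the authors' implementation: the abstract does not say how the two measures are combined, so the sketch simply lets the caller pick one. The function names, the `measure` parameter, the use of a gray relational grade as the "gray analysis" measure, the distinguishing coefficient `rho = 0.5`, and the mean of the k neighbors as the filled value are all assumptions made for illustration.

```python
import numpy as np


def mahalanobis_distances(target, candidates, cov_inv):
    """Mahalanobis distance from `target` to every row of `candidates`."""
    diff = candidates - target
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # Clip tiny negative values caused by floating-point error.
    return np.sqrt(np.maximum(squared, 0.0))


def gray_relational_grades(target, candidates, rho=0.5):
    """Gray relational grade of each candidate row (higher = more similar).

    Assumption: "gray analysis" is taken to mean the usual gray relational
    grade with distinguishing coefficient rho.
    """
    delta = np.abs(candidates - target)
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:
        return np.ones(len(candidates))
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=1)


def knn_impute(data, k=3, measure="mahalanobis"):
    """Fill NaNs with the mean of the k nearest complete rows.

    Hypothetical helper: assumes at least two complete rows and that every
    incomplete row has at least one observed value.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    complete = data[~np.isnan(data).any(axis=1)]
    for i, row in enumerate(data):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        cand = complete[:, observed]
        if measure == "mahalanobis":
            cov = np.atleast_2d(np.cov(cand, rowvar=False))
            cov_inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular cov
            scores = mahalanobis_distances(row[observed], cand, cov_inv)
            nearest = np.argsort(scores)[:k]   # smallest distance first
        else:
            scores = gray_relational_grades(row[observed], cand)
            nearest = np.argsort(-scores)[:k]  # largest grade first
        filled[i, missing] = complete[nearest][:, missing].mean(axis=0)
    return filled


if __name__ == "__main__":
    sample = np.array([
        [1.0, 2.0, 10.0],
        [2.0, 1.0, 20.0],
        [3.0, 4.0, 30.0],
        [4.0, 3.0, 40.0],
        [2.0, 1.0, np.nan],
    ])
    print(knn_impute(sample, k=1, measure="mahalanobis"))
    print(knn_impute(sample, k=1, measure="gray"))
```

Only complete rows serve as neighbor candidates here, and distances are computed over the attributes that are observed in the incomplete row; both choices are simplifications rather than details taken from the paper.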
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2009, No. 9, pp. 2502-2504, 2536 (4 pages)
Journal of Computer Applications
Funding
Guangxi Natural Science Foundation (Grant No. 桂科自0899018)
Research Project of the Education Department of Guangxi (Grant No. 200808MS062)
Keywords
data preprocessing
missing data
Nearest Neighbor (NN) algorithm
gray analysis
Mahalanobis distance