Abstract
Imputing missing values is one of the most challenging tasks in machine learning and data mining. Missing values in a data source can substantially degrade both the performance of a learning algorithm and the quality of what it learns. Existing imputation methods do not yet fully meet users' needs. This paper proposes a missing-value imputation method based on grey system theory. The method combines instance-based nonparametric fitting with grey theory techniques and imputes the missing data iteratively until the imputed values converge or meet the user's requirements. Experiments on data from the UCI repository show that the method outperforms the existing KNN imputation method and ordinary mean substitution in both imputation quality and efficiency.
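The abstract does not spell out the algorithm, so the following is only a minimal sketch of what an iterative, instance-based imputation driven by grey relational analysis might look like. It assumes Deng's grey relational grade with distinguishing coefficient 0.5, a column-mean initial fill, and imputation by averaging the k most closely related rows. The function name `gra_impute` and the parameters `k`, `zeta`, `tol`, and `max_iter` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional

Table = List[List[Optional[float]]]


def gra_impute(data: Table, k: int = 3, zeta: float = 0.5,
               tol: float = 1e-4, max_iter: int = 20) -> List[List[float]]:
    """Iteratively fill None entries using grey relational grades (sketch).

    Assumes at least two rows, at least two columns, and at least one
    observed value in every column.
    """
    n, m = len(data), len(data[0])
    missing = [(i, j) for i in range(n) for j in range(m) if data[i][j] is None]
    filled = [row[:] for row in data]  # work on a copy; the input is untouched

    # Initial fill with the column mean; record each column's observed range
    # so that differences can be compared across attributes.
    span = []
    for j in range(m):
        obs = [row[j] for row in data if row[j] is not None]
        mean_j = sum(obs) / len(obs)
        span.append((max(obs) - min(obs)) or 1.0)  # avoid dividing by zero
        for i in range(n):
            if filled[i][j] is None:
                filled[i][j] = mean_j

    for _ in range(max_iter):
        prev = [row[:] for row in filled]  # values from the previous pass
        max_change = 0.0
        for i, j in missing:
            cols = [c for c in range(m) if c != j]
            # Range-scaled absolute differences between row i and every other row.
            deltas = {r: [abs(prev[i][c] - prev[r][c]) / span[c] for c in cols]
                      for r in range(n) if r != i}
            flat = [d for ds in deltas.values() for d in ds]
            d_min, d_max = min(flat), max(flat)
            if d_max == 0.0:
                # All rows coincide on the compared attributes.
                grades = {r: 1.0 for r in deltas}
            else:
                # Grey relational grade: mean of the per-attribute coefficients.
                grades = {
                    r: sum((d_min + zeta * d_max) / (d + zeta * d_max)
                           for d in ds) / len(ds)
                    for r, ds in deltas.items()
                }
            # Average the target attribute over the k most related rows.
            nearest = sorted(grades, key=grades.get, reverse=True)[:k]
            new_value = sum(prev[r][j] for r in nearest) / len(nearest)
            max_change = max(max_change, abs(new_value - prev[i][j]))
            filled[i][j] = new_value
        if max_change < tol:  # stop once the imputed values have settled
            break
    return filled
```

In this sketch the grey relational grade plays the role that Euclidean distance plays in KNN imputation: rows are ranked by how closely they track the incomplete row on the other attributes, and the passes repeat because imputed values feed into the next round of comparisons.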
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2009, Issue 15, pp. 169-172 (4 pages)
Computer Engineering and Applications
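Funding
Scientific Research Project of the Education Department of Guangxi Zhuang Autonomous Region (No. 2008CB317100)
Pre-research Special Project of the National Basic Research Program of China (973 Program) (No. 200807MS176)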
Keywords
multiple imputation
missing values
grey relational analysis