Abstract
Data preprocessing is a required step in data mining, machine learning, and related fields. Within it, filling in missing values is a particularly challenging task, because the quality of the filled values can strongly affect the subsequent learning or mining algorithms. Existing filling algorithms can handle the missing-value problem to some extent. Unlike these methods, this paper proposes an extended information gain (IG) based missing-value filling algorithm, which makes full use of the implicit relationships among the attributes of a dataset to fill in the missing data. Extensive experiments show that the proposed algorithm is effective.
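The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of information-gain-guided filling, not the paper's extended method. It assumes discrete attributes, rows stored as dictionaries, and `None` as the missing-value marker; the names `entropy`, `information_gain`, and `fill_missing` are hypothetical. For each attribute with gaps, the other attributes are ranked by their information gain about it, and a gap is filled with the most common value among rows that agree on the best-ranked attribute known for that row.

```python
from collections import Counter
from math import log2

MISSING = None  # assumed marker for a missing value


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(rows, predictor, target):
    """Gain of `predictor` about `target`, using rows where both are known."""
    known = [r for r in rows
             if r[predictor] is not MISSING and r[target] is not MISSING]
    if not known:
        return 0.0
    groups = {}
    for r in known:
        groups.setdefault(r[predictor], []).append(r[target])
    conditional = sum(len(g) / len(known) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in known]) - conditional


def fill_missing(rows):
    """Return a filled copy of `rows` (illustrative scheme, not the paper's)."""
    filled = [dict(r) for r in rows]
    attributes = list(rows[0])
    for target in attributes:
        observed = [r[target] for r in rows if r[target] is not MISSING]
        if not observed:
            continue  # nothing to learn from; leave the column as it is
        fallback = Counter(observed).most_common(1)[0][0]
        # Rank the other attributes by how informative they are about `target`.
        ranked = sorted(
            (a for a in attributes if a != target),
            key=lambda a: information_gain(rows, a, target),
            reverse=True,
        )
        for original, out in zip(rows, filled):
            if original[target] is not MISSING:
                continue
            out[target] = fallback  # used if no informative match is found
            for a in ranked:
                if original[a] is MISSING:
                    continue
                matches = [r[target] for r in rows
                           if r[a] == original[a] and r[target] is not MISSING]
                if matches:
                    out[target] = Counter(matches).most_common(1)[0][0]
                    break
    return filled


# Toy data: `outlook` fully determines `play`, while `windy` tells us nothing.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": None},
]
filled = fill_missing(rows)
print(filled[4]["play"])  # no
```

In the toy data the gain of `outlook` about `play` is 1 bit and that of `windy` is 0, so the gap is filled from the rows that share `outlook == "sunny"`. A real implementation would also need to handle continuous attributes and ties, which this sketch ignores.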
Source
《微计算机信息》
Peking University Core Journal (北大核心)
2007, Issue 04X, pp. 180-181, 186 (3 pages in total)
Control & Automation
Keywords
Data mining
Missing data filling
Information gain
Classification accuracy