Abstract
Incomplete data processing is an important problem in data mining, machine learning, and related fields, and missing value imputation is the mainstream approach to handling incomplete data. Most existing imputation methods apply techniques from statistics and machine learning to analyze the information remaining in the original data and derive plausible values to replace the missing entries. Imputation methods can be roughly divided into single imputation and multiple imputation, each with its own advantages in different scenarios. However, few methods go further and use the neighborhood information in the spatial distribution of samples to correct the imputed values. In view of this, this paper proposes a framework that can be applied to many existing imputation methods to improve their imputation results. The framework consists of three modules: pre-filling, spatial neighborhood information mining, and correction of the pre-filled results. Experiments with seven existing imputation methods on eight UCI datasets verify the effectiveness and robustness of the proposed framework.
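The abstract names the three modules but not how each one works. The sketch below is a hypothetical illustration of such a pipeline under assumed choices, not the authors' method: column-mean pre-filling stands in for module 1, k nearest neighbors by Euclidean distance over the sample's observed attributes stands in for module 2, and module 3 blends the pre-filled value with the neighbors' mean using an assumed weight `alpha`. The function names `pre_fill`, `k_nearest`, and `neighborhood_correct` and the parameters `k` and `alpha` are all illustrative.

```python
import math
from statistics import mean


def pre_fill(data):
    """Module 1 (assumed): replace each missing value (None) with its column mean."""
    n_cols = len(data[0])
    col_means = [mean(row[j] for row in data if row[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def k_nearest(data, filled, i, k):
    """Module 2 (assumed): indices of the k samples closest to sample i,
    using Euclidean distance over the attributes sample i actually observed."""
    observed = [j for j, v in enumerate(data[i]) if v is not None]
    dists = []
    for r, row in enumerate(filled):
        if r == i:
            continue
        d = math.sqrt(sum((filled[i][j] - row[j]) ** 2 for j in observed))
        dists.append((d, r))
    dists.sort()
    return [r for _, r in dists[:k]]


def neighborhood_correct(data, k=2, alpha=0.5):
    """Module 3 (assumed): blend each pre-filled value with the mean of the
    same attribute over the k nearest neighbors; alpha weights the pre-fill."""
    filled = pre_fill(data)
    corrected = [row[:] for row in filled]
    for i, row in enumerate(data):
        if all(v is not None for v in row):
            continue  # complete samples are left untouched
        neighbors = k_nearest(data, filled, i, k)
        for j, v in enumerate(row):
            if v is None:
                local = mean(filled[r][j] for r in neighbors)
                corrected[i][j] = alpha * filled[i][j] + (1 - alpha) * local
    return corrected


data = [
    [1.0, 2.0],
    [1.1, None],   # second attribute is missing
    [5.0, 10.0],   # an outlying sample that inflates the column mean
    [0.9, 1.8],
]
result = neighborhood_correct(data, k=2, alpha=0.5)
print(round(pre_fill(data)[1][1], 2), round(result[1][1], 2))  # prints: 4.6 3.25
```

In this toy data, plain mean imputation gives 4.6 because the outlying third sample pulls the column mean up, while the neighborhood correction moves the value to 3.25, toward the mean (1.9) of the two nearest samples. Any other single-imputation method could stand in for `pre_fill` in this structure; the values of `k` and `alpha` here are arbitrary.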
Authors
YAN Yuanting (严远亭)
WU Yaya (吴亚亚)
ZHAO Shu (赵姝)
ZHANG Yanping (张燕平)
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Source
CAAI Transactions on Intelligent Systems (《智能系统学报》)
CSCD
Peking University Core Journal
2019, No. 6, pp. 1225-1232, 8 pages
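Funding
National Natural Science Foundation of China (61806002, 61872002, 61673020, 61876001, 61602003)
Natural Science Foundation of Anhui Province (1708085QF143, 1808085MF197)
Doctoral Research Start-up Fund of Anhui University (J01003253)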
Keywords
incomplete data
missing value imputation
neighborhood information
data mining
machine learning
imputation method
single imputation
multiple imputation