Abstract
This paper introduces the phenomenon and concept of missing data, the fields in which it arises, and its causes, and summarizes missing-data mechanisms and missing-data patterns. It then reviews the commonly used methods for handling missing data (weighting, deletion, statistical imputation, and machine learning imputation) and compares their scope of application, advantages, and disadvantages. Finally, it proposes that handling missing values in high-dimensional data, handling data with compound missingness characteristics, and handling missing data in new application fields will be the future research directions for missing-data processing methods.
Authors
Deng Jianxin (邓建新)
Shan Lubao (单路宝)
He Deqiang (贺德强)
Tang Rui (唐锐)
Affiliations: Guangxi Key Lab of Manufacturing System & Advanced Manufacturing Technology, Guangxi University, Nanning 530003, China; School of Mechanical Engineering, Guangxi University, Nanning 530003, China
Source
Statistics & Decision (《统计与决策》)
CSSCI
Peking University Core Journal (北大核心)
2019, No. 23, pp. 28-34 (7 pages)
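Funding
National Natural Science Foundation of China (71562002, 51965006)
Guangxi Natural Science Foundation (2018GXNSFAA050111)
Guangxi Key Laboratory of Manufacturing System and Advanced Manufacturing Technology Project (16-380-12S011, 17-259-05S006)
Guangxi Higher Education Undergraduate Teaching Reform Project (2017JGA126)
Innovation Project of Guangxi Graduate Education (YCSW2019035)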
Keywords
missing data
processing methods
single imputation
multiple imputation
method comparison