Abstract
To address missing process data in industrial production, this paper proposes, for the first time, applying multiple imputation (MI) to missing data in industrial processes. It first reviews the commonly used methods for handling missing data and points out the advantages and disadvantages of each. It then builds a regression model and processes multivariable industrial data sets with both low and high proportions of missing values using three methods: deleting the cases that contain missing values, simple imputation, and multiple imputation. The resulting data sets are used for data mining to predict the value of the target variable, and the prediction results are analyzed and compared. Experimental results show that multiple imputation performs best. It therefore offers a useful strategy for handling missing values in industrial data.
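The comparison described in the abstract can be illustrated with a small sketch. This is not the authors' code, and their data, imputation model and regression settings are not given here. The sketch rests on several assumptions:

- Synthetic correlated variables stand in for industrial process data, and only the hypothetical column x2 has missing values.
- "Simple imputation" is taken to mean column-mean imputation.
- Multiple imputation is implemented as bootstrap-based stochastic regression imputation, which is one of several MI variants. The pooled coefficients are averaged (Rubin's rule for point estimates).

All function names are hypothetical.

```python
import numpy as np

def make_data(n, missing_rate, seed):
    """Hypothetical stand-in for process data: x1, x2 correlated; y depends on both.
    x2 goes missing at random (MAR), more often when x1 is large."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
    y = 2.0 + 1.5 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)
    X = np.column_stack([x1, x2])
    p = missing_rate * 2 / (1 + np.exp(-x1))  # average missing rate is about missing_rate
    X_miss = X.copy()
    X_miss[rng.random(n) < p, 1] = np.nan
    return X, X_miss, y

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def rmse(beta, X, y):
    pred = beta[0] + X @ beta[1:]
    return float(np.sqrt(np.mean((pred - y) ** 2)))

def listwise(X, y):
    """Delete every case (row) that contains a missing value, then fit."""
    keep = ~np.isnan(X).any(axis=1)
    return fit_ols(X[keep], y[keep])

def mean_impute(X, y):
    """Simple imputation (assumed here to be the column mean), then fit."""
    Xf = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    return fit_ols(Xf, y)

def multiple_impute(X, y, m=20, seed=0):
    """Create m completed data sets, fit the regression on each, pool by averaging.
    Assumes only column 1 (x2) has missing values."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(X[:, 1])
    obs = np.flatnonzero(~miss)
    Z = np.column_stack([X[:, 0], y])  # imputation predictors: x1 and the target
    betas = []
    for _ in range(m):
        # Bootstrap the observed cases so imputations reflect parameter uncertainty
        boot = rng.choice(obs, size=obs.size, replace=True)
        gamma = fit_ols(Z[boot], X[boot, 1])
        resid = X[boot, 1] - (gamma[0] + Z[boot] @ gamma[1:])
        sigma = resid.std(ddof=Z.shape[1] + 1)
        Xc = X.copy()
        # Stochastic draw: predicted value plus random residual noise
        Xc[miss, 1] = (gamma[0] + Z[miss] @ gamma[1:]
                       + rng.normal(scale=sigma, size=int(miss.sum())))
        betas.append(fit_ols(Xc, y))
    return np.mean(betas, axis=0)  # pooled point estimate

def compare(missing_rate, seed=1):
    """Fit on incomplete training data, score RMSE on complete held-out data."""
    _, X_train, y_train = make_data(400, missing_rate, seed)
    X_test, _, y_test = make_data(2000, 0.0, seed + 100)
    results = {}
    for name, fit in [("listwise deletion", listwise),
                      ("mean imputation", mean_impute),
                      ("multiple imputation", multiple_impute)]:
        results[name] = rmse(fit(X_train, y_train), X_test, y_test)
    return results

if __name__ == "__main__":
    for rate in (0.1, 0.4):  # a lower and a higher missing rate
        print(rate, compare(rate))
```

The imputation model includes the target y as a predictor. Leaving the outcome out of the imputation model tends to bias the analysis-model coefficients toward zero. The ranking printed by this toy setup depends on the synthetic data and should not be read as reproducing the paper's results.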
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journal (北大核心)
2010, Issue 6, pp. 1351-1354 (4 pages)
Keywords
missing data
multiple imputation
industrial process data
data mining
regression prediction