Comparative Analysis of the Performance of Interpolation Methods for Missing Data (cited by: 14)
Abstract Missing data are unavoidable in practice. To preserve the completeness of information for subsequent statistical analysis, it is particularly important to predict and fill in missing values as accurately as possible. Using two simulated data sets, drawn from a Gaussian and a Gamma distribution respectively, and one real data set of life expectancy in a number of African countries, the study presets three missing ratios (5%, 10% and 20%) and uses computer software to compare the statistical results of four imputation methods. The experiments show that, on the simulated data, autoregressive-modeling imputation and mean imputation perform slightly better overall than K-nearest-neighbor imputation and linear-regression imputation. On the real data, K-nearest-neighbor and linear-regression imputation outperform the former two when the missing ratio is low, while at higher missing ratios the results show no obvious difference from those on the simulated data.
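The evaluation protocol described in the abstract (simulate data, delete a fixed share of values, impute, score against the known truth) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the sample size, the distribution parameters, the RMSE score, and the position-based nearest-neighbor rule are all assumptions, and only two of the four methods are shown.

```python
import math
import random
import statistics


def mask_at_random(values, ratio, rng):
    """Return a copy with round(ratio * n) entries set to None, plus the masked indices."""
    n_missing = round(len(values) * ratio)
    missing_idx = sorted(rng.sample(range(len(values)), n_missing))
    masked = list(values)
    for i in missing_idx:
        masked[i] = None
    return masked, missing_idx


def mean_impute(masked):
    """Fill every gap with the mean of the observed values."""
    fill = statistics.fmean(v for v in masked if v is not None)
    return [fill if v is None else v for v in masked]


def nearest_impute(masked):
    """Fill each gap with the observed value closest in position (ties go left).

    Assumption: 'nearest' is measured by index in the sequence; the paper's
    exact neighbor definition is not given in the abstract.
    """
    observed = [i for i, v in enumerate(masked) if v is not None]
    out = list(masked)
    for i, v in enumerate(masked):
        if v is None:
            j = min(observed, key=lambda k: (abs(k - i), k))
            out[i] = masked[j]
    return out


def rmse(truth, imputed, missing_idx):
    """Root-mean-square error over the imputed positions only."""
    return math.sqrt(
        sum((truth[i] - imputed[i]) ** 2 for i in missing_idx) / len(missing_idx)
    )


def run_experiment(n=500, seed=0):
    """Score both methods on Gaussian and Gamma samples at 5%, 10% and 20% missing."""
    rng = random.Random(seed)
    datasets = {
        # Hypothetical parameters, chosen only for illustration.
        "gaussian": [rng.gauss(50.0, 10.0) for _ in range(n)],
        "gamma": [rng.gammavariate(2.0, 5.0) for _ in range(n)],
    }
    results = {}
    for name, data in datasets.items():
        for ratio in (0.05, 0.10, 0.20):
            masked, idx = mask_at_random(data, ratio, rng)
            results[(name, ratio)] = {
                "mean": rmse(data, mean_impute(masked), idx),
                "nearest": rmse(data, nearest_impute(masked), idx),
            }
    return results


if __name__ == "__main__":
    for (name, ratio), scores in run_experiment().items():
        print(f"{name:8s} {ratio:4.0%}  mean={scores['mean']:.3f}  nearest={scores['nearest']:.3f}")
```

Because the simulated values here are independent draws, a positional neighbor carries no extra information about the missing value, which is one plausible reason a simple mean can hold its own on such data.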
Authors XU Hongyan (徐鸿艳); SUN Yunshan (孙云山); QIN Qilin (秦琦琳); ZHU Mingtao (朱明涛) (School of Science, Tianjin University of Commerce, Tianjin 300134, China; School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China)
Source Software Engineering (《软件工程》), 2021, No. 11, pp. 11-14, 10 (5 pages in total)
Keywords missing data; imputation method; autoregressive modeling
