Application of the MissForest Algorithm for Imputing Missing Data (cited by: 11)
Abstract. Objective: To introduce the use of the missForest algorithm for imputing missing values in the R environment and to evaluate its imputation performance. Methods: The imputation procedure is illustrated on real data with different missing rates, and missForest is compared with listwise deletion as a way of handling missing data. Results: At a missing rate of 10%, missForest clearly outperforms deletion. At a missing rate of 20%, neither method performs well and their results are similar. Once the missing rate reaches 50%, the estimation error for all three types of variables is already large, and neither method gives satisfactory estimates. Conclusion: MissForest is simple to run in software and places few requirements on the structure and distribution of the data, which makes it attractive relative to other multiple imputation methods. It makes full use of the information in the available records and reflects the true situation of a survey fairly accurately, so it has good practical value.
Source: Chinese Journal of Health Statistics (《中国卫生统计》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 5, pp. 774-776 (3 pages)
Keywords: missForest; random forest; decision tree; missing data
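
The comparison described in the abstract rests on two simple ideas: listwise deletion discards every record with at least one missing cell, and performance is judged by the error of an estimate against the true value. The sketch below is not the article's R procedure and does not implement missForest; it is a minimal standard-library Python illustration, and it assumes values are missing completely at random and independently per cell. All function names and the variable count used in the loop are hypothetical.

```python
import random


def mask_cells(rows, rate, seed=0):
    """Copy `rows`, setting each cell to None with probability `rate`.

    Assumption: missingness is completely at random and independent per cell.
    """
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def listwise_delete(rows):
    """Keep only the records that have no missing cell (deletion method)."""
    return [row for row in rows if all(v is not None for v in row)]


def expected_complete_fraction(rate, n_vars):
    """Expected share of fully observed records under the assumption above."""
    return (1.0 - rate) ** n_vars


def relative_error(estimate, truth):
    """Absolute relative error of an estimate against a non-zero true value."""
    return abs(estimate - truth) / abs(truth)


if __name__ == "__main__":
    # Hypothetical setting: 5 variables, at the three missing rates in the abstract.
    for rate in (0.10, 0.20, 0.50):
        print(rate, round(expected_complete_fraction(rate, 5), 3))
```

With five variables, this assumption leaves about 59%, 33%, and 3% of records fully observed at missing rates of 10%, 20%, and 50%. That shows how quickly deletion loses information as the missing rate rises. The article's own data set and variable count are not given on this page.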