Abstract
This paper applies multiple imputation to multivariate samples with different missing rates and missing patterns, and examines how the multiple imputation error depends on both. The results show that the error grows linearly with the missing rate when the missing rate is between 0 and 15%, and deviates from linearity when the missing rate exceeds 15%. The error is also positively correlated with the variance-to-mean ratio of the missing pattern: the larger the ratio, the larger the error.
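The kind of experiment the abstract describes (mask a complete sample at several missing rates, impute it multiple times, and measure the error against the known values) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' procedure: it uses synthetic Gaussian data, a completely-at-random missing pattern, RMSE as the error measure, and stochastic linear-regression imputation in place of the random forest imputation named in the keywords. All function names (`make_data`, `mask_mcar`, `impute_once`, `mi_rmse`) are hypothetical.

```python
import numpy as np

def make_data(n=500, p=4, rho=0.6, seed=0):
    """Synthetic correlated sample (assumption: not the paper's data)."""
    rng = np.random.default_rng(seed)
    cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    return rng.multivariate_normal(np.zeros(p), cov, size=n)

def mask_mcar(X, rate, rng):
    """Delete entries completely at random at the given missing rate."""
    mask = rng.random(X.shape) < rate
    X_miss = X.copy()
    X_miss[mask] = np.nan
    return X_miss, mask

def impute_once(X_miss, rng):
    """One stochastic regression imputation (stand-in for random forest)."""
    mask = np.isnan(X_miss)
    # Start from column-mean fills so every predictor is usable.
    filled = np.where(mask, np.nanmean(X_miss, axis=0), X_miss)
    out = filled.copy()
    for j in range(X_miss.shape[1]):
        miss = mask[:, j]
        if not miss.any():
            continue
        others = np.delete(filled, j, axis=1)
        A = np.column_stack([np.ones(len(others)), others])
        # Fit on rows where variable j is observed.
        beta, *_ = np.linalg.lstsq(A[~miss], filled[~miss, j], rcond=None)
        resid = filled[~miss, j] - A[~miss] @ beta
        sigma = resid.std(ddof=A.shape[1])
        # Prediction plus residual noise gives one plausible draw.
        out[miss, j] = A[miss] @ beta + rng.normal(0, sigma, miss.sum())
    return out

def mi_rmse(X, rate, m=5, seed=0):
    """Pool m imputations by averaging; return RMSE on the deleted entries."""
    rng = np.random.default_rng(seed)
    X_miss, mask = mask_mcar(X, rate, rng)
    pooled = np.mean([impute_once(X_miss, rng) for _ in range(m)], axis=0)
    return float(np.sqrt(np.mean((pooled[mask] - X[mask]) ** 2)))

X = make_data()
errors = {rate: mi_rmse(X, rate) for rate in (0.05, 0.10, 0.15, 0.20, 0.30)}
for rate, err in errors.items():
    print(f"missing rate {rate:.0%}: RMSE {err:.3f}")
```

Plotting `errors` against the missing rate would show whether the trend in this toy setup resembles the linear regime reported in the abstract; nothing here should be read as reproducing the paper's figures.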
Authors
彭海艳
李意芝
孟利军
Peng Haiyan; Li Yizhi; Meng Lijun (School of Public Administration, Xiangtan University, Xiangtan, Hunan 411105, China; School of Physics and Optoelectronics, Xiangtan University, Xiangtan, Hunan 411105, China)
Source
《统计与决策》
CSSCI
PKU Core (Peking University Core Journals)
2022, Issue 1, pp. 20-24 (5 pages)
Statistics & Decision
Funding
Supported by the National Social Science Fund of China (20BTQ098).
Keywords
multivariate missing data
multiple imputation
random forest imputation
missing rate
missing pattern