Abstract
Non-response occurs frequently in big data applications. The non-response rate of real data is usually low, and in that case, using a propensity score model to match non-responding units with responding units tends to markedly degrade the performance of propensity score matching imputation. To address this, we incorporate the idea of the synthetic minority over-sampling technique (SMOTE) into propensity score matching imputation and propose a propensity score matching imputation method based on synthetic minority over-sampling. Statistical simulations and an empirical study demonstrate the statistical properties and practical performance of the new method under different non-response rates, numbers of imputations (imputation multiplicities) and error distributions. The simulations show that the new method is markedly more accurate than standard propensity score matching imputation, and that its statistical properties are only weakly affected by the non-response rate, the number of imputations and the error distribution. The empirical results show that the new method performs well on real data. Propensity score matching imputation based on synthetic minority over-sampling offers a new approach to handling non-response and is readily extensible.
Authors
YANG Gui-jun (杨贵军), DU Fei (杜飞), SUN Ling-li (孙玲莉)
School of Statistics, Tianjin University of Finance and Economics, Tianjin 300222, China; CCESR, Tianjin University of Finance and Economics, Tianjin 300222, China
Source
Journal of Statistics and Information (《统计与信息论坛》)
Indexed in: CSSCI; Peking University Core Journals list (北大核心)
2021, No. 1, pp. 3-12 (10 pages)
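Funding
Key Project of the National Social Science Fund of China, "Research on Demographic Survey Methods and Applications Based on Big Data" (20ATJ008)
Youth Project of the National Social Science Fund of China, "Application of Rotation-Sample Calibration Estimation to Household Surveys in China" (20CTJ009)
Key Project of the 2019 Tianjin Philosophy and Social Science Planning Program, "Theory and Application of Multi-Objective Sampling Design in the Context of Big Data" (TJTJ19-001)
General Program of the National Natural Science Foundation of China, "Design and Analysis of Drop-the-Loser Two-Stage Adaptive Clinical Trials" (11471239)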
Keywords
propensity score matching imputation
synthetic minority over-sampling technique (SMOTE)
non-response rate
non-response mechanism