
A Missing Data Imputation Method Based on Propensity Score Matching
Cited by: 2
Abstract: Predictive mean matching characterizes the proximity between units in only one way. To address this, the paper considers several proximity measures and uses the fact that the propensity score reduces multiple covariates to a single dimension, and proposes a new method that imputes missing data by propensity score matching. The method first estimates the propensity score, then matches units using one of several methods (nearest neighbor, caliper and radius, or stratification or interval matching), and finally imputes each unit with missing data from the target variable of its matched unit. A Monte Carlo simulation and a real dataset confirm that the method is effective. Among mean imputation, regression imputation, random imputation, and the nearest neighbor, caliper and radius, and stratification or interval variants of propensity score matching imputation, the stratification or interval variant performs best.
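Source: Mathematics in Practice and Theory (《数学的实践与认识》, PKU Core Journal), 2016, No. 12, pp. 193-201 (9 pages)
Funding: National Social Science Fund of China (15BTJ014); Renmin University of China 2015 Top-Notch Innovative Talent Cultivation Funding Program
Keywords: propensity score; matching; missing data; imputation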
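
The three steps named in the abstract (estimate the propensity score, match, impute from the matched unit) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the propensity score is the probability that the target variable is observed given the covariates, that this probability is estimated by a logistic regression, and that matching is nearest neighbor with an optional caliper. The function names `fit_logistic`, `propensity_scores`, and `psm_impute` and all parameters are hypothetical.

```python
import numpy as np


def fit_logistic(X, r, n_iter=50, ridge=1e-6):
    """Fit P(r = 1 | X) by Newton-Raphson and return coefficients (intercept first).

    Assumption: a plain logistic model; the tiny ridge term only keeps the
    Hessian invertible and is not part of the method in the abstract.
    """
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Z @ beta, -30, 30)))
        grad = Z.T @ (r - p) - ridge * beta
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def propensity_scores(X, beta):
    """Estimated probability of being observed for every unit."""
    Z = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-np.clip(Z @ beta, -30, 30)))


def psm_impute(X, y, caliper=None):
    """Fill NaNs in y with the value of the nearest observed unit on the score.

    If `caliper` is given, a unit whose nearest donor is farther away than
    the caliper is left missing.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)

    # Step 1: estimate the propensity score (response indicator on covariates).
    score = propensity_scores(X, fit_logistic(X, observed.astype(float)))

    # Steps 2 and 3: nearest-neighbor match on the score, then copy the donor's y.
    donors = np.flatnonzero(observed)
    y_imputed = y.copy()
    for i in np.flatnonzero(~observed):
        dist = np.abs(score[donors] - score[i])
        j = np.argmin(dist)
        if caliper is None or dist[j] <= caliper:
            y_imputed[i] = y[donors[j]]
    return y_imputed
```

Calling `psm_impute(X, y)` with `y` containing `np.nan` for missing entries returns a copy in which every missing entry takes the value of its matched donor. A stratification or interval variant would instead group units into score intervals and draw donors within each interval, which this sketch does not cover.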











