Statistical Inference Problems of Non-probability Sampling under the Background of Big Data

Cited by: 35

Abstract: When samples are drawn from big data, a sampling frame is often difficult to construct, so the resulting sample is a non-probability sample to which traditional sampling inference theory is hard to apply. Solving the statistical inference problem for non-probability samples is therefore a serious challenge for sample surveys in the big data setting. This paper proposes a basic approach with three parts. First, sampling methods: sample selection based on sample matching, link-tracing sampling, and similar methods can make the non-probability sample approximate a probability sample, so that the inference theory for probability samples can be used. Second, construction and adjustment of weights: methods based on pseudo-design, models, and propensity scores can yield base weights analogous to those of a probability sample. Third, estimation: hybrid probability estimation based on pseudo-design, models, and Bayesian methods can be considered. Finally, the paper takes sample selection based on sample matching as an example to discuss a concrete solution.
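To make the sample-matching idea named in the abstract more concrete, the following is a minimal sketch, not the authors' procedure. It assumes a target probability sample is drawn from a frame that carries only auxiliary covariates, and that each target unit is then replaced by its nearest unused neighbour in a large volunteer panel. The function names `match_sample` and `demo`, the greedy nearest-neighbour rule, the Euclidean distance on standardized covariates, and the simulated population are all illustrative assumptions.

```python
import numpy as np


def match_sample(target_X, panel_X):
    """Greedy nearest-neighbour matching without replacement.

    For each row of target_X (covariates of a probability sample), return the
    index of the closest not-yet-used row of panel_X (covariates of the
    non-probability panel). Distance is Euclidean on standardized covariates.
    """
    target_X = np.asarray(target_X, dtype=float)
    panel_X = np.asarray(panel_X, dtype=float)
    if len(panel_X) < len(target_X):
        raise ValueError("panel must be at least as large as the target sample")

    # Standardize both sets with the panel's mean and standard deviation.
    mu = panel_X.mean(axis=0)
    sd = panel_X.std(axis=0)
    sd[sd == 0] = 1.0
    T = (target_X - mu) / sd
    P = (panel_X - mu) / sd

    available = np.ones(len(P), dtype=bool)
    matched = []
    for t in T:
        d = np.linalg.norm(P - t, axis=1)
        d[~available] = np.inf          # each panel unit is used at most once
        j = int(np.argmin(d))
        matched.append(j)
        available[j] = False
    return np.array(matched)


def demo(seed=0):
    """Simulated illustration (hypothetical data, not from the paper)."""
    rng = np.random.default_rng(seed)
    N = 20000
    age = rng.uniform(18, 80, N)                  # auxiliary variable on the frame
    y = 0.5 * age + rng.normal(0, 5, N)           # survey variable, related to age

    # Volunteer panel: younger people are more likely to join.
    p_join = 0.4 * (80 - age) / 62 + 0.02
    in_panel = rng.random(N) < p_join
    panel_X, panel_y = age[in_panel][:, None], y[in_panel]

    # Target sample: simple random sample from the frame (covariates only).
    target_idx = rng.choice(N, size=500, replace=False)
    target_X = age[target_idx][:, None]

    matched = match_sample(target_X, panel_X)
    # Returns: population mean, naive panel mean, matched-sample mean.
    return y.mean(), panel_y.mean(), panel_y[matched].mean()


if __name__ == "__main__":
    truth, naive, matched_mean = demo()
    print(f"population mean: {truth:.2f}")
    print(f"naive panel mean: {naive:.2f}")
    print(f"matched-sample mean: {matched_mean:.2f}")
```

In this sketch the matched sample inherits the covariate distribution of the probability sample, which is why it can be treated approximately like one. The approximation is only as good as the match quality and the assumption that the survey variable depends on selection only through the matched covariates.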

Authors: Jin Yongjin, Liu Zhan

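Affiliations: Center for Applied Statistics, Renmin University of China; School of Statistics, Renmin University of China; Renmin University of China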

Source: Statistical Research (《统计研究》), indexed in CSSCI and the Peking University Core Journals list, 2016, No. 3, pp. 11-17 (7 pages)

Funding: Supported by the Renmin University of China 2015 funding program for cultivating outstanding innovative talent

Keywords: big data; non-probability sampling; statistical inference

Classification code: C811 [Sociology: Statistics]
