Abstract
When samples are drawn from big data, a sampling frame is often difficult to construct, so the resulting sample is a non-probability sample to which traditional sampling inference theory is hard to apply. How to carry out statistical inference from non-probability samples is therefore a serious challenge for survey sampling in the big data era. This paper proposes three basic approaches to the problem. First, sampling methods: sample selection based on sample matching, link-tracing sampling, and similar methods can make the non-probability sample approximate a probability sample, so that the statistical inference theory for probability samples can be applied. Second, weight construction and adjustment: methods based on pseudo-design, models, and propensity scores can yield base weights similar to those of a probability sample. Third, estimation: hybrid probability estimation based on pseudo-design, models, and Bayesian methods can be considered. Finally, the paper takes sample selection based on sample matching as an example to discuss a concrete solution.
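The abstract names sample matching but does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a reference probability sample and a larger non-probability pool that share the same auxiliary covariates, and it pairs each reference unit with its nearest unused pool unit. The function name `match_sample`, the Euclidean distance, the greedy matching order, and the toy data are all illustrative assumptions.

```python
import math


def match_sample(target, pool):
    """Greedy nearest-neighbour matching without replacement.

    target: covariate tuples for units in a reference probability sample
    pool:   covariate tuples for units in the non-probability pool
    Returns one pool index per target unit, in target order.
    """
    if len(pool) < len(target):
        raise ValueError("pool must be at least as large as target")
    available = set(range(len(pool)))
    matched = []
    for t in target:
        # Pick the closest pool unit that has not been used yet;
        # ties are broken by the smaller index.
        best = min(available, key=lambda j: (math.dist(t, pool[j]), j))
        matched.append(best)
        available.remove(best)
    return matched


# Hypothetical covariates: (age, indicator variable)
target = [(25, 0), (40, 1), (60, 1)]
pool = [(23, 0), (58, 1), (41, 1), (30, 0), (62, 0)]
matched = match_sample(target, pool)
```

The matched pool units then stand in for the reference sample. In practice the covariates would usually be standardized before computing distances, and an optimal rather than greedy matching might be preferred; both choices are left out here to keep the sketch short.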
Source
Statistical Research (《统计研究》)
CSSCI
PKU Core Journal
2016, No. 3, pp. 11-17 (7 pages)
Funding
Supported by the Renmin University of China 2015 Funding Program for Cultivating Outstanding Innovative Talents
Keywords
Big Data
Non-probability Sampling
Statistical Inference