Abstract
Traditional survey methods, household surveys in particular, are difficult to carry out, slow, and costly, and as required sample sizes grow they increasingly fail to meet research needs. In the era of big data, online access panels (fixed samples of network access) can collect large and diverse samples quickly and efficiently, but as non-probability samples they lack theoretical support for statistical inference. This paper uses propensity score-based sample matching to draw, from an online access panel, a sample that matches an offline probability sample, combines the matched sample with the offline probability sample to form a new sample, and focuses on testing whether mixing online access panel samples with offline samples is feasible in statistical surveys. Statistical tests show that the propensity score-matched sample approximates the offline probability sample and that its survey results approximate those of the offline probability sample. Under certain conditions, an online access panel sample can therefore be mixed with an offline probability sample to approximately replace a survey based entirely on an offline probability sample.
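The matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the propensity model or the matching rule, so everything below is an assumption made for illustration: a logistic regression fitted by gradient descent, greedy one-to-one nearest-neighbour matching without replacement, a single standardized covariate, and synthetic data. The names `predict`, `fit_logistic`, and `match_panel_to_reference` are hypothetical and are not taken from the paper.

```python
import math
import random


def predict(w, x):
    """Estimated probability that a unit with covariates x is in the reference sample."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit a logistic regression by batch gradient descent; w[0] is the intercept."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def match_panel_to_reference(reference, panel):
    """Return, for each reference unit, the index of its matched panel unit."""
    if len(panel) < len(reference):
        raise ValueError("panel must be at least as large as the reference sample")
    # Propensity score: P(unit is in the reference sample | covariates),
    # estimated on the pooled data (assumed model, not the paper's).
    labels = [1] * len(reference) + [0] * len(panel)
    w = fit_logistic(reference + panel, labels)
    panel_scores = [predict(w, x) for x in panel]
    available = list(range(len(panel)))
    matches = []
    for x in reference:
        score = predict(w, x)
        # Greedy choice: closest unused panel unit on the propensity score.
        best = min(available, key=lambda i: abs(panel_scores[i] - score))
        available.remove(best)
        matches.append(best)
    return matches


def column_mean(rows):
    """Mean of the first covariate over a list of units."""
    return sum(r[0] for r in rows) / len(rows)


# Synthetic data (assumption): the panel is shifted relative to the reference sample.
random.seed(0)
reference = [[random.gauss(0.0, 1.0)] for _ in range(50)]
panel = [[random.gauss(-0.5, 1.0)] for _ in range(300)]

matches = match_panel_to_reference(reference, panel)
matched = [panel[i] for i in matches]

ref_mean = column_mean(reference)
panel_mean = column_mean(panel)
matched_mean = column_mean(matched)
print("reference mean:", round(ref_mean, 3))
print("full panel mean:", round(panel_mean, 3))
print("matched panel mean:", round(matched_mean, 3))
```

Comparing the covariate mean of the matched panel units with that of the reference sample is only a rough balance check. The hypothesis tests reported in the paper would be the formal counterpart, and their exact form is not given in the abstract.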
Author
Qin Wenli (秦文力), School of Statistics, Renmin University of China, Beijing 100080, China
Source
Statistics & Decision (《统计与决策》)
CSSCI
Peking University Core Journal (北大核心)
2020, No. 9, pp. 16-21 (6 pages)
Keywords
fixed sample of network access
propensity score matching
hypothesis test