Sampling Survey in the Context of Big Data (大数据背景下的抽样调查)
Cited by: 3
Abstract: Big data is characterized by large volume, rich variety, and rapid growth, but it also suffers from low value density and poor representativeness, which brings both opportunities and challenges to sampling surveys. How should sampling adapt to these changes, and how can it be developed and applied in the context of big data? This paper discusses the question from three perspectives. First, new and highly adaptable sampling methods have emerged for the data stream environment; they obtain representative samples efficiently and accurately while taking storage space, processing time, and processing capacity into account. Second, conducting surveys over the internet or collecting social network data has led to non-probability sampling methods that need no sampling frame and can obtain a large number of analysis samples in a short time at low cost. Third, the strengths of big data and sampling surveys can be combined by integrating online and offline survey data. For the case where the online sample is a non-probability sample and the offline sample is a probability sample, the paper proposes a basic approach to integration: on the one hand, the probability sample is used to carry out a "probability test" on the non-probability sample; on the other hand, information extracted from the probability sample is used to make inferences about the population, based either on a model or on pseudo-randomization.
Authors: JIN Yongjin (金勇进); LIU Xiaoyu (刘晓宇) (Center for Applied Statistics, Renmin University of China, Beijing 100872; School of Statistics, Renmin University of China, Beijing 100872; Institute of Survey Technology, Renmin University of China, Beijing 100872)
Source: Journal of Systems Science and Mathematical Sciences (《系统科学与数学》), CSCD, Peking University Core Journal, 2022, No. 1, pp. 2-16 (15 pages)
Keywords: big data; sampling survey; data stream; non-probability sampling; data integration
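
The abstract's first perspective, drawing a representative sample from a data stream within limited storage and processing time, can be illustrated with reservoir sampling (Algorithm R). This record does not say which stream-sampling methods the article covers, so the choice of technique is an assumption made here for illustration, as are the function name `reservoir_sample` and its parameters. It is a minimal sketch, not the authors' method.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw a uniform random sample of up to k items from a stream of unknown length.

    Hypothetical helper for illustration: it makes one pass over the stream
    and keeps at most k items in memory.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 10 records from a simulated stream of one million records.
    print(reservoir_sample(range(1_000_000), 10, seed=42))
```

Because each arriving item is either stored or discarded immediately, memory use stays fixed at k items regardless of how long the stream runs, which matches the storage and processing constraints the abstract mentions.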