Abstract
When studying the behavior of microblog users, researchers often need to estimate population proportions of those users from sample data. However, the massive scale and instability of Internet data make probability sampling difficult to apply in the microblog environment. This paper analyzes a non-probability sampling method, respondent-driven sampling, introduces the concept of one-to-multiple rotation estimation, and proposes respondent-driven sampling under one-to-multiple rotation estimation to estimate the population proportions of microblog users. Theoretical derivation and empirical testing show that the method can effectively estimate the population proportions of multiple categories of microblog users, and that it is a big data sampling method that can be extended to social network data collection.
Authors
聂瑞华
石洪波
米子川
Nie Ruihua; Shi Hongbo; Mi Zichuan (Department of Economics, Taiyuan Normal University, Taiyuan 030619, China; School of Information and Management, Shanxi University of Finance and Economics, Taiyuan 030006, China; School of Statistics, Shanxi University of Finance and Economics, Taiyuan 030006, China)
Source
《统计与决策》
CSSCI
Peking University Core Journal (北大核心)
2019, Issue 22, pp. 16-19 (4 pages)
Statistics & Decision
Funding
National Social Science Fund of China project (17BTJ010)
Keywords
respondent-driven sampling
proportional estimation
one-to-multiple rotation estimation
big data sampling