Abstract
In online social network (OSN) sampling, the UNI method, which samples users uniformly at random from the user ID space, is commonly used as the benchmark for evaluating other sampling methods. However, its low hit rate and low sampling efficiency limit its practical use. This paper proposes an adaptive UNI sampling method. The method divides the user ID space into intervals and samples from them, adaptively adjusting each interval's sampling probability according to its observed hit rate in order to raise the hit rate and efficiency. A lower threshold on the sampling probability addresses the cold-start problem, and each interval's sampling rate is also used to adjust its sampling probability so that the sampler does not get trapped in a local optimum. The method was validated on real sampling of Sina Weibo, and the experimental results show that it improves both sampling efficiency and hit rate.
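The abstract describes the adaptive rule only qualitatively, so the Python sketch below is one possible reading of it, not the authors' formula. It assumes that a "hit" is a probed ID that belongs to an existing account, that an interval's weight is w_i = h_i(1 - r_i), where h_i is its observed hit rate and r_i the fraction of its IDs already probed (the "sampling rate" term, which damps intervals that are already well covered), and that the cold-start threshold is a floor p_min reserved for every interval that still has unprobed IDs, giving p_i = p_min + (1 - K p_min) w_i / sum_j w_j over the K such intervals. The function names `interval_probabilities` and `adaptive_uni_sample`, the `is_valid` oracle and the toy ID space are all hypothetical.

```python
import random


def interval_probabilities(hits, probes, sizes, p_min):
    """Per-interval sampling probabilities (hypothetical rule, not the paper's).

    hits[i]   -- probed IDs in interval i that turned out to be real users
    probes[i] -- distinct IDs already probed in interval i
    sizes[i]  -- number of IDs in interval i
    p_min     -- floor on the probability of every non-exhausted interval
    """
    probs = [0.0] * len(sizes)
    active = [i for i in range(len(sizes)) if probes[i] < sizes[i]]
    if not active:
        return probs  # every ID has been probed
    k = len(active)
    floor = min(p_min, 1.0 / k)  # keep the floor feasible: k * floor <= 1
    weights = {}
    for i in active:
        hit_rate = hits[i] / probes[i] if probes[i] else 0.0  # no evidence yet -> 0
        unsampled = 1.0 - probes[i] / sizes[i]  # damp intervals already well covered
        weights[i] = hit_rate * unsampled
    total = sum(weights.values())
    for i in active:
        share = weights[i] / total if total > 0 else 1.0 / k
        # Every active interval keeps `floor` (cold start); the rest follows the weights.
        probs[i] = floor + (1.0 - k * floor) * share
    return probs


def adaptive_uni_sample(is_valid, id_space, num_intervals, budget, p_min=0.01, seed=0):
    """Probe up to `budget` distinct IDs in [0, id_space) and return the valid ones.

    `is_valid` stands in for the real existence check (e.g. a profile lookup).
    With num_intervals=1 this reduces to plain UNI sampling without replacement.
    """
    rng = random.Random(seed)
    bounds = [(i * id_space // num_intervals, (i + 1) * id_space // num_intervals)
              for i in range(num_intervals)]
    sizes = [hi - lo for lo, hi in bounds]
    hits = [0] * num_intervals
    probes = [0] * num_intervals
    seen = [set() for _ in range(num_intervals)]
    found = []
    for _ in range(budget):
        # Recompute after every probe so the sketch never picks an exhausted interval.
        probs = interval_probabilities(hits, probes, sizes, p_min)
        if sum(probs) == 0:
            break  # whole ID space exhausted
        i = rng.choices(range(num_intervals), weights=probs)[0]
        lo, hi = bounds[i]
        uid = rng.randrange(lo, hi)
        while uid in seen[i]:  # sample without replacement inside the interval
            uid = rng.randrange(lo, hi)
        seen[i].add(uid)
        probes[i] += 1
        if is_valid(uid):
            hits[i] += 1
            found.append(uid)
    return found


if __name__ == "__main__":
    # Toy ID space (an assumption): all users sit in the block [2000, 3000).
    def is_user(uid):
        return 2000 <= uid < 3000

    uni = adaptive_uni_sample(is_user, 10_000, num_intervals=1, budget=1000)
    ada = adaptive_uni_sample(is_user, 10_000, num_intervals=10, budget=1000)
    print("UNI hits:", len(uni), "adaptive hits:", len(ada))
```

On this deliberately skewed toy space, plain UNI should find roughly one user per ten probes, while the adaptive version concentrates most probes on the dense interval once it scores its first hit and still spends about p_min of its budget on each other interval. Recomputing probabilities after every probe is chosen here for simplicity; a crawler would more likely update them in batches.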
Source
Computer Engineering (计算机工程)
CAS
CSCD
Peking University Core Journal
2017, No. 4, pp. 200-206 (7 pages)
Funding
Beijing Higher Education Young Elite Teacher Project (YETP0506)
Keywords
Online Social Network (OSN)
sampling method
UNI method
adaptive method
interval partition