Abstract
Based on the trust transfer model and the breadth-first search (BFS) algorithm, this paper presents a method for discovering high-quality users of Sina microblog. Ten opinion-leader users are selected as seed users for the crawler, and user information is then collected by a BFS traversal over users' friend relationships. The collected data are analyzed in terms of users' friends, fans (followers), and user repetition rate. The analysis shows that user quality declines somewhat as the crawl depth increases, and that once the crawl depth reaches a certain value, the number of high-quality users changes little. A comparison with the users of Sina's "Hot Microblog TOP10" list shows that the proposed method can discover relatively high-quality microblog users.
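The crawl step described above can be illustrated as a depth-limited BFS from a set of seed users. This is a minimal sketch, not the paper's implementation: it assumes the friend graph is already available as an in-memory dict (a real crawler would fetch each user's friend list from Sina microblog), it does not model the trust transfer scoring, and the names `bfs_crawl` and `layer_stats`, as well as the per-layer "repetition rate" used here (the share of friend links at a layer that point to an already-collected user), are hypothetical illustrations rather than the paper's definitions.

```python
from collections import Counter, deque


def bfs_crawl(friends, seeds, max_depth):
    """Depth-limited BFS over a friend graph starting from seed users.

    friends: dict mapping a user ID to the list of user IDs in their friend
             list (hypothetical in-memory stand-in for API requests).
    Returns (depth_of, repeats):
      depth_of maps each collected user to the BFS layer where it was first reached;
      repeats[d] counts friend links followed into layer d that hit an
      already-collected user.
    """
    depth_of = {s: 0 for s in seeds}       # seeds form layer 0
    repeats = {}
    queue = deque(depth_of)                # deduplicated seeds, in order
    while queue:
        user = queue.popleft()
        d = depth_of[user]
        if d == max_depth:
            continue                       # collected, but not expanded further
        for friend in friends.get(user, []):
            if friend in depth_of:
                repeats[d + 1] = repeats.get(d + 1, 0) + 1
            else:
                depth_of[friend] = d + 1
                queue.append(friend)
    return depth_of, repeats


def layer_stats(depth_of, repeats):
    """Per layer: (number of newly collected users, hypothetical repetition rate)."""
    new = Counter(depth_of.values())
    stats = {}
    for d in sorted(set(new) | set(repeats)):
        n, r = new.get(d, 0), repeats.get(d, 0)
        stats[d] = (n, r / (n + r) if n + r else 0.0)
    return stats
```

For example, with `friends = {"A": ["C", "D"], "B": ["C", "E"], "C": ["A", "F"], "D": ["F"], "F": ["G"]}`, seeds `["A", "B"]`, and `max_depth=2`, users A through F are collected while G (one layer beyond the limit) is not. Tracking repeats per layer is one simple way to observe the effect the abstract reports: as layers deepen, more links point back into the already-collected set, so fewer new users are added.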
Source
Journal of Beijing Information Science and Technology University (Natural Science Edition)
2017, No. 4, pp. 69-74 (6 pages)
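Funding
National High Technology Research and Development Program of China (863 Program) project "Intelligent Knowledge and Ability Assessment and Human-like Answering Verification System for Basic Education" (2015AA015409)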
Keywords
Sina microblog
high-quality user
trust transfer model
breadth-first search