Abstract
With the rapid development of online social networks, tens of millions of people create and spread topics of every kind each day by posting, commenting, and sharing. Studying how online social-relationship data is sensed and collected, stored and managed, and how group behavior emerges from it makes it possible to mine and analyze these networks more effectively. Because micro-blog platforms differ substantially from traditional web platforms in login, data display, and data processing, traditional web crawlers are not suited to crawling micro-blog content comprehensively. This paper crawls large volumes of micro-blog data by simulating users' browsing behavior, obtaining the relevant information through packet capture and analysis. Experimental results demonstrate the effectiveness of the approach. On this basis, the collected micro-blog data is used as the object of study for an analysis of group behavior.
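The crawling idea summarized above (imitating a browser rather than issuing bare HTTP requests, and reading data from the traffic the page actually exchanges) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the header values, the JSON field names (`statuses`, `screen_name`, `reposts_count`), and the helper names `make_browser_opener`, `build_request`, `parse_feed`, and `crawl` are all hypothetical assumptions standing in for whatever packet analysis would reveal on a real platform.

```python
import json
import time
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical browser-like headers; the paper does not list the ones it used.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0",
    "Accept": "text/html,application/json;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,en;q=0.5",
}


def make_browser_opener():
    """Build an opener that keeps cookies across requests, as a logged-in browser would."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_request(url, referer=None):
    """Create a request carrying browser-like headers (and a Referer, as page navigation would)."""
    headers = dict(BROWSER_HEADERS)
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def parse_feed(payload):
    """Extract posts from a JSON payload of the kind found by inspecting captured traffic.

    The field names used here are assumptions for illustration only.
    """
    data = json.loads(payload)
    posts = []
    for item in data.get("statuses", []):
        posts.append({
            "id": item["id"],
            "user": item["user"]["screen_name"],
            "text": item["text"],
            "reposts": item.get("reposts_count", 0),
        })
    return posts


def crawl(opener, urls, delay_s=2.0):
    """Fetch pages one at a time with a pause between them, imitating a human reader's pacing."""
    results = []
    for url in urls:
        with opener.open(build_request(url)) as resp:
            results.extend(parse_feed(resp.read().decode("utf-8")))
        time.sleep(delay_s)
    return results
```

Keeping the cookie jar inside one opener is what lets a session obtained at login be reused for later requests; the fixed delay is only a crude stand-in for the browsing rhythm a real simulation would model.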
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2013, Issue 10, pp. 2413-2416 (4 pages)
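Funding
Supported by the Natural Science Foundation of Hebei Province (F2013208105)
Supported by the Hebei Province Science and Technology Support Program (12213516D)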
Keywords
micro-blog
information collection
micro-blog crawler
group behavior analysis