
Micro-blog Information Collection and Group Behaviors Analysis (微博信息采集及群体行为分析)

Cited by: 5
Abstract: With the rapid development of online social networks, tens of millions of people create and spread topics every day by posting, commenting, and sharing. Studying how online social relationship data is sensed, collected, stored, and managed, along with the group behavior it reflects, makes it possible to mine and analyze social networks more effectively. Because micro-blog platforms differ substantially from traditional websites in login, data display, and data processing, traditional web crawlers are not suited to comprehensive collection of micro-blog content. This paper crawls large volumes of micro-blog data by simulating user browsing behavior and obtains the relevant information by capturing and analyzing data packets. Experimental results show that the method is effective. On this basis, group behavior is analyzed using the collected micro-blog data.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, PKU Core, 2013, No. 10, pp. 2413-2416 (4 pages)
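Funding: Supported by the Natural Science Foundation of Hebei Province (F2013208105) and the Hebei Province Science and Technology Support Program (12213516D)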
Keywords: micro-blog; information collection; micro-blog crawler; group behaviors analysis
