
Design and Implementation of Microblog User Information Acquisition and Analysis System

Cited by: 2
Abstract: The system is written in Python and uses the Scrapy framework to implement an efficient, stable crawler for microblog user information. By working around Sina Weibo's anti-crawler measures, it comprehensively collects users' basic profile information and imports it into the Neo4j graph database and the Echarts data visualization library for relationship analysis and mining. Because Weibo hosts a large number of paid posters (the "Internet water army"), the system also provides a filtering option that can effectively exclude the influence of their abnormal behavior on the analysis results. Debugging results show that the system can collect and analyze, in real time and in a stable and efficient manner, the information of users who repost or comment on a specific microblog post. It helps people extract complex relationships from massive data and analyze the interactions between microblog users concisely and intuitively.
Authors: ZHANG Yang, FAN Yan, XIA Ling-ling, CHEN Jun-an, WANG Qin (Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing 210031, China)
Source: Software Guide (《软件导刊》), 2019, No. 9, pp. 125-129 (5 pages)
Funding: Jiangsu Province College Students' Innovation and Entrepreneurship Training Program (201810329027Y)
Keywords: Sina Weibo; web crawler; simulated login; data analysis
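
The abstract describes filtering out paid posters before building the user relationship graph, but it does not state the filtering criteria or the graph schema. The following is a minimal sketch of that step under stated assumptions: the `Interaction` record shape, the follower and post thresholds, the `User` label, and the `REPOSTED` / `COMMENTED` relationship types are all hypothetical and are not taken from the paper. It only prepares parameterized Cypher `MERGE` statements; running them against Neo4j is left to whichever driver is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One repost or comment on the target post (hypothetical record shape)."""
    user_id: str
    followers: int
    posts: int
    kind: str  # "repost" or "comment"


def looks_like_paid_poster(item, min_followers=10, min_posts=5):
    # Assumed heuristic, not the paper's criterion: flag accounts that have
    # both very few followers and very few posts of their own.
    return item.followers < min_followers and item.posts < min_posts


def build_edges(author_id, interactions, filter_paid_posters=True):
    """Return (source user, interaction kind, target user) triples."""
    edges = []
    for item in interactions:
        if filter_paid_posters and looks_like_paid_poster(item):
            continue  # skip suspected paid posters when the option is on
        edges.append((item.user_id, item.kind, author_id))
    return edges


def to_cypher(edges):
    """Turn edge triples into (query, parameters) pairs for a Neo4j driver."""
    # Relationship types cannot be passed as Cypher parameters, so they are
    # chosen from a fixed mapping (hypothetical names).
    rel_types = {"repost": "REPOSTED", "comment": "COMMENTED"}
    statements = []
    for src, kind, dst in edges:
        query = (
            "MERGE (a:User {uid: $src}) "
            "MERGE (b:User {uid: $dst}) "
            f"MERGE (a)-[:{rel_types[kind]}]->(b)"
        )
        statements.append((query, {"src": src, "dst": dst}))
    return statements


sample = [
    Interaction("u1", followers=500, posts=120, kind="repost"),
    Interaction("u2", followers=1, posts=0, kind="comment"),
    Interaction("u3", followers=40, posts=30, kind="comment"),
]
edges = build_edges("author", sample)
statements = to_cypher(edges)
```

With the filter enabled, the low-activity account `u2` is dropped and only `u1` and `u3` are linked to the post's author; passing `filter_paid_posters=False` keeps all three.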