Abstract
Trending events on Sina Weibo generate large volumes of comment data, which form the basis for data-mining tasks such as public opinion analysis and the identification of paid posters (the so-called "online water army"). This paper analyzes and compares commonly used web crawler technologies and frameworks, then collects user comments on trending Sina Weibo events with two methods: the Selenium framework and the JSON data interface. Web crawlers typically use breadth-first search; this work uses depth-first search instead, which retrieves the user comments belonging to a specific trending event more precisely.
Authors
黄红桃
江盈锋
HUANG Hongtao; JIANG Yingfeng (School of Information, Guangdong University of Foreign Studies, Guangzhou, Guangdong Province, 510006, China)
Source
《科技创新导报》
2021, Issue 14, pp. 132-135, 139 (5 pages in total)
Science and Technology Innovation Herald
Funding
Guangzhou Science and Technology Program Project (Project No. 202002030239).