Abstract
Public opinion data on Weibo is large in volume, irregular, and changes randomly. To address this, the paper proposes TFIDF-NB (Term Frequency Inverse Document Frequency-Naive Bayes) for Weibo sentiment analysis. A Weibo comment crawler based on the Scrapy framework is designed and implemented; it crawls a number of Weibo comments on a trending event and stores them in a database. The comments then undergo text segmentation and LDA (Latent Dirichlet Allocation) topic clustering, and are finally classified by sentiment with the TFIDF-NB algorithm. Experimental results show that the average accuracy of TFIDF-NB is higher than that of the linear Support Vector Machine and K-Nearest Neighbor algorithms, and that its precision and recall are higher than those of the K-Nearest Neighbor algorithm, indicating good sentiment classification performance.
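The final classification step can be illustrated with a minimal sketch. The abstract does not give the paper's exact weighting or smoothing, so everything below is an assumption: a smoothed IDF, TF-IDF weights used in place of raw counts in a multinomial Naive Bayes model, and additive smoothing controlled by a hypothetical `alpha` parameter. The class name `TfidfNaiveBayes`, the helper functions, and the toy English tokens are illustrative only; real input would be Weibo comments that have already been segmented into words.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one tokenized doc; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights (illustrative, not the paper's code)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.idf = fit_idf(docs)
        vocab_size = len(self.idf)
        # Sum the TF-IDF weight of each term within each class.
        weight = defaultdict(lambda: defaultdict(float))
        for doc, label in zip(docs, labels):
            for term, w in tfidf(doc, self.idf).items():
                weight[label][term] += w
        self.log_prior = {}
        self.log_like = {}
        for label, n_label in Counter(labels).items():
            self.log_prior[label] = math.log(n_label / len(docs))
            class_w = weight[label]
            denom = sum(class_w.values()) + self.alpha * vocab_size
            self.log_like[label] = {
                t: math.log((class_w.get(t, 0.0) + self.alpha) / denom)
                for t in self.idf
            }
        return self

    def predict(self, doc):
        """Return the class with the highest log posterior score."""
        features = tfidf(doc, self.idf)
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_like[c][t] for t, w in features.items())
            for c in self.log_prior
        }
        return max(scores, key=scores.get)


# Toy, already-tokenized comments with made-up labels.
train_docs = [
    ["good", "great", "happy"],
    ["love", "great"],
    ["bad", "sad", "angry"],
    ["hate", "bad"],
]
train_labels = ["pos", "pos", "neg", "neg"]

model = TfidfNaiveBayes(alpha=1.0).fit(train_docs, train_labels)
print(model.predict(["great", "happy"]))
print(model.predict(["bad", "angry"]))
```

Weighting terms by IDF before the Naive Bayes step reduces the influence of words that appear in most comments, which is one plausible reason to combine the two; whether the paper does it exactly this way cannot be confirmed from the abstract.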
Authors
Yang Ge; Yang Lutao (Key Laboratory of Intelligent Multimedia Technology, Beijing Normal University (Zhuhai Campus), Zhuhai 519087, China; Engineering Lab on Intelligent Perception for Internet of Things (ELIP), Shenzhen Graduate School, Peking University, Shenzhen 518055, China)
Source
Application of Electronic Technique (《电子技术应用》)
2021, No. 4, pp. 59-62, 66 (5 pages)
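Funding
Guangdong Provincial Major Scientific Research Projects of Universities (2018KTSCX288, 2019KZDXM015, 2020ZDZX3058)
Guangdong Province Special Project for Discipline Construction (2013WYXM0122)
Key Laboratory of Intelligent Multimedia Technology (201762005)
Beijing Normal University (Zhuhai Campus) 2019 university-level "Quality Engineering" curriculum-based ideological and political education project (201932)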
Keywords
Weibo public opinion
web crawler
sentiment classification