Abstract
Weibo is currently one of the most popular social platforms in China, and sentiment analysis of Weibo text helps to further analyze and realize its media value. However, Weibo data is massive and highly redundant, which makes text features sparse and limited, and sentiment classification results on small-sample data are unsatisfactory. This paper therefore proposes a Weibo sentiment analysis method based on a support vector machine (SVM) classification model. First, Weibo data is crawled with Weibo Spider and manually annotated to build a Weibo text dataset. Then, the TF-IDF algorithm and the traditional bag-of-words model are jointly optimized to give a keyword-based bag-of-words model, which produces a text feature matrix and addresses the high sparsity and high redundancy of Weibo text. Finally, an SVM classifier with a Gaussian kernel is built to perform sentiment analysis on the Weibo data. Experimental results show that, compared with methods such as naive Bayes and decision trees, the proposed method achieves higher accuracy and has a clear advantage on small-sample data.
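The feature-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how TF-IDF and the bag-of-words model are combined, so the code below makes an assumption: it ranks terms by their highest TF-IDF score in any document, keeps the top `top_k` as keywords, and builds a count matrix over those keywords only. The function names (`tfidf_keywords`, `keyword_bow`, `gaussian_kernel`), the smoothed IDF formula, and the toy English tokens are all hypothetical; real Weibo text would first need Chinese word segmentation.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k):
    """Return the top_k terms ranked by their highest TF-IDF score in any document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            # Assumed IDF variant: log(N / df) + 1, so common terms keep a nonzero weight.
            idf = math.log(n_docs / df[term]) + 1.0
            best[term] = max(best.get(term, 0.0), tf * idf)
    # Sort by score (descending), then alphabetically for a deterministic order.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:top_k]


def keyword_bow(docs, keywords):
    """Build a bag-of-words count matrix restricted to the selected keywords."""
    index = {term: i for i, term in enumerate(keywords)}
    matrix = []
    for doc in docs:
        row = [0] * len(keywords)
        for term in doc:
            if term in index:
                row[index[term]] += 1
        matrix.append(row)
    return matrix


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Toy, already-tokenized documents (illustrative only, not the paper's data).
docs = [
    ["good", "movie", "good"],
    ["bad", "movie"],
    ["good", "day"],
]
keywords = tfidf_keywords(docs, top_k=3)
features = keyword_bow(docs, keywords)
similarity = gaussian_kernel(features[0], features[1])
```

Restricting the vocabulary to a small keyword set is one way to reduce the width and sparsity of the feature matrix. In practice the resulting matrix would be passed to an SVM with a Gaussian kernel, for example scikit-learn's `SVC(kernel="rbf")`, rather than evaluating the kernel by hand.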
Authors
Li Shouzheng (李首政)
Wang Qi (王琪)
Wang Li (王力)
School of Information Engineering, Nanyang Institute of Technology, Nanyang 473000; School of Civil Engineering, Nanyang Institute of Technology, Nanyang 473000
Source
Modern Computer (《现代计算机》)
2022, Issue 19, pp. 63-66, 80 (5 pages in total)
Keywords
Weibo text
sentiment analysis
support vector machine
machine learning