Research on Text Preprocessing Method for Twitter Sentiment Analysis (cited by: 5)

Abstract: Social networks are widely used to express opinions in the public domain through Internet-based text messages and images. Sentiment analysis of Twitter lets organizations monitor, in real time, public feeling about the products and events that concern them, and has become an effective means of tracking public sentiment. The first step of sentiment analysis is text preprocessing of the data. Existing research on Twitter sentiment analysis focuses mainly on extracting new sentiment features and neglects in-depth study of preprocessing methods. In this paper, we study feature extraction and classification on Twitter data using supervised classifiers based on support vector machines (SVM), naive Bayes (NB), maximum entropy (ME), and artificial neural networks (ANN). We also propose a classification model that combines MapReduce-based principal component analysis (MPCA) with SVM. We then discuss the effect of text preprocessing methods on sentiment classification performance in a two-class classification task, and summarize the feature models of the various preprocessing methods and the classification performance of the four classification methods on the Twitter dataset. The experimental results show that, after parameter tuning, the proposed model not only improves the accuracy and F1 score of Twitter sentiment classification but also addresses the computational cost of SVMs and ANNs. The model has a degree of scalability, and the experimental results are satisfactory.
Authors: WANG Yong-chang (王永昌); ZHU Li-gu (朱立谷) (School of Computer Science, Communication University of China, Beijing 100024, China; Shijiazhuang University, Shijiazhuang 050000, China)
Source: Journal of Communication University of China: Science and Technology (《中国传媒大学学报(自然科学版)》), 2019, No. 2, pp. 31-38 (8 pages)
Keywords: Twitter; sentiment analysis; text preprocessing; MapReduce
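
The abstract identifies text preprocessing as the first step of Twitter sentiment analysis but does not list the specific steps the paper evaluates. The following is a minimal sketch of what such a step might look like, using only the Python standard library. The cleaning rules (lowercasing, removing URLs and @mentions, keeping hashtag words, shortening elongated words, dropping non-letters) and the name `preprocess_tweet` are assumptions chosen for illustration, not the authors' method.

```python
import re
from typing import List

# Hypothetical cleaning rules; the abstract does not specify the paper's steps.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # web links
MENTION_RE = re.compile(r"@\w+")                # @username mentions
HASHTAG_RE = re.compile(r"#(\w+)")              # keep the word, drop the '#'
REPEAT_RE = re.compile(r"(.)\1{2,}")            # 3+ repeats of one character
NON_ALPHA_RE = re.compile(r"[^a-z\s]")          # anything but letters/whitespace


def preprocess_tweet(text: str) -> List[str]:
    """Turn a raw tweet into a list of lowercase word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)          # remove URLs
    text = MENTION_RE.sub(" ", text)      # remove @mentions
    text = HASHTAG_RE.sub(r"\1", text)    # "#happy" becomes "happy"
    text = REPEAT_RE.sub(r"\1\1", text)   # "loooove" becomes "loove"
    text = NON_ALPHA_RE.sub(" ", text)    # drop digits and punctuation
    return text.split()


if __name__ == "__main__":
    print(preprocess_tweet("@bob I loooove this!!! http://t.co/x #Happy"))
```

The resulting tokens could then be turned into feature vectors for a classifier such as SVM or naive Bayes. Which of these rules help or hurt classification accuracy is the kind of question the paper investigates, so each rule here should be treated as optional.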