Abstract
Building on prior work in text sentiment classification, and taking into account that microblog posts are short, rich in emoticons, and full of Internet slang, we propose a distributed Naïve Bayes algorithm based on the Map/Reduce programming model for sentiment classification of Chinese microblogs. The algorithm extracts sentiment feature attributes by constructing a sentiment dictionary tailored to microblog text. Experimental results show that the method is well suited to microblog sentiment classification, achieves good classification performance, and meets the feasibility and efficiency requirements of processing massive microblog text data.
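The approach described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the lexicon entries, the function names (`map_doc`, `reduce_counts`, `train`, `classify`), the toy corpus, and the use of Laplace smoothing are all assumptions made for illustration, and the map and reduce steps run in one Python process rather than on a real Map/Reduce cluster.

```python
import math
from collections import defaultdict

# Hypothetical sentiment lexicon (words, emoticons, Internet slang); not from the paper.
LEXICON = {"开心", "喜欢", "[哈哈]", "给力", "难过", "讨厌", "[泪]", "坑爹"}

def map_doc(label, tokens):
    """Map step: emit a count for the document's class and for each lexicon feature."""
    yield ("doc", label), 1
    for tok in tokens:
        if tok in LEXICON:
            yield ("feat", label, tok), 1

def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals

def train(corpus):
    """Run the map step over every labelled post, then reduce the emitted pairs."""
    pairs = (kv for label, tokens in corpus for kv in map_doc(label, tokens))
    return reduce_counts(pairs)

def classify(counts, tokens):
    """Pick the class with the highest log-posterior (Laplace smoothing assumed)."""
    labels = sorted(k[1] for k in counts if k[0] == "doc")
    n_docs = sum(counts[("doc", c)] for c in labels)
    feats = [t for t in tokens if t in LEXICON]
    best, best_score = None, -math.inf
    for c in labels:
        # Total number of lexicon features observed in class c.
        total = sum(v for k, v in counts.items() if k[0] == "feat" and k[1] == c)
        score = math.log(counts[("doc", c)] / n_docs)
        for t in feats:
            score += math.log((counts.get(("feat", c, t), 0) + 1) / (total + len(LEXICON)))
        if score > best_score:
            best, best_score = c, score
    return best

# Toy, already-segmented training posts (illustrative only).
corpus = [
    ("pos", ["今天", "开心", "[哈哈]"]),
    ("pos", ["喜欢", "给力"]),
    ("neg", ["难过", "[泪]"]),
    ("neg", ["讨厌", "坑爹"]),
]
counts = train(corpus)
```

Because the training statistics are plain sums over keys, the map output can be partitioned across machines and reduced independently, which is what makes Naïve Bayes a natural fit for Map/Reduce. Restricting features to lexicon entries mirrors the dictionary-based feature extraction the abstract describes.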
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
2015, No. 7, pp. 60-62, 142 (4 pages)
Funding
Natural Science Foundation of Shanghai (Grant No. 09ZR1409500)