Abstract
Naive Bayes is a probability-based classifier that predicts the class of an instance from the prior probability of each class and the conditional probability of each observed feature given that class. In response to the current flood of negative information on the Internet, this paper proposes an early-warning strategy for negative online information based on the Naive Bayes model. Unlike general text classification, sentiment recognition over large-scale, fragmented online content demands high execution efficiency and is concerned mainly with words that carry subjective sentiment. To address these requirements, we apply several optimizations, such as building a stop-word list specific to sentiment orientation and refining the handling of negation words, and test them on a sample of 20,000 microblog (Weibo) posts. The experiments indicate that these strategies achieve fairly good execution efficiency and accuracy in text sentiment recognition.
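The approach summarized in the abstract can be illustrated with a minimal sketch, written under several assumptions: the stop-word and negation lists below are hypothetical placeholders (the paper's actual lists are not reproduced here), the toy training data is invented for illustration, add-one smoothing is an assumed choice, and English whitespace tokenization stands in for the Chinese word segmentation that real microblog text would require. The names `tokenize` and `NaiveBayes` are likewise illustrative and not taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical word lists for illustration only; the paper's lists are not given.
STOP_WORDS = {"the", "a", "is", "this", "was", "i"}
NEGATIONS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase, drop stop words, and fold a negation into the next kept token."""
    tokens = []
    negate = False
    for word in text.lower().split():
        if word in NEGATIONS:
            negate = True  # remember the negation; it marks the next kept word
            continue
        if word in STOP_WORDS:
            continue
        tokens.append("NOT_" + word if negate else word)
        negate = False
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class (for priors)
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of log conditional probabilities
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data, for illustration only.
train_texts = ["good service", "great product", "bad service", "not good product"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Folding a negation word into the following token (so that "not good" becomes the single feature `NOT_good`) is one simple way to keep negated sentiment words from counting as evidence for the opposite class; the paper's own negation handling may differ.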
Source
Library Journal (《图书馆杂志》)
CSSCI
PKU Core (北大核心)
2014, No. 8, pp. 78-82 (5 pages)
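Funding
One of the research outcomes of the People's Public Security University of China Graduate Innovation Project "Research on a Sentiment Orientation Analysis Model for Online Public Opinion Based on Pattern Matching and Machine Learning" (project no. 2013SKX04-5)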
Keywords
Negative information
Sentiment analysis
Machine learning
Naive Bayes
Public opinion monitoring
Early warning