Abstract
Sentiment polarity classification of web texts has important research value for online public opinion monitoring. Traditional text classification algorithms depend on sentiment lexicons and do not handle irregular web texts well. To address these limitations, this paper proposes a sentiment polarity classification algorithm for web texts based on collaborative filtering and text similarity. It first analyzes statistically how well high-frequency words cover web texts and then, based on the results, builds the algorithm on collaborative filtering and cosine similarity: cosine similarity measures how similar web texts are, and that similarity is used to judge a text's sentiment polarity. For texts whose polarity cannot be judged directly, the algorithm applies sentiment word scoring and a Top-N sentiment word recommendation mechanism from collaborative filtering, and repeats the similarity computation over several iterations using the scored and recommended sentiment words. The algorithm was evaluated on the Chinese sentiment mining corpus ChnSentiCorp. The results show that it achieves high recall and precision, classifies irregular web texts well, and offers a practical solution to web text sentiment polarity classification that can be applied to online public opinion monitoring.
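The similarity step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes texts are already segmented into word tokens, uses raw term counts as vector weights, and the names `cosine_similarity`, `classify_polarity`, and the `threshold` value of 0.5 are hypothetical. The collaborative filtering fallback (sentiment word scoring and Top-N recommendation) is not sketched, since the abstract gives no details of it; here a text that cannot be judged directly simply yields `None`.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists using raw term counts."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over the terms of the first vector; missing terms count as 0.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_polarity(tokens, labeled_texts, threshold=0.5):
    """Return the label of the most similar labeled text.

    Returns None when no labeled text reaches the threshold
    (an assumed cut-off, not a value taken from the paper).
    """
    best_label, best_score = None, 0.0
    for ref_tokens, label in labeled_texts:
        score = cosine_similarity(tokens, ref_tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Toy labeled texts, invented for illustration only.
labeled = [
    (["good", "great", "hotel"], "positive"),
    (["bad", "dirty", "hotel"], "negative"),
]
print(classify_polarity(["great", "good", "room"], labeled))
print(classify_polarity(["room"], labeled))
```

In this sketch the second call shares no words with any labeled text, which corresponds to the case the abstract hands over to sentiment word scoring and recommendation.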
Source
《成都信息工程学院学报》
2015, No. 4, pp. 355-360 (6 pages)
Journal of Chengdu University of Information Technology
Funding
National Natural Science Foundation of China (61203172, 61202250)
Sichuan Provincial Applied Basic Research Program (2012JY0111)
Keywords
computer application technology
intelligent information processing
text sentiment classification
public opinion monitoring
collaborative filtering
cosine similarity
web texts