The Novel Sentiment Classification Algorithm for Web Texts based on Collaborative Filtering and Text Similarity (cited by: 3)
Abstract: Sentiment polarity classification of Web texts has significant research value for online public opinion monitoring. Traditional sentiment classification algorithms depend heavily on sentiment lexicons and do not handle nonstandard Web texts well. To address these limitations, this paper proposes a sentiment polarity classification algorithm for Web texts based on collaborative filtering and text similarity. The coverage of high-frequency vocabulary in Web texts is first analyzed statistically. Based on these statistics, a new algorithm is proposed that combines collaborative filtering with cosine similarity computation: it uses cosine similarity to measure the similarity between Web texts and thereby determines each text's sentiment polarity. For texts whose polarity cannot be determined directly, the algorithm introduces collaborative-filtering-based sentiment word scoring and a Top-N sentiment word recommendation mechanism, and determines the polarity of the unknown Web text through repeated, iterative similarity computation over the scored and recommended sentiment words. Finally, the algorithm is evaluated on the Chinese sentiment mining corpus ChnSentiCorp. The results show that it achieves high recall and precision, also classifies nonstandard Web texts well, and offers a practical solution to Web text sentiment polarity classification that can be applied to online public opinion monitoring.
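A minimal sketch of the cosine-similarity step described above, in Python. It assumes the texts are already word-segmented (Chinese input would first need a segmenter), represents each text as a raw term-frequency vector, and assigns the polarity of the most similar labeled text. The function names, the toy English training set, and the abstention margin of 0.05 are hypothetical illustrations, not the authors' parameters. Returning `None` stands in for the case where polarity cannot be judged directly; the paper's sentiment word scoring and Top-N recommendation steps are not reproduced. A helper computes precision and recall, the two metrics the abstract reports.

```python
import math
from collections import Counter

# Toy labeled corpus (hypothetical): pre-segmented tokens with a polarity label.
LABELED = [
    (["room", "clean", "staff", "friendly"], "pos"),
    (["great", "location", "clean"], "pos"),
    (["room", "dirty", "noisy"], "neg"),
    (["staff", "rude", "dirty"], "neg"),
]


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(tokens, labeled=LABELED, margin=0.05):
    """Return 'pos' or 'neg' by best cosine match, or None if the two classes are too close."""
    query = Counter(tokens)
    best = {"pos": 0.0, "neg": 0.0}
    for doc_tokens, label in labeled:
        best[label] = max(best[label], cosine_similarity(query, Counter(doc_tokens)))
    # Hypothetical abstention rule: polarity cannot be judged directly.
    if abs(best["pos"] - best["neg"]) < margin:
        return None
    return "pos" if best["pos"] > best["neg"] else "neg"


def precision_recall(predicted, gold, positive="pos"):
    """Precision and recall for the positive class; None predictions count as non-positive."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == positive and g == positive)
    fp = sum(1 for p, g in pairs if p == positive and g != positive)
    fn = sum(1 for p, g in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    queries = [["clean", "friendly", "staff"], ["dirty", "noisy"], ["location", "noisy"]]
    gold = ["pos", "neg", "pos"]
    predicted = [classify(q) for q in queries]
    print(predicted)                          # ['pos', 'neg', None]
    print(precision_recall(predicted, gold))  # (1.0, 0.5)
```

With the toy data, the third query is equally similar to one positive and one negative text, so it abstains. Because an abstention counts as a missed positive, recall drops to 0.5 while precision stays at 1.0. According to the abstract, texts like this are handled by the algorithm's iterative sentiment word scoring and recommendation stage.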
Source: Journal of Chengdu University of Information Technology (《成都信息工程学院学报》), 2015, No. 4, pp. 355-360 (6 pages)
Funding: National Natural Science Foundation of China (61203172, 61202250); Sichuan Province Applied Basic Research Program (2012JY0111)
Keywords: computer application technology; intelligent information processing; text sentiment classification; public opinion monitoring; collaborative filtering; cosine similarity; Web texts