
Study on Sentiment Classification for Chinese Microblogs Based on Hadoop
Abstract: As the number of microblog posts published by users grows rapidly, datasets have reached the TB or even PB scale. Because sentiment classification over such massive microblog datasets cannot be completed satisfactorily on a single machine, this paper proposes a Chinese microblog sentiment classification scheme based on the Hadoop cloud platform. Taking into account the distinctive linguistic features of microblog text, the scheme implements parallel versions of preprocessing, feature selection, text vectorization, and the KNN classification algorithm, in that order, on the MapReduce framework. A comparison of single-machine and cluster experiments shows that classification efficiency on the Hadoop cloud platform improves rapidly as the cluster grows, without affecting classification quality.
Authors: 邵丘 (Shao Qiu), 杨鹤标 (Yang Hebiao)
Source: Information Technology (《信息技术》), 2015, No. 9, pp. 215-218 (4 pages)
Keywords: sentiment classification; Hadoop; massive data; KNN classification algorithm; parallelization
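
As a rough illustration of the parallel KNN step mentioned in the abstract, the following Python sketch simulates a map phase and a reduce phase locally. It is not the authors' implementation and does not use the Hadoop API. The function names (`knn_map`, `knn_reduce`, `run_job`), the cosine similarity measure, the sparse-dictionary vector format, and K = 3 are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter, defaultdict

K = 3  # number of neighbours; illustrative value, not taken from the paper


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_map(train_split, test_set):
    """Map phase (hypothetical): called once per training split.

    Emits (test_id, (similarity, label)) for the K most similar
    training posts found in this split only.
    """
    for test_id, test_vec in test_set.items():
        scored = ((cosine(test_vec, vec), label) for vec, label in train_split)
        for pair in heapq.nlargest(K, scored):
            yield test_id, pair


def knn_reduce(test_id, candidates):
    """Reduce phase (hypothetical): merge the candidates for one test post,
    keep the overall top K, and take a majority vote over their labels."""
    top = heapq.nlargest(K, candidates)
    votes = Counter(label for _, label in top)
    return test_id, votes.most_common(1)[0][0]


def run_job(train_splits, test_set):
    """Simulate the shuffle step locally by grouping map output by test_id."""
    grouped = defaultdict(list)
    for split in train_splits:
        for test_id, pair in knn_map(split, test_set):
            grouped[test_id].append(pair)
    return dict(knn_reduce(tid, pairs) for tid, pairs in grouped.items())
```

Each mapper keeps only its local top K, and the K most similar posts overall are always among those local candidates, so the reducer can recover them regardless of how the training data is split. Apart from how ties in similarity are broken, this is consistent with the abstract's claim that scaling out the cluster does not change the classification result.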
