
Sentimental analysis of massive micro-blog based on topic extraction (cited by: 7)
Abstract: Analyzing the public's sentiment toward a social event from massive micro-blog data is of great research significance. However, micro-blog text is sparse and very large in scale, so traditional methods face many challenges on this task. This paper proposes a sentiment analysis method for massive micro-blog data based on topic clustering. First, frequent itemsets are mined from high-quality micro-blog data, a semantic correlation threshold is set, and the important frequent itemsets that pass this filter are spectrally clustered to obtain topic keywords. The massive micro-blog data are then grouped by their semantic relatedness to the topic keywords. Finally, using a sentiment lexicon, each micro-blog in a category is searched for sentiment words and negation words within a modifying distance before or after the topic keywords, and these are combined with emoticons to compute the micro-blog's sentiment value. Experiments on Chinese micro-blogs at the scale of millions show that the method groups micro-blogs accurately by topic and classifies sentiment effectively within each topic.
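Author: Wang Canwei (王灿伟)
Source: Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2017, No. 3, pp. 549-556 (8 pages)
Funding: National Natural Science Foundation of China, Youth Program (71301086); Shandong Province E-Government Project (2150511); Shandong Provincial Department of Science and Technology Spark Program (2013XH17003); Department of Education Science and Technology Program (J14LN62)
Keywords: massive micro-blog, clustering, topic extraction, sentiment analysis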
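
The final step described in the abstract (looking for sentiment and negation words within a modifying distance of a topic keyword, then adding emoticon evidence) can be illustrated with a minimal Python sketch. This record does not give the paper's actual scoring formula, so everything here is an assumption made for illustration: the function name `score_microblog`, the window size, the rule that a negation word directly before a sentiment word flips its polarity, the unweighted sum, and the toy English lexicon.

```python
from typing import Dict, List, Set


def score_microblog(
    tokens: List[str],
    topic_keywords: Set[str],
    sentiment_lexicon: Dict[str, float],
    negation_words: Set[str],
    emoticon_scores: Dict[str, float],
    window: int = 3,
) -> float:
    """Hypothetical sentiment score for one tokenized micro-blog.

    Assumptions (not taken from the paper): a fixed token window around
    each topic keyword, polarity flipped by an immediately preceding
    negation word, and a plain sum of lexicon and emoticon scores.
    """
    score = 0.0
    counted = set()  # positions already scored, so overlapping windows do not double count

    for i, token in enumerate(tokens):
        if token not in topic_keywords:
            continue
        # Window of `window` tokens before and after the topic keyword.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            word = tokens[j]
            if j in counted or word not in sentiment_lexicon:
                continue
            counted.add(j)
            polarity = sentiment_lexicon[word]
            # Assumed rule: a negation word directly before the sentiment word flips it.
            if j > 0 and tokens[j - 1] in negation_words:
                polarity = -polarity
            score += polarity

    # Emoticons are counted wherever they appear in the micro-blog.
    score += sum(emoticon_scores.get(token, 0.0) for token in tokens)
    return score


# Toy data, invented for this example only.
LEXICON = {"good": 1.0, "bad": -1.0}
NEGATIONS = {"not"}
EMOTICONS = {"[sad]": -0.5}

example = score_microblog(
    ["the", "film", "is", "not", "good", "[sad]"],
    {"film"}, LEXICON, NEGATIONS, EMOTICONS,
)
```

With this toy input, "good" falls inside the window around "film" and is negated by "not", and the emoticon adds further negative evidence, so `example` is negative. A positive or negative sign could then be mapped to a sentiment class, although the thresholds the paper uses are not stated in this record.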
