Abstract
Analyzing public sentiment toward a social event from massive micro-blog data is of considerable research value, but micro-blog text is sparse and very large in scale, which poses many challenges for traditional methods. This paper proposes a sentiment analysis method for massive micro-blog data based on topic clustering. First, frequent itemsets are mined from high-quality micro-blog data; a semantic correlation threshold is set, and the significant frequent itemsets that pass it are grouped by spectral clustering to obtain topic keywords. The massive micro-blog data are then assigned to topics according to their semantic relatedness to the topic keywords. Finally, using a sentiment lexicon, each micro-blog in a topic is searched for sentiment words and negation words within a specified modification distance before or after the topic keywords, and these are combined with emoticons to compute the micro-blog's sentiment value. Experiments on a million-scale Chinese micro-blog dataset show that the method assigns micro-blogs to topics accurately and classifies sentiment on each topic effectively.
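The final scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `score_post`, the fixed token window, the rule that a negation word immediately before a sentiment word flips its polarity, and the simple additive emoticon term are all assumptions made for illustration, since the abstract does not give the actual formula.

```python
from typing import Dict, List, Set


def score_post(
    tokens: List[str],
    topic_keywords: Set[str],
    lexicon: Dict[str, float],
    negations: Set[str],
    emoticons: Dict[str, float],
    window: int = 3,
) -> float:
    """Hypothetical sentiment score for one tokenized micro-blog post."""
    # Collect positions that lie within `window` tokens of any topic keyword.
    # A set is used so overlapping windows do not count a word twice.
    positions = set()
    for i, tok in enumerate(tokens):
        if tok in topic_keywords:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            positions.update(range(lo, hi))

    score = 0.0
    for j in sorted(positions):
        word = tokens[j]
        if word not in lexicon:
            continue
        polarity = lexicon[word]
        # Assumed rule: a negation word directly before the sentiment
        # word inverts its polarity.
        if j > 0 and tokens[j - 1] in negations:
            polarity = -polarity
        score += polarity

    # Assumed rule: emoticon scores are added regardless of position.
    score += sum(emoticons.get(tok, 0.0) for tok in tokens)
    return score


example = score_post(
    tokens=["the", "policy", "is", "not", "good", ":)"],
    topic_keywords={"policy"},
    lexicon={"good": 1.0, "bad": -1.0},
    negations={"not"},
    emoticons={":)": 0.5},
)
```

In this toy input, "good" falls inside the window around "policy" and is negated by "not", while the emoticon contributes a small positive term. Restricting the lexicon lookup to the window is what ties the sentiment to the topic rather than to the post as a whole.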
Source
《南京大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals
2017, Issue 3, pp. 549-556 (8 pages)
Journal of Nanjing University (Natural Science)
Funding
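National Natural Science Foundation of China, Young Scientists Fund (71301086)
Shandong Province E-Government Project (2150511)
Spark Program of the Shandong Provincial Department of Science and Technology (2013XH17003)
Science and Technology Program of the Department of Education (J14LN62)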
Keywords
massive micro-blog, clustering, topic extraction, sentiment classification