
An Approach to Sentiment Analysis of Chinese Microblogs Based on Gaussian Mixture Distribution Pseudo-sample Generation
Abstract: Microblog posts are written informally, which makes their sentiment orientation hard to identify, especially when the sentiment classes in the data are imbalanced. This paper presents GWCRF (Gaussian Mixture Distribution Word2vec CRF), a method that combines Gaussian-mixture pseudo-sample generation with a conditional random field (CRF) model. First, a Gaussian mixture distribution model generates pseudo-samples for the minority classes in the training set, yielding a training set with a balanced sentiment distribution. Second, Word2vec is used to expand each microblog sentence and enrich its sentiment information, which mitigates the negative effect of an insufficiently large sentiment lexicon on classification. Finally, the CRF model is applied to the balanced and expanded training set. Experimental results on microblog data show that the method identifies sentiment orientation more effectively than existing methods when the sentiment distribution of the data set is imbalanced.
Source: Journal of Guangdong University of Technology (《广东工业大学学报》), CAS, 2016, No. 6, pp. 85-90 (6 pages)
Funding: National Natural Science Foundation of China (61472089, 61572143)
Keywords: sentiment analysis; Gaussian mixture distribution; conditional random field; sentiment orientation; imbalance; Word2vec
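
The first step in the abstract, generating pseudo-samples for a minority class from a Gaussian mixture, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's setup: the abstract does not say how microblogs are turned into feature vectors, how many mixture components are used, or what covariance structure is chosen. The sketch uses toy 2-D vectors, two components, and diagonal covariances fitted by plain EM; the function names `fit_diag_gmm`, `sample_gmm`, and `balance` are hypothetical.

```python
import math
import random


def fit_diag_gmm(points, k, iters=50, seed=0):
    """Fit a k-component Gaussian mixture with diagonal covariance by EM."""
    rng = random.Random(seed)
    dim = len(points[0])
    means = [list(p) for p in rng.sample(points, k)]  # init means from data
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point,
        # computed in log space to avoid underflow.
        resp = []
        for p in points:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for d in range(dim):
                    v = variances[j][d]
                    diff = p[d] - means[j][d]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + diff * diff / v)
                logs.append(lp)
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard empty component
            weights[j] = nj / len(points)
            for d in range(dim):
                mu = sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                var = sum(r[j] * (p[d] - mu) ** 2
                          for r, p in zip(resp, points)) / nj
                means[j][d] = mu
                variances[j][d] = max(var, 1e-6)  # variance floor
    return weights, means, variances


def sample_gmm(model, n, seed=0):
    """Draw n pseudo-samples: pick a component, then sample each dimension."""
    weights, means, variances = model
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append([rng.gauss(m, math.sqrt(v))
                        for m, v in zip(means[j], variances[j])])
    return samples


def balance(majority, minority, k=2):
    """Top up the minority class with pseudo-samples until sizes match."""
    deficit = len(majority) - len(minority)
    if deficit <= 0:
        return list(minority)
    model = fit_diag_gmm(minority, k)
    return list(minority) + sample_gmm(model, deficit)


# Toy data (assumed): 200 majority vectors and 20 minority vectors.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
minority = [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(20)]
balanced_minority = balance(majority, minority)
print(len(majority), len(balanced_minority))
```

The original minority samples are kept and only the shortfall is filled with generated vectors, so real data is never discarded. In practice a library implementation of Gaussian mixtures with full covariance would likely be preferable to this hand-written EM loop.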
