
Regular Collocation Feature Extraction Method in Online Review Sentiment Analysis (original title: 在线评论情感分析中固定搭配特征提取方法研究)

Cited by: 26
Abstract: Effective and stable feature extraction and representation are important factors in improving the performance of online review sentiment analysis. Building on conventional features such as contiguous bag-of-words and trigger pairs, this paper studies methods for extracting and representing regular collocation features in online reviews. It proposes two strategies for regular collocation feature extraction: one that combines mutual information with average mutual information, and one based on rough sets. Starting from an analysis of the effectiveness and stability of the extraction methods, it examines the internal and external stability of the extracted collocations and incorporates the filtered collocation features into several sentiment analysis models. Experiments on real hotel review data show that appropriate representation and filtering of regular collocation features effectively improves the classification accuracy of sentiment analysis models. The study also finds that, when sentiment feature words are unevenly distributed within reviews, an extraction strategy using variable precision rough rules helps improve classification accuracy.

English abstract: Precise sentiment orientation classification models and the extraction of effective and stable features from the review context are two essential factors affecting the performance of online review sentiment analysis. Among the many complicated features that arise from language complexity, regular collocation features play an important role: beyond conventional bag-of-words and trigger pair features, their structured expressions have a strong impact on sentiment orientation. To extract such features for online review sentiment analysis, this paper presents two approaches for capturing regular collocation features from review corpora: a combination of mutual information and average mutual information, and a rough set based method. The extracted regular collocation features are fed into sentiment analysis models as inputs. Experiments on real online hotel reviews achieve generally higher precision, improving the performance of SVM models by 0.34% and that of Naive Bayes models by 1.27%.

For the extraction of regular collocation features, two aspects were considered essential to expressing the complicated constraints on review sentiment orientation: (1) the internal stability of the regular collocation structure, which accounts for the collocation existing as a substantial unit distinct from traditional bags of words or trigger pairs, and (2) the external effectiveness of the regular collocation, which accounts for its contribution to sentiment orientation classification. The mutual information method used in this paper measures external effectiveness, while the average mutual information computation and its filtering measure the internal stability of regular collocations. The rough set based method ensures both internal stability and external effectiveness through an α approximation rough rule extraction strategy and a maximum likelihood estimate of the distribution of regular collocations.

In implementation, the approach has to handle the non-uniform distribution of sentiment features within a review, so a variable precision strategy for the rough set approach was introduced in place of the original rough rule strategy. In the experiments, the variable precision strategy achieved the best sentiment analysis performance, 88.38% with SVM models at a threshold value of 0.85. These results show that, for online reviews with a non-uniform distribution of sentiment features, the variable precision strategy keeps the true voice of the minority from being lost and helps discriminate the overall sentiment orientation of the review. For online reviews with a uniform distribution of sentiment features, α approximation would be a better choice than the original maximum likelihood estimate. Under the same conditions, the combination of mutual information and average mutual information is also an option, offering comparable performance with less computation.
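The abstract gives no formulas, so the following Python sketch only illustrates the two measures it names, under stated assumptions. It uses the textbook definitions of pointwise mutual information (here between a collocation and a sentiment class, for external effectiveness) and average mutual information (here between the two words of a candidate pair, for internal stability), estimated from document counts. The function names, the toy reviews, and the reading of the 0.85 threshold as a minimum inclusion degree (as in standard variable precision rough sets) are all hypothetical and are not taken from the paper.

```python
import math


def pointwise_mi(n_joint, n_x, n_y, n_total):
    """log2(P(x, y) / (P(x) * P(y))) estimated from document counts."""
    if n_joint == 0:
        return float("-inf")
    return math.log2(n_joint * n_total / (n_x * n_y))


def average_mi(n_joint, n_x, n_y, n_total):
    """Average mutual information of two binary presence indicators."""
    # The four cells of the 2x2 contingency table, keyed by (x, y).
    cells = {
        (1, 1): n_joint,
        (1, 0): n_x - n_joint,
        (0, 1): n_y - n_joint,
        (0, 0): n_total - n_x - n_y + n_joint,
    }
    total = 0.0
    for (x, y), n_xy in cells.items():
        if n_xy == 0:
            continue  # an empty cell contributes nothing to the sum
        p_xy = n_xy / n_total
        p_x = (n_x if x else n_total - n_x) / n_total
        p_y = (n_y if y else n_total - n_y) / n_total
        total += p_xy * math.log2(p_xy / (p_x * p_y))
    return total


def accept_rule(n_feature_and_class, n_feature, beta=0.85):
    """Assumed variable precision test: keep the rule 'feature -> class'
    when the share of feature-bearing reviews in that class is >= beta."""
    return n_feature > 0 and n_feature_and_class / n_feature >= beta


def counts(docs, pred_a, pred_b):
    """Return (n_joint, n_a, n_b, n_total) for two predicates over docs."""
    n_a = sum(1 for d in docs if pred_a(d))
    n_b = sum(1 for d in docs if pred_b(d))
    n_joint = sum(1 for d in docs if pred_a(d) and pred_b(d))
    return n_joint, n_a, n_b, len(docs)


# Hypothetical toy reviews: (set of tokens, sentiment label).
reviews = [
    ({"room", "not", "clean"}, "neg"),
    ({"service", "not", "clean", "slow"}, "neg"),
    ({"room", "clean", "quiet"}, "pos"),
    ({"staff", "friendly"}, "pos"),
]
w1, w2 = "not", "clean"

# Internal stability: do the two words tend to occur together?
internal = average_mi(*counts(reviews,
                              lambda d: w1 in d[0],
                              lambda d: w2 in d[0]))

# External effectiveness: does the collocation point to the negative class?
n_joint, n_colloc, n_neg, n_total = counts(
    reviews,
    lambda d: w1 in d[0] and w2 in d[0],
    lambda d: d[1] == "neg",
)
external = pointwise_mi(n_joint, n_colloc, n_neg, n_total)

print(f"AMI({w1}, {w2}) = {internal:.3f}")
print(f"MI(collocation, neg) = {external:.3f}")
print("rule kept:", accept_rule(n_joint, n_colloc))
```

In a filtering pipeline along these lines, a candidate pair would be kept only if both scores clear chosen cut-offs; the cut-off values themselves are tuning choices that the abstract does not specify, apart from the 0.85 threshold reported for the variable precision strategy.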
Source: Journal of Industrial Engineering and Engineering Management (《管理工程学报》), CSSCI, Peking University Core Journal, 2014, No. 4, pp. 180-186 (7 pages)
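Funding: National Natural Science Foundation of China (71202168, 71271066); Fundamental Research Funds for the Central Universities (HIT.NSRIF2010083); Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12511435)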
Keywords: sentiment analysis; regular collocation feature extraction; mutual information and average mutual information; rough sets; support vector machine
