Sentiment Classification of Chinese Online Reviews: A Comparison between Sentences and Paragraphs (cited by: 5)
Abstract: Using corpora at two granularities, sentences and paragraphs, this study applies statistical machine learning to examine experimentally the factors that may affect sentiment classification of Chinese online reviews. N-grams are selected as candidate sentiment features; document frequency, the chi-square (χ²) statistic, and expected cross entropy are used to reduce feature dimensionality; feature vectors are built with Boolean weighting; and an SVM classifier performs the classification. The results show that corpus granularity strongly affects classification accuracy: accuracy at the sentence level and at the paragraph level differs by about 10%, with sentences classified more accurately than paragraphs. The dimensionality reduction methods also affect accuracy at both granularities, each with its own strengths and weaknesses, so the method should be chosen according to need. The relative performance of unigrams and bigrams depends on corpus granularity and on the dimensionality reduction method, so neither is consistently better.
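Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, Peking University Core Journal, 2013, No. 4, pp. 376-384 (9 pages)
Funding: National Natural Science Foundation of China (70971099); Fundamental Research Funds for the Central Universities
Keywords: online reviews, sentiment classification, sentences, paragraphs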
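
The feature pipeline summarized in the abstract (N-gram extraction, dimensionality reduction by document frequency and the chi-square statistic, Boolean weighting) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the use of character-level N-grams, the function names, the thresholds, and the four toy reviews are all assumptions made for the example. Expected cross entropy and the SVM training step are omitted; the resulting Boolean vectors would be passed to an SVM classifier.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return the set of character n-grams in text (whitespace removed).
    Assumption: character-level n-grams; the paper may use word-level ones."""
    chars = [c for c in text if not c.isspace()]
    return {"".join(chars[i:i + n]) for i in range(len(chars) - n + 1)}

def document_frequency(docs_features):
    """Count, for each feature, the number of documents that contain it."""
    df = Counter()
    for feats in docs_features:
        df.update(feats)  # feats is a set, so each document counts once
    return df

def chi_square(docs_features, labels, positive=1):
    """Chi-square score of each feature for a two-class problem."""
    n = len(labels)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    df, pos_df = Counter(), Counter()
    for feats, y in zip(docs_features, labels):
        df.update(feats)
        if y == positive:
            pos_df.update(feats)
    scores = {}
    for term, total in df.items():
        a = pos_df[term]   # positive documents containing the term
        b = total - a      # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_features(docs_features, labels, min_df=1, top_k=10):
    """Drop features below min_df, then keep the top_k by chi-square score."""
    df = document_frequency(docs_features)
    chi = chi_square(docs_features, labels)
    kept = [t for t in chi if df[t] >= min_df]
    kept.sort(key=lambda t: (-chi[t], t))  # ties broken by code point order
    return kept[:top_k]

def boolean_vector(features, vocabulary):
    """Boolean weighting: 1 if the feature occurs in the document, else 0."""
    return [1 if term in features else 0 for term in vocabulary]

# Toy data invented for illustration (1 = positive, 0 = negative).
reviews = ["质量很好", "服务很好", "质量很差", "服务很差"]
labels = [1, 1, 0, 0]

unigram_docs = [char_ngrams(r, 1) for r in reviews]
vocabulary = select_features(unigram_docs, labels, min_df=2, top_k=2)
vectors = [boolean_vector(f, vocabulary) for f in unigram_docs]
print(vocabulary, vectors)
```

Replacing char_ngrams(r, 1) with char_ngrams(r, 2) gives the bigram variant, and passing whole paragraphs instead of single sentences as the documents changes the granularity, which are the two comparisons the abstract describes.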
