
Improved Feature Selection Algorithm for Chinese Short Text Sentiment Analysis

Cited by: 4
Abstract: Feature extraction for Chinese short text sentiment analysis faces three shortcomings. Term frequency-inverse document frequency (TF-IDF) computes the distribution of feature words one-sidedly. The accuracy of information gain (IG) declines on short texts because of feature sparseness. An imbalanced distribution of texts in the corpus skews the computation. To address these problems, an improved feature selection algorithm, ITFIDF-IG, is proposed. Guided by the characteristics of short-text corpora, it raises the weights of feature words according to their contribution to classification. It also reduces interference from irrelevant words and from the differing numbers of texts among categories, and it accounts for the classification effect reflected in how each feature word is distributed. Applied to Chinese short text sentiment analysis, the method effectively extracts feature words that contribute more to classification and achieves better classification performance.
Authors: WANG Rongbo, SHEN Zhuoqi, HUANG Xiaoxi, CHEN Zhiqun (Institute of Cognitive and Intelligent Computing, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China)
Source: Journal of Hangzhou Dianzi University: Natural Sciences, 2019, No. 1, pp. 45-50 (6 pages)
Funding: Youth Fund Project for Humanities and Social Sciences Planning of the Ministry of Education (12YJCZH201); Humanities and Social Sciences Research Planning Fund Project of the Ministry of Education (18YJA740016)
Keywords: feature selection; sentiment analysis; term frequency-inverse document frequency; information gain; Chinese short text
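The abstract positions ITFIDF-IG as a fix for the two standard baselines, TF-IDF and IG. The paper's own weighting formulas are not given here, so the sketch below illustrates only the classic baselines as they are commonly defined in the text-categorization literature. It does not implement the authors' ITFIDF-IG method. The function names and the toy corpus are hypothetical. The specific variants are assumptions: an unsmoothed IDF with the natural log, and a presence-based IG with entropy in bits. The corpus consists of already word-segmented Chinese reviews.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF (assumed variant): (count / doc length) * ln(N / df).

    docs: list of token lists (Chinese text assumed already word-segmented).
    Returns one {term: weight} dict per document. No smoothing is applied.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (c / length) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), by document presence."""
    classes = sorted(set(labels))
    n = len(docs)
    with_t = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_t = [lab for doc, lab in zip(docs, labels) if term not in doc]

    def class_counts(subset):
        return [subset.count(c) for c in classes]

    h_class = entropy(class_counts(labels))
    h_conditional = 0.0
    for subset in (with_t, without_t):
        if subset:  # skip empty partitions (probability 0)
            h_conditional += len(subset) / n * entropy(class_counts(subset))
    return h_class - h_conditional


if __name__ == "__main__":
    # Hypothetical toy corpus: "service/quality is very good / too bad".
    docs = [
        ["服务", "很", "好"],
        ["质量", "很", "好"],
        ["服务", "太", "差"],
        ["质量", "差"],
    ]
    labels = ["pos", "pos", "neg", "neg"]
    vocab = sorted({t for d in docs for t in d})
    # Rank candidate features by IG, highest first.
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    for term in ranked:
        print(term, round(information_gain(docs, labels, term), 3))
    print(tf_idf(docs)[0])
```

TF-IDF never reads `labels`. In this toy data, "服务" (service) and "好" (good) each occur in two documents, so they receive the same IDF. Only "好" separates the two classes, and IG scores them 0.0 and 1.0 respectively. With only four documents, the toy data also gives "很" (very) a perfect IG score, which hints at how sparse short texts can mislead IG.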