
Joint Feature Selection Method Based on Relevance and Redundancy (cited by: 15)
Abstract: This paper compares four feature selection methods: document frequency (DF), which does not use class information, and information gain (IG), mutual information (MI) and the chi-square statistic (CHI), which do. Building on this comparison, it analyzes the drawbacks of earlier approaches that directly combine the two kinds of methods and proposes a joint feature selection algorithm based on relevance and redundancy. The algorithm pairs DF with one of IG, MI or CHI, aiming to remove redundant features while retaining those useful for classification, and thereby to improve text sentiment classification. Experimental results show that the joint method performs well and effectively reduces the feature dimension.
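Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2012, No. 4, pp. 181-184 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (60903225) and the National University of Defense Technology Innovation Fund for Outstanding Graduate Students (S100502)
Keywords: text sentiment classification; joint feature selection; relevance; redundant feature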
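
The abstract names DF and CHI but gives no formulas, so the following is only a minimal sketch under stated assumptions: it uses the standard 2x2 chi-square statistic and a naive "DF threshold, then CHI ranking" filter. That filter corresponds to the kind of direct combination the abstract criticizes, not to the proposed relevance-and-redundancy algorithm, whose details are not given here. The function names, the `min_df` and `top_k` parameters and the toy corpus are all hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it (class-independent)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count each term once per document
    return df


def chi_square(docs, labels, term, positive):
    """Standard 2x2 chi-square statistic of `term` against class `positive`."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        in_class = label == positive
        if present and in_class:
            a += 1  # term present, document in class
        elif present:
            b += 1  # term present, document not in class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document not in class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def df_then_chi(docs, labels, positive, min_df, top_k):
    """Naive direct combination (assumption, not the paper's method):
    drop terms with DF below min_df, then keep the top_k terms by CHI."""
    df = document_frequency(docs)
    candidates = [t for t, count in df.items() if count >= min_df]
    # Sort by descending CHI; break ties alphabetically for determinism.
    ranked = sorted(
        candidates,
        key=lambda t: (-chi_square(docs, labels, t, positive), t),
    )
    return ranked[:top_k]


# Invented toy corpus of tokenized reviews with sentiment labels.
docs = [
    ["good", "movie", "great"],
    ["great", "plot", "movie"],
    ["bad", "movie", "boring"],
    ["boring", "plot", "bad"],
]
labels = ["pos", "pos", "neg", "neg"]

selected = df_then_chi(docs, labels, positive="pos", min_df=2, top_k=3)
print(selected)
```

In this toy run the term "good" is discarded by the DF threshold even though it occurs only in positive reviews, while "movie" passes the threshold despite carrying little class information. This illustrates why a direct DF-plus-CHI pipeline can lose useful features, which is the motivation the abstract gives for a joint method.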
