
A Feature Selection Method Based on the Chi-Square Test (cited by: 8)

New Feature Selection Method Based on CHI
Abstract: The chi-square test (CHI) is a widely used feature selection method in text classification. It considers only the relationship between terms and categories and ignores the associations among the terms themselves, so the selected feature set is highly redundant. This paper defines the "residual mutual information" of a term and proposes a method that uses it to optimize the features selected by CHI. The resulting feature set is both strongly representative and highly independent. Experimental results indicate that the method is effective.
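To make the redundancy problem concrete, the following is a minimal sketch rather than the paper's method. It computes the standard chi-square score of a term against a category from a 2x2 contingency table, and the plain mutual information between two terms' binary occurrence vectors. The abstract does not give the definition of "residual mutual information", so it is not implemented here; the function names `chi_square` and `mutual_information` and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for one term and one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A degenerate table carries no evidence of association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def mutual_information(x, y):
    """Mutual information in bits between two equal-length occurrence vectors."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (xi, yi), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[xi] / n) * (py[yi] / n)))
    return mi


# Toy data (hypothetical): two terms that always co-occur.
term_a = [1, 1, 0, 0]
term_b = [1, 1, 0, 0]
label = [1, 1, 0, 0]


def score(term, label):
    """Build the 2x2 table for a term and return its chi-square score."""
    a = sum(1 for t, l in zip(term, label) if t and l)
    b = sum(1 for t, l in zip(term, label) if t and not l)
    c = sum(1 for t, l in zip(term, label) if not t and l)
    d = sum(1 for t, l in zip(term, label) if not t and not l)
    return chi_square(a, b, c, d)


score_a = score(term_a, label)
score_b = score(term_b, label)
redundancy = mutual_information(term_a, term_b)
```

In this toy case both terms receive the same chi-square score, so a ranking by CHI alone would keep both, even though their mutual information shows that the second term adds nothing beyond the first. A term-to-term measure of this kind is what a redundancy-aware selection step needs in addition to the term-to-category score.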
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2015, No. 5, pp. 54-56, 77 (4 pages)
Funding: Supported by the Doctoral Program Fund of the Ministry of Education (2010081110053)
Keywords: text categorization, feature selection, CHI (chi-square test), mutual information


