Research and Analysis of Feature Selection Methods in Text Categorization

Published English title: Study and Analyze on Feature Selection in Text Categorization
Abstract: This paper briefly introduces five feature selection methods: document frequency (DF), mutual information (MI), information gain (IG), expected cross entropy, and the χ² statistic. Combined with the KNN classification algorithm, each of the five methods is evaluated by recall, precision, and F1 score. Finally, the paper proposes and discusses a modification of the MI method.
Author: Hong Liang (洪亮)
Affiliation: Jiangxi Provincial Children's Hospital
Source: Science Mosaic (《科技广场》), 2009, No. 7, pp. 35-37 (3 pages)
Keywords: text categorization; feature selection; mutual information (MI)
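
As a rough illustration of the quantities named in the abstract, the sketch below scores a single term against a single class using the standard textbook definitions of DF, pointwise MI, and the χ² statistic, and computes precision, recall, and F1 from confusion counts. The function names and the 2x2 count layout are assumptions made for this example. The record does not give the paper's exact formulas, and its proposed MI modification is not reproduced here. Information gain and expected cross entropy are omitted for brevity.

```python
import math


def term_class_scores(a, b, c, d):
    """Score one term against one class from a 2x2 document-count table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    (This layout is an assumption for the example.)
    """
    n = a + b + c + d
    df = a + b  # document frequency of the term over the whole collection
    # Pointwise mutual information: log( P(t, c) / (P(t) * P(c)) ).
    mi = math.log((a * n) / ((a + c) * (a + b))) if a > 0 else float("-inf")
    # Chi-square statistic for the 2x2 contingency table.
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    chi2 = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return {"df": df, "mi": mi, "chi2": chi2}


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

In a feature selection pipeline of the kind the abstract describes, scores like these would be computed for every term, the top-ranked terms kept as features, and the resulting classifier (KNN in the paper) evaluated with `precision_recall_f1`.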
