Abstract
This paper briefly introduces five feature selection methods: document frequency (DF), mutual information (MI), information gain (IG), expected cross entropy, and the χ² statistic. Combined with the KNN classification algorithm, each of the five methods is evaluated by recall, precision, and F1. Finally, the paper proposes and discusses a method for correcting mutual information.
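To make the quantities named in the abstract concrete, the following is a minimal sketch of two of the scoring functions (MI and the χ² statistic) and of the F1 measure. It uses the commonly cited textbook definitions over a 2x2 term/class contingency table; these are assumptions, since the abstract does not give the paper's own formulas, and the proposed MI correction is not reproduced here. The count names `a`, `b`, `c`, `d` and all function names are hypothetical.

```python
import math


def mutual_information(a: int, b: int, c: int, d: int) -> float:
    """Pointwise MI between a term t and a class k (assumed textbook form).

    a: documents in class k that contain t
    b: documents outside class k that contain t
    c: documents in class k that do not contain t
    d: documents outside class k that do not contain t
    Assumes the term and the class each occur in at least one document.
    """
    n = a + b + c + d
    if a == 0:
        # log(0) is undefined; the term never co-occurs with the class.
        return float("-inf")
    return math.log((a * n) / ((a + c) * (a + b)))


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for the same 2x2 table (assumed textbook form).

    Assumes every row and column total of the table is non-zero.
    """
    n = a + b + c + d
    numerator = n * (a * d - c * b) ** 2
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return numerator / denominator


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

In a pipeline of the kind the abstract describes, such scores would be computed for every term, the top-scoring terms kept as features, and the resulting KNN classifier judged by the three evaluation measures.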
Source
《科技广场》
2009, No. 7, pp. 35-37 (3 pages)
Science Mosaic
Keywords
Text Categorization
Feature Selection
Mutual Information (MI)