
A Feature Selection Method Based on Information Gain (cited by: 11)

BASED ON THE INFORMATION GAIN TEXT FEATURE SELECTION METHOD
Abstract: This paper proposes an improved information gain method for text feature selection. First, feature selection is performed on the data set class by class, which reduces the effect of class imbalance on the selected features. Second, the probability of feature occurrence is used to compute information gain weights, which reduces the interference of low-frequency words. Finally, dispersion is used to analyze each feature's information gain value within each class, filtering out relatively redundant features among the high-frequency words, and the selected features are further refined using information gain differences to obtain a uniform and precise feature subset. A comparison of evaluation function values across different algorithms indicates that the feature subset selected by this method has better classification ability.
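The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of the standard information gain score that the method builds on, IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], and not the improved variant the paper proposes. The function names `entropy` and `information_gain`, and the representation of each document as a set of tokens, are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Standard information gain of a binary term feature.

    docs   : list of token sets, one per document (assumed representation)
    labels : list of class labels, aligned with docs
    term   : the candidate feature
    """
    classes = sorted(set(labels))
    n = len(labels)

    # Class counts over all documents, and split by term presence.
    all_counts = Counter(labels)
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)

    n_with = sum(with_term.values())
    n_without = n - n_with

    h_class = entropy([all_counts[c] for c in classes])
    h_conditional = (
        (n_with / n) * entropy([with_term[c] for c in classes])
        + (n_without / n) * entropy([without_term[c] for c in classes])
    )
    return h_class - h_conditional


# Toy corpus: "goal" separates the two classes perfectly, "team" not at all.
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"stock", "market"},
    {"stock", "team"},
]
labels = ["sport", "sport", "finance", "finance"]

ig_goal = information_gain(docs, labels, "goal")
ig_team = information_gain(docs, labels, "team")
```

In this toy corpus a term that appears only in one class receives the maximum score for a balanced two-class problem, while a term spread evenly across classes receives zero. The per-class selection, probability weighting, and dispersion filtering described in the abstract would be layered on top of a score like this one.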
Author: 黄志艳 (Huang Zhiyan)
Source: Journal of Shandong Agricultural University: Natural Science Edition (《山东农业大学学报(自然科学版)》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2013, No. 2, pp. 252-256 (5 pages)
Keywords: feature selection; text classification; information gain
