
Application of feature selection method to text categorization based on feature contribution degree

Cited by: 9
Abstract  In current text categorization work, feature selection is regarded as an effective way to improve both the accuracy and the efficiency of classification. A feature selection method based on the feature contribution degree (FCD) is proposed. In this method, a feature is selected according to how much it contributes to distinguishing a certain category from the others. The FCD of a feature for a particular category is the ratio of the number of documents in that category containing the feature to the number of documents in all categories containing it. Experiments comparing this method with several common feature selection methods show that the proposed method achieves better classification performance.
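The FCD statistic described in the abstract can be sketched in a few lines of Python. This is an illustrative reconstruction from the abstract's definition only, not the authors' code; the function names `fcd_scores` and `select_features` and the top-k selection strategy are assumptions.

```python
from collections import defaultdict

def fcd_scores(docs, labels):
    """FCD(term, class) = (# docs of that class containing the term) /
    (# docs of any class containing the term), per the abstract."""
    df_class = defaultdict(int)   # (term, class) -> document frequency
    df_total = defaultdict(int)   # term -> document frequency over all classes
    for doc, label in zip(docs, labels):
        for term in set(doc):     # count each term once per document
            df_class[(term, label)] += 1
            df_total[term] += 1
    return {(t, c): n / df_total[t] for (t, c), n in df_class.items()}

def select_features(docs, labels, k):
    """Keep the k terms whose best per-class FCD is highest
    (one plausible selection rule; the paper may differ in detail)."""
    best = {}
    for (t, c), s in fcd_scores(docs, labels).items():
        best[t] = max(best.get(t, 0.0), s)
    return sorted(best, key=best.get, reverse=True)[:k]
```

For example, with three tokenized documents `[["a","b"], ["a","c"], ["b","c"]]` labeled `["x","x","y"]`, term `a` occurs only in class `x` documents, so its FCD for `x` is 1.0 and it is the strongest discriminator.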
Source  Journal of Dalian University of Technology (《大连理工大学学报》), 2011, No. 4, pp. 611-615 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding  Supported by the National Natural Science Foundation of China (60973068, 61002039), the Specialized Research Fund for the Doctoral Program of Higher Education (20090041110002), and the Fundamental Research Funds for the Central Universities (DC10040118).
Keywords  text categorization; feature selection; vector space model; feature contribution degree

References (14)

  • 1 MITCHELL T. Machine Learning [M]. New York: McGraw-Hill, 1997.
  • 2MCCALLUM A, NIGAM K. A comparison of event models for Naive Bayes text classification [C] // Proceedings of the AAAI-98 Workshop on Learning for Text Categorization. Wisconsin: AAAI Press, 1998.
  • 3COVER T M, HART P E. Nearest neighbor pattern classification [J]. IEEE Transactions on Information Theory, 1967, 13(1):21-27.
  • 4ADWAIT R. Maximum entropy models for natural language ambiguity resolution [D]. Pennsylvania: University of Pennsylvania, 1998.
  • 5 NG Hwee-tou, GOH Wei-boon, LOW Kok-leong. Feature selection, perceptron learning, and a usability case study for text categorization [C] // Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM Press, 1997.
  • 6 VAPNIK V. The Nature of Statistical Learning Theory [M]. New York: Springer-Verlag, 1995.
  • 7 YANG Y, PEDERSEN J. A comparative study on feature selection in text categorization [C] // Proceedings of the 14th International Conference on Machine Learning (ICML '97). Nashville: Morgan Kaufmann Publishers, 1997.
  • 8MLADENIC D, GROBELNIK M. Features selection for unbalanced class distribution and Naive Bayes [C] // Proceedings of the 16th International Conference on Machine Learning. Slovenia: Morgan Kaufmann Publishers, 1999.
  • 9FORMAN G. An extensive empirical study of feature selection metrics for text classification [J]. Journal of Machine Learning Research, 2003, 3(7-8):1289-1305.
  • 10 徐燕, 李锦涛, 王斌, 孙春明. A high-performance feature selection method based on category-discriminating capability [J]. Journal of Software (软件学报), 2008(1): 82-89.

