
Feature Selection for Multi-Class Text Categorization (times cited: 2)
Abstract: Feature selection is an important pre-processing step in data mining, machine learning, and related fields, and it has received wide attention in recent years. The high dimensionality of text data often degrades the efficiency of data mining tasks such as classification, so feature selection is commonly used as a key component of the text categorization pipeline to reduce dimensionality. As classification techniques develop rapidly and categories become increasingly fine-grained, multi-class text categorization poses additional challenges for feature selection. Focusing on multi-class text categorization, this paper describes the main challenges that current feature selection methods face and outlines the main families of multi-class feature selection methods. Following the development of related research from simple to complex and from basic to advanced, it summarizes and reviews how multi-class feature selection algorithms are currently applied. Finally, it summarizes the whole paper and proposes possible directions for future research.
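The abstract frames feature selection as a dimensionality-reduction step and notes that the multi-class setting makes it harder. One common way to extend a two-class term-scoring metric such as chi-square to many classes is to score each term against every class (one class versus the rest) and then combine the per-class scores, for example by taking the maximum or the class-prior-weighted average. The sketch below illustrates that idea only; it assumes documents are already tokenized and that terms are counted by document presence. The function names `chi_square` and `select_features` are hypothetical and are not taken from the paper, and this is not presented as the survey's own method.

```python
from collections import Counter

# Illustrative sketch only: hypothetical helper names, not the paper's method.


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: docs in the class that contain the term
    b: docs outside the class that contain the term
    c: docs in the class that lack the term
    d: docs outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no class information
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k, aggregate="max"):
    """Rank terms by a multi-class chi-square score and keep the top k.

    docs: list of token lists; labels: one class label per document.
    aggregate: "max" keeps the best one-vs-rest score, "avg" weights each
    one-vs-rest score by the class prior P(c).
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()           # term -> number of documents containing it
    df_in_class = Counter()  # (term, class) -> docs of that class containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # presence, not frequency
            df[term] += 1
            df_in_class[(term, label)] += 1

    scores = {}
    for term, term_df in df.items():
        per_class = []
        for cls, size in class_sizes.items():
            a = df_in_class[(term, cls)]
            b = term_df - a
            c = size - a
            d = n_docs - term_df - c
            per_class.append((size / n_docs, chi_square(a, b, c, d)))
        if aggregate == "max":
            scores[term] = max(s for _, s in per_class)
        elif aggregate == "avg":
            scores[term] = sum(p * s for p, s in per_class)
        else:
            raise ValueError("aggregate must be 'max' or 'avg'")

    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


if __name__ == "__main__":
    docs = [
        ["ball", "goal", "team"], ["goal", "match", "team"],
        ["stock", "market", "price"], ["market", "trade", "price"],
        ["vote", "election", "party"], ["party", "vote", "poll"],
    ]
    labels = ["sport", "sport", "finance", "finance", "politics", "politics"]
    print(select_features(docs, labels, 3))  # ['goal', 'market', 'party']
```

In this toy corpus, each of `goal`, `market`, and `party` appears in both documents of one class and in no other document, so each gets the highest one-vs-rest score (6.0). The "max" rule favors terms that are strongly tied to any single class, while the "avg" rule favors terms that are informative across many classes; which one works better is corpus-dependent.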
Source: Computer Engineering & Science, 2010, No. 8, pp. 90-93, 148 (5 pages). Indexed in CSCD and the Peking University core journal list.
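Funding: National High Technology Research and Development Program of China (863 Program) (2006AA01Z451, 2007AA01Z474, 2007AA010502); National Natural Science Foundation of China (60873204); NCET060928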
Keywords: feature selection; text categorization; data mining; hierarchical structure