
Literature Review of Feature Dimension Reduction in Text Categorization

Cited by: 79
Abstract: The key to text categorization is reducing the dimensionality of the high-dimensional feature set. The two main approaches to dimensionality reduction are feature selection and feature extraction. This paper surveys existing feature selection and feature extraction methods and evaluates their advantages, disadvantages, and scope of application.
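Authors: Chen Tao (陈涛), Xie Yangqun (谢阳群)
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), indexed in CSSCI and the Peking University Core Journals list, 2005, No. 6, pp. 690-695 (6 pages)
Funding: Project funded by the Education Department of Zhejiang Province
Keywords: text categorization; feature reduction; feature selection; feature extraction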

