
An Effective Unsupervised Feature Selection Method for Text Clustering (一种高效的用于文本聚类的无监督特征选择算法)
Cited by: 37
Abstract: Feature selection has been applied successfully to text categorization but rarely to text clustering, because the most effective feature selection methods are supervised: they require class label information, which is unavailable in clustering. To make these methods usable for text clustering, this paper proposes a new unsupervised feature selection method, K-Means based feature selection (KFS). KFS works around the missing labels by running supervised feature selection on the results of different K-Means clusterings. Experimental results show that (1) KFS selects a small subset of the most important features and improves clustering performance by nearly 15%; and (2) compared with other feature selection methods, KFS comes very close to the ideal supervised feature selection methods and performs much better than the unsupervised methods.
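The idea in the abstract, treating K-Means cluster assignments as stand-in class labels for a supervised selector, might be sketched as below. This is only an illustration under stated assumptions, not the paper's actual procedure: the abstract does not say how the clustering results differ, which supervised criterion is used, or how scores are combined. Here the runs differ only by random seed, the criterion is the chi-square statistic on term presence, and scores are summed across runs. The names `kmeans_labels`, `chi2_scores`, and `kfs_sketch` are hypothetical.

```python
import random

def kmeans_labels(rows, k, seed, iters=20):
    """Plain Lloyd-style K-Means; returns one cluster id per row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centers[c])))
            for row in rows
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def chi2_scores(rows, labels):
    """Max-over-classes chi-square of term presence vs. (pseudo) class."""
    n, n_feats = len(rows), len(rows[0])
    scores = []
    for j in range(n_feats):
        best = 0.0
        for cls in set(labels):
            a = sum(1 for r, lab in zip(rows, labels) if lab == cls and r[j] > 0)
            b = sum(1 for r, lab in zip(rows, labels) if lab != cls and r[j] > 0)
            c = sum(1 for r, lab in zip(rows, labels) if lab == cls and r[j] == 0)
            d = n - a - b - c
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:  # a term present in every document scores 0
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores.append(best)
    return scores

def kfs_sketch(rows, k, n_runs, top_m):
    """Assumed combination rule: sum chi-square scores over seeded runs."""
    total = [0.0] * len(rows[0])
    for seed in range(n_runs):
        pseudo_labels = kmeans_labels(rows, k, seed)  # clusters act as classes
        for j, s in enumerate(chi2_scores(rows, pseudo_labels)):
            total[j] += s
    ranked = sorted(range(len(total)), key=lambda j: -total[j])
    return ranked[:top_m]

# Toy term-frequency matrix: terms 0 and 1 each mark one topic,
# term 2 occurs in every document and carries no information.
docs = [[1, 0, 1], [2, 0, 1], [3, 0, 1], [0, 1, 1], [0, 2, 1], [0, 3, 1]]
selected = kfs_sketch(docs, k=2, n_runs=5, top_m=2)
```

In this toy matrix the always-present term can never receive a positive chi-square score, so it is never ranked ahead of the two topic terms. On real text the rows would be sparse document-term vectors and the number of retained features would be a tuning choice.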
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2005, No. 3, pp. 381-386 (6 pages)
Keywords: feature selection; text clustering

