
A Feature Selection Method for Web Text Clustering (times cited: 2)

A FEATURE SELECTION ALGORITHM FOR WEB DOCUMENTS CLUSTERING
Abstract: Feature selection has been widely applied in text categorization and text clustering. Compared with unsupervised feature selection, supervised feature selection is, in most cases, more effective at filtering out noise. However, because class labels are unavailable, supervised selection can hardly be exploited for clustering. This paper proposes a novel k-means-based feature co-selection algorithm for Web document clustering, called Multitype Features Coselection for Clustering (MFCC). MFCC uses the intermediate clustering results obtained in one feature space to guide feature selection in the other feature spaces. Experiments show that, for most selection criteria, MFCC effectively reduces the noise introduced by pseudo-classes and further improves clustering quality.
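The abstract describes MFCC only at a high level, so the Python below is a minimal sketch of the co-selection idea under illustrative assumptions, not the authors' algorithm: each document has two hypothetical feature spaces (`body` and `anchor` term vectors) with binary weights, k-means uses squared Euclidean distance seeded with the first k documents, and the supervised criterion is the chi-square statistic computed against k-means pseudo-classes. All function names (`kmeans`, `chi_square_scores`, `select_top`, `project`, `mfcc_sketch`) and the toy data are hypothetical.

```python
"""Illustrative sketch of MFCC-style feature co-selection (not the paper's code).

Assumptions: two hypothetical feature spaces per document given as binary
term-presence vectors; k-means seeded with the first k documents; chi-square
against k-means pseudo-classes as the supervised selection criterion.
"""
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means; returns one cluster (pseudo-class) label per point."""
    centroids = [list(p) for p in points[:k]]  # deterministic seeding (assumption)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid; ties go to the lower index.
        new_labels = [
            min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def chi_square_scores(docs: Sequence[Vector], labels: Sequence[int]) -> List[float]:
    """Score each feature by its maximum chi-square statistic over pseudo-classes."""
    n = len(docs)
    scores = []
    for j in range(len(docs[0])):
        best = 0.0
        for cls in set(labels):
            a = sum(1 for d, lab in zip(docs, labels) if d[j] > 0 and lab == cls)
            b = sum(1 for d, lab in zip(docs, labels) if d[j] > 0 and lab != cls)
            c = sum(1 for lab in labels if lab == cls) - a  # term absent, in class
            d = n - a - b - c                               # term absent, other class
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores.append(best)
    return scores


def select_top(scores: Sequence[float], m: int) -> List[int]:
    """Indices of the m highest-scoring features, in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])  # stable on ties
    return sorted(ranked[:m])


def project(docs: Sequence[Vector], keep: Sequence[int]) -> List[Vector]:
    """Restrict every document to the selected feature indices."""
    return [[d[j] for j in keep] for d in docs]


def mfcc_sketch(
    space_a: Sequence[Vector], space_b: Sequence[Vector], k: int, m: int, rounds: int = 3
) -> Tuple[List[int], List[int], List[int]]:
    keep_a = list(range(len(space_a[0])))  # start with every feature of space A
    keep_b: List[int] = []
    for _ in range(rounds):
        # Pseudo-classes from space A drive supervised selection in space B...
        labels_a = kmeans(project(space_a, keep_a), k)
        keep_b = select_top(chi_square_scores(space_b, labels_a), m)
        # ...and pseudo-classes from the reduced space B drive selection in A.
        labels_b = kmeans(project(space_b, keep_b), k)
        keep_a = select_top(chi_square_scores(space_a, labels_b), m)
    return keep_a, keep_b, kmeans(project(space_a, keep_a), k)


if __name__ == "__main__":
    # Toy data (hypothetical): columns 0-1 carry the topic, columns 2-3 are noise.
    body = [[1, 0, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 1, 0]]
    anchor = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1]]
    print(mfcc_sketch(body, anchor, k=2, m=2))  # ([0, 1], [0, 1], [0, 1, 0, 1])
```

In the toy run, the noise columns score zero against the pseudo-classes from the other space and are dropped in both spaces. Swapping chi-square for information gain or another supervised criterion would only require replacing `chi_square_scores`.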
Source: Computer Applications and Software, 2007, No. 1, pp. 154-156 (3 pages). Indexed in CSCD; Peking University core journal.
Keywords: Web mining; clustering; vector space model (VSM)
Related literature: references (1); secondary references (1); co-citing documents (125); co-cited documents (14)

Citing documents: 2; secondary citing documents: 8
