Abstract
Feature selection has been widely applied in text categorization and text clustering. Compared with unsupervised selection, supervised feature selection is in most cases more effective at filtering out noise. However, because class labels are unavailable, clustering can hardly exploit supervised selection. This paper proposes a novel feature selection algorithm for Web document clustering, the k-means-based Multitype Features Coselection for Clustering (MFCC). MFCC uses the intermediate clustering results in one type of feature space to help feature selection in other types of feature spaces. Experiments show that, for most selection criteria, MFCC effectively reduces the noise introduced by pseudo-classes and further improves clustering performance.
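The coselection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmeans`, `chi2_scores`, `top_features`, `project`, `coselect`) are hypothetical, the chi-square score on feature presence is only a stand-in for whichever selection criteria the paper evaluates, and the farthest-first initialization, the alternation between two spaces, and the fixed number of rounds are all assumptions made to keep the example small and deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=10):
    """Plain k-means; returns one cluster id per row.
    Assumption: deterministic farthest-first initialization."""
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c]))
                  for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def chi2_scores(rows, labels, k):
    """Chi-square score of each feature's presence (value > 0) against the
    cluster ids, which serve as pseudo-class labels (stand-in criterion)."""
    n = len(rows)
    scores = []
    for j in range(len(rows[0])):
        present = [r[j] > 0 for r in rows]
        score = 0.0
        for c in range(k):
            for p in (True, False):
                observed = sum(1 for pr, lab in zip(present, labels)
                               if pr == p and lab == c)
                expected = present.count(p) * labels.count(c) / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores

def top_features(scores, keep):
    """Indices of the `keep` highest-scoring features, in column order."""
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:keep])

def project(rows, cols):
    """Restrict every row to the selected feature columns."""
    return [[r[j] for j in cols] for r in rows]

def coselect(space_a, space_b, k, keep, rounds=2):
    """Alternate: cluster in one space, then use those clusters as
    pseudo-classes to select features in the other space."""
    cols_a = list(range(len(space_a[0])))
    cols_b = list(range(len(space_b[0])))
    for _ in range(rounds):
        labels_a = kmeans(project(space_a, cols_a), k)
        cols_b = top_features(chi2_scores(space_b, labels_a, k), keep)
        labels_b = kmeans(project(space_b, cols_b), k)
        cols_a = top_features(chi2_scores(space_a, labels_b, k), keep)
    return cols_a, cols_b

# Toy data: six documents described in two feature spaces.
# Columns 0-1 of space_a and columns 1 and 3 of space_b follow the two
# underlying groups; the remaining columns are noise.
space_a = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1], [0, 1, 0]]
space_b = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 1, 0, 0],
           [1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 1]]
print(coselect(space_a, space_b, k=2, keep=2))
```

In this toy setting the noise columns receive the lowest scores in both spaces, so each space keeps only the columns that agree with the clusters found in the other. With real Web documents the two spaces might be, for example, page text and anchor text, but that pairing is an illustration rather than something stated in the abstract.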
Source
《计算机应用与软件》
CSCD
PKU Core Journals (北大核心)
2007, No. 1, pp. 154-156 (3 pages)
Computer Applications and Software