Abstract
Supervised feature selection for text has been widely applied, and unsupervised feature selection is receiving growing attention. This paper presents a two-stage unsupervised feature selection method that combines term contribution (TC) and column selection (CS). The first stage uses TC to quickly remove features that have little influence on the data set as a whole. The second stage builds on CS to define an objective function for selecting a feature subset from the remaining features, and solves it using a greedy strategy together with ideas from transductive experimental design, yielding a compact feature subset. Experimental results on text clustering show that, when only a very small number of features is selected, the proposed method achieves better clustering performance than existing methods.
Keywords
unsupervised learning
feature selection
term contribution
column selection
text clustering