A Comprehensive Two-Stage Unsupervised Feature Selection Method
Abstract: Supervised feature selection for text has been widely applied, and unsupervised feature selection is attracting growing attention. This paper proposes a comprehensive two-stage unsupervised feature selection method that combines term contribution (TC) and column selection (CS). The first stage uses TC to quickly remove features that have little influence on the data set as a whole. The second stage builds on CS to define an objective function for choosing a feature subset from the remaining features; the objective is solved using ideas from greedy search and transductive experimental design, yielding a compact feature subset. Experimental results show that, when only a very small number of features is selected, the proposed method achieves better text clustering results than existing methods.
Authors: Lü Jing (吕靖), Tong Ruofeng (童若锋)
Source: 《中国科技论文在线》 (Sciencepaper Online), CAS, 2011, No. 4, pp. 268-272, 279 (6 pages)
Keywords: unsupervised learning; feature selection; term contribution; column selection; text clustering
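
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions, because the abstract does not state the formulas: stage 1 uses the commonly cited definition of term contribution, TC(t) = Σ_{i≠j} f(t, d_i) · f(t, d_j), and stage 2 uses a generic greedy column-subset selection that minimizes the reconstruction error of the document-term matrix as a stand-in for the paper's objective function, which is not given here. All function names and parameters (`term_contribution`, `greedy_column_selection`, `two_stage_select`, `n_keep`, `k`) are hypothetical.

```python
import numpy as np


def term_contribution(X):
    """Assumed TC score per term for a (n_docs, n_terms) weight matrix X.

    TC(t) = sum over document pairs i != j of X[i, t] * X[j, t],
    computed as (column sum)^2 - (sum of squared column entries).
    """
    col_sum = X.sum(axis=0)
    return col_sum ** 2 - (X ** 2).sum(axis=0)


def greedy_column_selection(X, k, eps=1e-12):
    """Greedily pick up to k columns that best reconstruct X.

    Stand-in objective (an assumption, not the paper's): minimize
    ||X - P_S X||_F, where P_S projects onto the selected columns.
    """
    R = X.astype(float).copy()  # residual after projecting out chosen columns
    selected = []
    for _ in range(k):
        norms = (R ** 2).sum(axis=0)      # squared norm of each residual column
        G = R.T @ R                       # Gram matrix of the residual columns
        # Error reduction from adding column j: ||R^T r_j||^2 / ||r_j||^2
        gain = (G ** 2).sum(axis=0) / np.maximum(norms, eps)
        gain[norms <= eps] = 0.0          # columns already explained add nothing
        j = int(np.argmax(gain))
        if gain[j] <= eps:
            break                         # nothing left to explain
        q = R[:, j] / np.sqrt(norms[j])   # unit vector of the chosen column
        R -= np.outer(q, q @ R)           # deflate: remove its span from R
        selected.append(j)
    return selected


def two_stage_select(X, n_keep, k):
    """Stage 1: keep the n_keep terms with highest TC. Stage 2: greedy CS."""
    tc = term_contribution(X)
    keep = np.argsort(-tc, kind="stable")[:n_keep]
    chosen = greedy_column_selection(X[:, keep], k)
    return [int(i) for i in keep[chosen]]  # indices into the original columns
```

In practice `X` would typically be a length-normalized tf-idf document-term matrix, and `n_keep` would be much larger than `k`, so that the cheap TC filter shrinks the vocabulary before the more expensive greedy step runs.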