Improved Feature Selection Algorithm Based on the DF Algorithm for Text Clustering (基于DF算法改进的文本聚类特征选择算法) Cited by: 6
Abstract: This paper studies how weights are calculated in text feature selection and proposes a weighting method based on the entropy function of feature terms. The method considers not only the document frequency (DF) of each feature term but also the number of times the term occurs within documents, so that the selected feature subset is more representative. Experiments show that the improved algorithm yields some improvement in clustering results.
Source: 《甘肃联合大学学报(自然科学版)》 (Journal of Gansu Lianhe University: Natural Sciences), 2012, No. 1, pp. 51-54 (4 pages)
Keywords: feature selection; document frequency; word frequency
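
The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea it describes: scoring a term by its document frequency while also taking its in-document occurrence counts into account through an entropy term. The function names `entropy_weighted_df` and `select_features`, and the specific combination `df * (1 + normalized entropy)`, are assumptions made for illustration and are not the paper's definition.

```python
import math
from collections import Counter


def entropy_weighted_df(docs):
    """Score terms by document frequency scaled by an entropy factor.

    Hypothetical scoring, NOT the paper's formula:
        score(t) = df(t) * (1 + H(t) / log(N))
    where H(t) is the Shannon entropy of the term's occurrence counts
    across the documents that contain it, and N is the number of documents.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]  # per-document term counts
    vocab = set().union(*counts)             # all distinct terms
    scores = {}
    for term in vocab:
        # Occurrence counts in the documents that contain the term
        tf = [c[term] for c in counts if c[term] > 0]
        df = len(tf)       # document frequency
        total = sum(tf)    # total occurrences in the collection
        # Entropy of the term's distribution over documents
        h = -sum((f / total) * math.log(f / total) for f in tf)
        h_norm = h / math.log(n_docs) if n_docs > 1 else 0.0
        scores[term] = df * (1.0 + h_norm)
    return scores


def select_features(docs, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = entropy_weighted_df(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Each document is a list of already-tokenized terms (assumed input form).
    docs = [["a", "b", "a"], ["a", "c"], ["b"]]
    print(select_features(docs, 2))
```

In this sketch, two terms with the same document frequency can receive different scores depending on how their occurrences are spread across documents, which is the kind of distinction that plain DF cannot make.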
