Projection-pursuit-based dimension reduction for visualization of text features
Cited by: 3
Abstract: A projection pursuit model, with a genetic algorithm used to optimize the projection direction, projects high-dimensional text feature data onto a low-dimensional (2- or 3-dimensional) space suitable for visualization. The projected feature values in this low-dimensional space reflect the linear and nonlinear structure of the high-dimensional data, so the method both reduces dimensionality and visualizes text features. It considerably reduces the computational complexity of text mining and also helps determine the number of initial cluster centers for the K-means clustering algorithm, which improves the algorithm's accuracy. Experiments confirm that the method is effective for dimension reduction of text features.
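Authors: Gao Maoting (高茂庭), Lu Peng (陆鹏)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list; 2008, No. 6, pp. 1411-1413, 1416 (4 pages)
Funding: National Natural Science Foundation of China (60275020); Scientific Research Project of the Shanghai Municipal Education Commission (06FZ007); Key Discipline Construction Project of Shanghai Maritime University (XL0101)
Keywords: projection pursuit; dimension reduction; text mining; genetic algorithm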
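
The approach summarized in the abstract, a genetic algorithm searching for a projection direction that maximizes a projection index, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state which projection index or genetic operators the paper uses, so the index here (standard deviation of the projected values times a local density term), the window-radius rule, the blend crossover, the Gaussian mutation, and all function and parameter names are assumptions. The data are synthetic stand-ins for text feature vectors, and only a single projection direction is searched.

```python
import numpy as np

def projection_index(a, X, radius_scale=0.1):
    """Assumed index Q(a) = S_z * D_z for the 1-D projection z = X @ a."""
    z = X @ a
    s_z = z.std(ddof=1)                    # spread of the projected values
    r = np.abs(z[:, None] - z[None, :])    # pairwise distances between projections
    R = radius_scale * s_z                 # window radius (assumed rule of thumb)
    d_z = np.sum((R - r) * (r < R))        # local density: only pairs closer than R count
    return s_z * d_z

def normalize(P):
    """Scale each row to unit length so every candidate is a direction."""
    norms = np.linalg.norm(P, axis=1, keepdims=True)
    return P / np.where(norms == 0, 1.0, norms)

def ga_search(X, pop_size=40, generations=60, mutation_scale=0.1, seed=0):
    """Simple elitist genetic algorithm over unit-length projection directions."""
    rng = np.random.default_rng(seed)
    pop = normalize(rng.normal(size=(pop_size, X.shape[1])))
    for _ in range(generations):
        fitness = np.array([projection_index(a, X) for a in pop])
        order = np.argsort(fitness)[::-1]
        elite = pop[order[: pop_size // 2]]            # keep the better half
        n_children = pop_size - len(elite)
        idx_a = rng.integers(0, len(elite), size=n_children)
        idx_b = rng.integers(0, len(elite), size=n_children)
        w = rng.random((n_children, 1))
        children = w * elite[idx_a] + (1 - w) * elite[idx_b]   # blend crossover
        children += rng.normal(scale=mutation_scale, size=children.shape)  # mutation
        pop = np.vstack([elite, normalize(children)])
    fitness = np.array([projection_index(a, X) for a in pop])
    return pop[np.argmax(fitness)], fitness.max()

# Synthetic stand-in for text feature vectors: two groups in 10 dimensions.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0, 1, (30, 10)), data_rng.normal(3, 1, (30, 10))])
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))   # min-max scaling

direction, score = ga_search(X)
z = X @ direction        # 1-D projected feature values, one per sample
```

Plotting `z` as a histogram or strip plot would show whether the samples separate into groups, which is the kind of visual cue the abstract suggests for choosing the number of initial K-means centers. For a 2- or 3-dimensional view, one option is to repeat the search for further directions orthogonal to those already found; the abstract does not say how the paper handles this.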