Abstract
A projection pursuit model, with its projection direction optimized by a genetic algorithm, projects high-dimensional text feature data onto a low-dimensional (2- or 3-dimensional) visualization space. The projection values of the data in this low-dimensional space reveal its linear and nonlinear structures and features, which reduces dimensionality and makes high-dimensional text feature data visualizable. The method greatly reduces the computational complexity of text mining. It also helps determine the number of initial cluster centers for the K-means clustering algorithm, which improves the algorithm's accuracy. Experiments demonstrate the effectiveness of this method for text feature dimensionality reduction.
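The abstract does not specify the projection index or the GA settings, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the Friedman-Tukey style index Q(a) = S_z · D_z often used in GA-based projection pursuit (S_z is the standard deviation of the projection values z_i = Σ_j a_j x_ij, D_z is their local density within a window radius R, here assumed to be 0.1 · S_z). It also assumes a simple real-coded GA (binary tournament selection, arithmetic crossover, Gaussian mutation, elitism) and finds 2 or 3 directions one after another, keeping them orthogonal by Gram-Schmidt. All function names, population sizes and rates are illustrative, and `count_groups_1d` is a hypothetical gap heuristic for suggesting K, not the paper's procedure.

```python
import math
import random


def normalize_columns(X):
    """Min-max scale each feature (column) of X to [0, 1]."""
    p = len(X[0])
    lo = [min(row[j] for row in X) for j in range(p)]
    hi = [max(row[j] for row in X) for j in range(p)]
    return [[(row[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(p)] for row in X]


def unit(v, basis=()):
    """Remove components along earlier orthonormal directions, then scale to length 1."""
    v = list(v)
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 1e-12 else None


def project(X, a):
    """Projection values z_i = sum_j a_j * x_ij."""
    return [sum(x * w for x, w in zip(row, a)) for row in X]


def pp_index(z, window_ratio=0.1):
    """Assumed index Q = S_z * D_z: spread of z times its local density."""
    n = len(z)
    mean = sum(z) / n
    s_z = math.sqrt(sum((v - mean) ** 2 for v in z) / (n - 1))
    radius = window_ratio * s_z  # window radius R (assumed 0.1 * S_z)
    d_z = sum(max(radius - abs(zi - zk), 0.0) for zi in z for zk in z)
    return s_z * d_z


def ga_direction(X, basis, rng, pop_size=30, generations=40):
    """Real-coded GA maximizing pp_index over unit directions orthogonal to basis."""
    p = len(X[0])

    def fitness(v):
        a = unit(v, basis)
        return (pp_index(project(X, a)), a) if a else (-1.0, None)

    pop = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(pop_size)]
    best_q, best_a = -1.0, None
    for _ in range(generations):
        scored = sorted(((fitness(v), v) for v in pop),
                        key=lambda t: t[0][0], reverse=True)
        (q, a), elite = scored[0]
        if q > best_q:
            best_q, best_a = q, a

        def pick():  # binary tournament selection
            x, y = rng.sample(scored, 2)
            return x[1] if x[0][0] >= y[0][0] else y[1]

        children = [list(elite)]  # elitism: carry the best individual over
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            w = rng.random()  # arithmetic crossover weight
            child = [w * u + (1 - w) * v for u, v in zip(p1, p2)]
            # Gaussian mutation applied to each gene with probability 0.2
            child = [c + rng.gauss(0, 0.1) if rng.random() < 0.2 else c
                     for c in child]
            children.append(child)
        pop = children
    return best_a, best_q


def pp_reduce(X, dims=2, seed=0):
    """Project X onto `dims` GA-optimized, mutually orthogonal directions."""
    rng = random.Random(seed)
    Xn = normalize_columns(X)
    basis = []
    for _ in range(dims):
        a, _ = ga_direction(Xn, basis, rng)
        basis.append(a)
    Y = [list(t) for t in zip(*(project(Xn, a) for a in basis))]
    return Y, basis


def count_groups_1d(z, gap_ratio=0.2):
    """Hypothetical heuristic: count groups split by gaps wider than gap_ratio * range."""
    zs = sorted(z)
    span = zs[-1] - zs[0]
    if span == 0:
        return 1
    return 1 + sum(1 for u, v in zip(zs, zs[1:]) if v - u > gap_ratio * span)
```

For example, if the rows of `X` are document feature vectors (say, TF-IDF weights), `Y, basis = pp_reduce(X, dims=2)` yields one 2-D point per document for plotting, and `count_groups_1d([row[0] for row in Y])` offers one candidate value for K before running K-means. Because the index is O(n²) per evaluation, this sketch suits small samples only.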
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2008, No. 6, pp. 1411-1413, 1416 (4 pages)
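Funding
National Natural Science Foundation of China (60275020)
Research Project of the Shanghai Municipal Education Commission (06FZ007)
Key Discipline Construction Project of Shanghai Maritime University (XL0101)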
Keywords
projection pursuit
dimension reduction
text mining
genetic algorithm