Abstract
Projection pursuit seeks the projection directions that best reflect the structure or features of the original high-dimensional data and projects the data onto a low-dimensional subspace, so that high-dimensional data can be studied and analyzed in a low-dimensional space. To address the curse of dimensionality in text classification, a projection pursuit model is used to reduce high-dimensional text data to a very low dimension. The key to projection pursuit is constructing an effective algorithm for finding the optimal projection direction; determining this direction is difficult and computationally demanding, especially when the projection direction involves many indicators. Based on the idea of immune evolution, this paper proposes an immune evolutionary projection pursuit model that can effectively solve the optimization problem of the projection direction. The method is applied to text classification on the Reuters-21578 document collection and the Fudan document collection. The experimental results show that it not only effectively addresses the curse of dimensionality in text classification but also achieves good classification performance.
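The search described in the abstract can be illustrated with a minimal sketch. The abstract specifies neither the projection index nor the details of the immune evolution algorithm, so everything below is an assumption: the index is a Fisher-style ratio of between-class to within-class spread of the one-dimensional projections, and the search generates candidates around the current best direction with a mutation radius that shrinks over generations, which is one common formulation of immune evolutionary search. The names `projection_index`, `normalize`, `immune_evolution_search` and all parameter values are hypothetical.

```python
import math
import random


def projection_index(direction, samples, labels):
    """Assumed index: between-class over within-class spread of 1-D projections."""
    z = [sum(a * x for a, x in zip(direction, s)) for s in samples]
    overall = sum(z) / len(z)
    between = within = 0.0
    for c in set(labels):
        zc = [v for v, y in zip(z, labels) if y == c]
        mean_c = sum(zc) / len(zc)
        between += len(zc) * (mean_c - overall) ** 2
        within += sum((v - mean_c) ** 2 for v in zc)
    return between / (within + 1e-12)  # small constant avoids division by zero


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def immune_evolution_search(samples, labels, generations=100, pop_size=20,
                            sigma0=1.0, decay=0.05, seed=0):
    """Elitist search: mutate around the best direction with a shrinking radius."""
    rng = random.Random(seed)
    dim = len(samples[0])
    best = normalize([rng.gauss(0, 1) for _ in range(dim)])
    best_score = projection_index(best, samples, labels)
    for t in range(generations):
        sigma = sigma0 * math.exp(-decay * t)  # assumed decay schedule
        # Every candidate of this generation is drawn around the same parent.
        population = [normalize([b + sigma * rng.gauss(0, 1) for b in best])
                      for _ in range(pop_size)]
        for cand in population:
            score = projection_index(cand, samples, labels)
            if score > best_score:  # keep the parent unless a candidate is better
                best, best_score = cand, score
    return best, best_score
```

Calling `immune_evolution_search(samples, labels)` on a list of feature vectors and a parallel list of class labels returns a unit-length direction and its index value. Projecting each document vector onto that direction gives a one-dimensional representation that a simple classifier could then use. The paper's actual index, parameters, and number of projection directions may differ.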
Source
Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》)
CAS
Peking University Core Journal (北大核心)
2011, Issue 1, pp. 123-128 (6 pages)
Funding
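National Natural Science Foundation of China (60963014)
Natural Science Foundation of Jiangxi Province (2008GZS0052)
Youth Science Foundation of the Jiangxi Provincial Department of Education (GJJ11067, GJJ10089)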
Keywords
immune evolution algorithm
projection pursuit
projection direction
text classification