Abstract
In text categorization the feature vector space is high-dimensional and sparse, so dimensionality reduction is a key step in classification. To address the shortcomings of traditional feature extraction methods, this paper proposes using iteration-based CCIPCA and ICA feature extraction for large-scale text categorization. Experimental results show that dimensionality reduction improves classification performance. Among CCIPCA, ICA, and the combination of ICA with IG, ICA-based dimensionality reduction gives the best classification results on the same data set.
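The abstract gives no implementation details. As a rough illustration of what an iterative, covariance-free PCA update can look like, the following is a minimal pure-Python sketch of a CCIPCA-style update rule. The function names `dot` and `ccipca`, the `amnesic` parameter and its default, and the toy two-dimensional data are all assumptions made for this illustration; it is not the authors' code and it assumes the samples are already mean-centered.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def ccipca(samples, k, amnesic=0.0):
    """Estimate k principal directions from mean-centered samples, one sample at a time.

    No covariance matrix is ever built; each estimate is updated in place.
    """
    components = []  # unnormalized direction estimates v_1 .. v_k
    for n, sample in enumerate(samples, start=1):
        residual = list(sample)
        for i in range(min(k, n)):
            if i == n - 1:
                # The i-th estimate is initialized from the current residual.
                components.append(list(residual))
            else:
                v = components[i]
                norm = math.sqrt(dot(v, v)) or 1.0
                proj = dot(residual, v) / norm
                w_old = (n - 1 - amnesic) / n
                w_new = (1 + amnesic) / n
                components[i] = [w_old * vj + w_new * proj * rj
                                 for vj, rj in zip(v, residual)]
            # Deflate: remove the part of the residual explained by this estimate.
            v = components[i]
            norm = math.sqrt(dot(v, v)) or 1.0
            unit = [vj / norm for vj in v]
            coeff = dot(residual, unit)
            residual = [rj - coeff * uj for rj, uj in zip(residual, unit)]
    # Return unit-length directions.
    return [[vj / (math.sqrt(dot(v, v)) or 1.0) for vj in v] for v in components]


# Toy data (hypothetical): strong variation along (1, 1), weak along (1, -1).
random.seed(0)
samples = []
for _ in range(2000):
    t = random.gauss(0.0, 3.0)
    e = random.gauss(0.0, 0.3)
    samples.append([t + e, t - e])

components = ccipca(samples, k=2)
print(components[0])
```

On real text data the samples would be term-weight vectors rather than toy points, and the reduced representation would be obtained by projecting each document onto the returned directions before classification.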
Source
《电脑知识与技术》
2009, Issue 7X, pp. 5768-5769, 5775 (3 pages in total)
Computer Knowledge and Technology