Abstract
Text features are high-dimensional and sparse. Conventional PCA-based feature extraction must process large-scale text collections in batch mode, which degrades classification accuracy. To improve the accuracy of automatic text classification, this paper proposes an algorithm (CCIPCA-LSSVM) that combines incremental principal component analysis (CCIPCA) with a least squares support vector machine (LSSVM). First, the mutual information method is used to select text features. CCIPCA then extracts features from the high-dimensional text features, reducing the feature dimension and eliminating redundant features. Finally, LSSVM is trained on the extracted features, and the particle swarm optimization algorithm is used to optimize the classifier, yielding the optimal automatic text classification model. Simulation results show that, compared with other text classification algorithms, CCIPCA-LSSVM improves both classification accuracy and recall, and addresses the difficulties in text feature extraction.
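The incremental feature extraction step can be illustrated with a minimal sketch of the commonly published CCIPCA update rule. This is not the authors' code: the function names `ccipca` and `make_demo_data`, the `amnesic` parameter default, and the synthetic data are assumptions made for illustration, and the samples are assumed to be mean-centred already. Plain Python lists are used only to keep the sketch dependency-free.

```python
import math
import random


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def ccipca(samples, k, amnesic=0.0):
    """Estimate the top-k principal directions one sample at a time.

    Assumptions (not taken from the paper): samples are mean-centred,
    equal-length sequences of numbers; amnesic=0.0 weights all samples equally.
    Returns up to k unit-length direction vectors as lists of floats.
    """
    vs = []  # unnormalised estimates; each norm tracks an eigenvalue estimate
    n = 0
    for x in samples:
        n += 1
        u = [float(c) for c in x]
        for i in range(k):
            if _dot(u, u) == 0.0:
                break  # nothing left in this sample to learn from
            if i == len(vs):
                # A new component is initialised with the current residual.
                vs.append(u[:])
                break
            v = vs[i]
            # Projection of the residual onto the current unit direction.
            proj = _dot(u, v) / math.sqrt(_dot(v, v))
            w_old = (n - 1 - amnesic) / n
            w_new = (1 + amnesic) / n
            vs[i] = [w_old * a + w_new * proj * b for a, b in zip(v, u)]
            # Deflate: remove the part of u explained by component i.
            norm = math.sqrt(_dot(vs[i], vs[i]))
            unit = [a / norm for a in vs[i]]
            coef = _dot(u, unit)
            u = [a - coef * b for a, b in zip(u, unit)]
    return [[a / math.sqrt(_dot(v, v)) for a in v] for v in vs]


def make_demo_data(seed=0, count=2000):
    """Hypothetical zero-mean data whose variance is largest along axis 0."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 3.0), rng.gauss(0, 1.0), rng.gauss(0, 0.3)]
            for _ in range(count)]
```

Calling `ccipca(make_demo_data(), k=2)` returns two unit vectors, the first of which should lie close to the first coordinate axis. Because each sample is consumed once and then discarded, no covariance matrix over the whole collection is ever formed, which is the property that distinguishes this approach from batch PCA.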
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal (北大核心)
2013, Issue 10, pp. 2704-2709 (6 pages)
Keywords
text categorization
feature extraction
least squares support vector machine
candid incremental principal component analysis
particle swarm optimization algorithm