Text Categorization Algorithm Based on CCIPCA-LSSVM (基于CCIPCA-LSSVM的文本自动分类算法)
Abstract Text features are high-dimensional and sparse. Feature extraction with conventional PCA requires batch processing of large-scale text collections, which hurts classification accuracy. To improve the accuracy of automatic text classification, a text categorization algorithm (CCIPCA-LSSVM) is proposed that combines incremental principal component analysis (CCIPCA) with the least squares support vector machine (LSSVM). First, the mutual information method is used to select text features. Next, CCIPCA extracts features from the high-dimensional text features, reducing the feature dimension and eliminating redundant features. Finally, LSSVM is trained on the extracted features, and the particle swarm algorithm is used to optimize the classifier, yielding the optimal automatic text classification model. Simulation results show that, compared with other text classification algorithms, CCIPCA-LSSVM improves classification accuracy and recall and resolves the difficulties encountered in text feature extraction.
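The two core stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the record gives no formulas, so the CCIPCA update below follows the commonly published incremental rule, and the LSSVM is written in its regression-style linear-system form applied to ±1 labels. The function names (`ccipca`, `rbf`, `lssvm_fit`, `lssvm_predict`) and the parameters `amnesic`, `gamma`, and `sigma` are hypothetical, and the mutual-information selection and particle swarm tuning steps are omitted (the kernel parameters are simply passed in by hand).

```python
import numpy as np

def ccipca(X, k, amnesic=0.0, eps=1e-12):
    """Estimate the top-k principal directions of X one sample at a time,
    without forming a covariance matrix (assumed CCIPCA-style update)."""
    n_samples, dim = X.shape
    V = np.zeros((k, dim))       # unnormalised eigenvector estimates
    mean = np.zeros(dim)
    for n in range(1, n_samples + 1):
        x = X[n - 1]
        mean += (x - mean) / n   # running mean for centering
        u = x - mean
        for i in range(k):
            norm = np.linalg.norm(V[i])
            if norm < eps:
                V[i] = u         # initialise this direction with the residual
                break
            w_old = (n - 1 - amnesic) / n
            w_new = (1 + amnesic) / n
            V[i] = w_old * V[i] + w_new * (u @ V[i]) / norm * u
            d = V[i] / np.linalg.norm(V[i])
            u = u - (u @ d) * d  # deflate before estimating the next direction
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.maximum(norms, eps)   # unit-length rows

def rbf(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Train an LSSVM by solving one linear system; y holds -1/+1 labels."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]       # bias b, dual coefficients alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Predict -1/+1 labels for the rows of X_new."""
    return np.sign(rbf(X_new, X_train, sigma) @ alpha + b)
```

In a pipeline of the kind the abstract describes, the rows returned by `ccipca` would serve as a projection matrix (for example `Z = (X - X.mean(axis=0)) @ V.T`), and `Z` would then be passed to `lssvm_fit`. In the paper's setting, `gamma` and `sigma` are the quantities a particle swarm search would tune.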
Author Zhang Hongyan (张鸿彦)
Source Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal, 2013, No. 10, pp. 2704-2709, 6 pages
Keywords text categorization; feature extraction; least squares support vector machine; candid incremental principal component analysis; particle swarm optimization algorithm