Abstract
Support vector machines (SVMs) perform well when classifying balanced text datasets, but they are prone to classification failure when the dataset is imbalanced, especially when the imbalance ratio is high. To address imbalanced text datasets, a hybrid PSO-SMOTE algorithm is proposed. The SMOTE (synthetic minority oversampling technique) algorithm generates interpolated samples that balance the dataset, and the PSO (particle swarm optimization) algorithm then iteratively evolves these interpolated samples to obtain the optimal ones, improving the text classification performance of the SVM. Experimental results show that the new algorithm substantially improves the ability of the SVM to classify imbalanced text datasets.
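The abstract does not give the particle encoding or the fitness function, so the following is only a minimal sketch of how SMOTE interpolation and PSO can be combined, under stated assumptions. SMOTE pairs each chosen minority sample x with one of its k nearest minority neighbours x_nn and creates x + gap * (x_nn - x) with gap in [0, 1]. Here it is assumed that the pairs are fixed once and that each PSO particle is the vector of gap values, so PSO searches for the best position of every synthetic sample along its segment. The fitness used below is a hypothetical stand-in (mean distance from each synthetic sample to its nearest majority sample); a faithful version would presumably train an SVM on the augmented data and score it on validation data. All function names, the toy 2-D vectors standing in for document feature vectors, and the hyperparameters w, c1, c2 are illustrative assumptions.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def smote_pairs(minority, n_synthetic, k, rng):
    """Pick (base, neighbour) pairs as in SMOTE; the interpolation gaps stay free."""
    pairs = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))
        pairs.append((minority[i], minority[j]))
    return pairs


def interpolate(pairs, gaps):
    """SMOTE interpolation: x_new = x + gap * (x_nn - x), one gap per pair."""
    return [[a + g * (b - a) for a, b in zip(x, nn)] for (x, nn), g in zip(pairs, gaps)]


def fitness(synthetic, majority):
    """Hypothetical stand-in fitness (higher is better): mean distance from each
    synthetic sample to its nearest majority sample, discouraging class overlap."""
    return sum(min(math.dist(s, m) for m in majority) for s in synthetic) / len(synthetic)


def pso_smote(minority, majority, n_synthetic, k=3, n_particles=10, iters=30,
              w=0.7, c1=1.5, c2=1.5, seed=0):
    """Use PSO to choose the SMOTE gaps; returns (synthetic samples, best fitness)."""
    rng = random.Random(seed)
    pairs = smote_pairs(minority, n_synthetic, k, rng)
    # Each particle is one vector of gaps, one gap per synthetic sample.
    pos = [[rng.random() for _ in range(n_synthetic)] for _ in range(n_particles)]
    vel = [[0.0] * n_synthetic for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(interpolate(pairs, p), majority) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_synthetic):
                r1, r2 = rng.random(), rng.random()
                # Standard inertia-weight PSO velocity update.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each gap to [0, 1] so samples stay on their SMOTE segment.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(interpolate(pairs, pos[i]), majority)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return interpolate(pairs, gbest), gbest_fit


# Toy usage: 4 minority vs. 8 majority samples (imbalance ratio 2:1).
minority = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
majority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2],
            [1.0, 0.8], [1.3, 1.0], [0.8, 1.2], [1.2, 1.3]]
synthetic, score = pso_smote(minority, majority, n_synthetic=4)
print(len(synthetic), round(score, 3))
```

Fixing the SMOTE pairs and optimizing only the gaps keeps the search space small (one dimension per synthetic sample) and guarantees every candidate is still a valid SMOTE interpolation; the paper's own encoding may differ.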
Authors
GAO Chao (高超)
XU Hanlin (许翰林)
School of Electronic & Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal
2018, Issue 15, pp. 183-186 (4 pages)
Keywords
hybrid algorithm
support vector machine
unbalanced dataset
interpolation sample
text classification
iterative evolution