Abstract
Standard support vector machines (SVMs) classify imbalanced data poorly. To address this, an imbalanced SVM classification method based on a tree hierarchical structure (ISVM_TH) is proposed. The method measures the relationship between the majority-class samples and the separating hyperplane in order to distinguish the importance of the different clusters and extract the key clusters. The key clusters are then divided layer by layer to build a more reasonable tree hierarchical structure over the majority-class samples. Candidate support vectors (CSVs) are extracted from this structure and take part in SVM training, which improves the SVM's ability to classify imbalanced data. Experimental results show that the method improves SVM classification performance on imbalanced data and achieves satisfactory generalization performance.
Authors
DENG Xi-hui (邓曦辉)
ZHAO Li (赵丽)
School of Information Technology and Engineering, Jinzhong College, Jinzhong 030619, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal
2017, No. 8, pp. 2269-2275 (7 pages)
Funding
Science and Technology Innovation Fund Project of Higher Education Institutions of Shanxi Province (2015110)
Keywords
support vector machine
imbalanced dataset
tree hierarchical structure
key cluster
candidate support vector