Abstract
This paper presents a simple but systematic experimental study of how data imbalance and other factors affect support vector classification machines. The results show that data imbalance is, in essence, an imbalance of boundary information, which may cause an undesirable shift of the decision boundary and thereby degrade classifier performance. Enlarging the training sample can enrich the boundary information and thus lessen the adverse effect of data imbalance on the classifier. However, when the concept to be learned is complex, it is very difficult for the classifier to obtain the ideal decision boundary, even if the data is balanced.
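The boundary-shift argument can be illustrated with a toy sketch that is not taken from the paper. It assumes a one-dimensional, linearly separable problem in which the negative class is uniform on (-1, 0], the positive class is uniform on [0, 1), and the true boundary is 0. In that setting a hard-margin linear SVM places its threshold at the midpoint between the two closest opposite-class samples, so no SVM library is needed. The function names, sample sizes, and trial count below are hypothetical choices made only for illustration.

```python
import random


def svm_boundary_1d(neg, pos):
    """Hard-margin SVM threshold for separable 1-D data.

    For separable points on a line, the maximum-margin threshold is the
    midpoint between the largest negative sample and the smallest
    positive sample (the two support vectors).
    """
    return (max(neg) + min(pos)) / 2.0


def mean_boundary(n_neg, n_pos, trials=500, seed=0):
    """Average learned threshold over repeated random training sets.

    Assumed toy setting: negatives uniform on (-1, 0], positives uniform
    on [0, 1), so the ideal threshold is 0. A positive average means the
    boundary has drifted into the positive (minority) class region.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        neg = [-rng.random() for _ in range(n_neg)]
        pos = [rng.random() for _ in range(n_pos)]
        total += svm_boundary_1d(neg, pos)
    return total / trials


if __name__ == "__main__":
    # Same 20:1 class ratio at two sample sizes, plus a balanced baseline.
    print("balanced   200:200 ->", mean_boundary(200, 200))
    print("imbalanced 200:10  ->", mean_boundary(200, 10))
    print("imbalanced 2000:100 ->", mean_boundary(2000, 100))
```

In this toy setting the minority class contributes few samples near the true boundary, so the threshold tends to drift toward the minority side. Keeping the class ratio fixed while enlarging both classes places more samples near the boundary on each side and reduces that drift, which is consistent with the abstract's claim about boundary information.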
Source
《广东技术师范学院学报》
2008, No. 6, pp. 15-19 (5 pages)
Journal of Guangdong Polytechnic Normal University
Keywords
pattern classification
data imbalance
support vector machine