Abstract
In imbalanced data classification, the minority class is usually the target of interest, yet it is harder to recognize than the majority class. Common approaches have two main drawbacks: they require instance importance to be set explicitly, and they support minority-class recognition only indirectly. This paper therefore proposes an instance importance based support vector machine (IISVM), which works in three phases. The first two phases use a one-class SVM and a binary SVM, respectively, to reorganize the training instances into three groups: most important, important, and unimportant. The third phase trains an initial classifier on the most important instances and then applies an explicitly set early-stopping criterion to support minority-class recognition directly. Experiments show that the average classification performance of IISVM is better than that of current mainstream methods.
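The three-phase idea can be illustrated with a minimal Python sketch using scikit-learn. The abstract does not give the grouping rules or the stopping criterion, so everything concrete here is an assumption made for illustration, not the authors' method: the function names `group_by_importance` and `train_iisvm_like`, the rule that majority instances inside the one-class region count as "most important", the rule that remaining support vectors count as "important", and the `min_recall` threshold used for early stopping.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.svm import SVC, OneClassSVM


def group_by_importance(X, y, minority=1):
    """Split instance indices into three tiers (assumed rules, not the paper's)."""
    idx = np.arange(len(y))
    is_min = y == minority

    # Phase 1 (assumption): a one-class SVM describes the minority region.
    # Minority instances, plus majority instances falling inside that region,
    # are treated as "most important".
    oc = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X[is_min])
    inside = oc.predict(X) == 1
    most = idx[is_min | inside]

    # Phase 2 (assumption): a binary SVM is fitted on all data. Remaining
    # instances that are support vectors are "important"; the rest are
    # "unimportant".
    svc = SVC(kernel="rbf", gamma="scale", class_weight="balanced").fit(X, y)
    is_sv = np.zeros(len(y), dtype=bool)
    is_sv[svc.support_] = True
    rest = ~(is_min | inside)
    return most, idx[rest & is_sv], idx[rest & ~is_sv]


def train_iisvm_like(X, y, minority=1, min_recall=0.9):
    """Phase 3 (assumption): grow the training set tier by tier and stop
    early once adding a tier pushes minority recall below `min_recall`."""
    train_idx = np.array([], dtype=int)
    model = None
    for group in group_by_importance(X, y, minority):
        candidate_idx = np.concatenate([train_idx, group])
        if len(np.unique(y[candidate_idx])) < 2:
            train_idx = candidate_idx  # both classes are needed before fitting
            continue
        candidate = SVC(kernel="rbf", gamma="scale")
        candidate.fit(X[candidate_idx], y[candidate_idx])
        recall = recall_score(y, candidate.predict(X), pos_label=minority)
        if model is not None and recall < min_recall:
            break  # early stop: keep the previous classifier
        model, train_idx = candidate, candidate_idx
    return model, train_idx


if __name__ == "__main__":
    # Synthetic imbalanced data: 200 majority points, 20 minority points.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                   rng.normal(2.0, 1.0, size=(20, 2))])
    y = np.array([0] * 200 + [1] * 20)
    model, used = train_iisvm_like(X, y)
    print("training instances used:", len(used))
    print("minority recall:", recall_score(y, model.predict(X), pos_label=1))
```

In this sketch the stopping rule is expressed directly in terms of minority-class recall, which is one way to read "direct support" for the minority class; a held-out validation set would be a more careful choice than the training data used here.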
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Indexed in: EI, CSCD, Peking University Core Journals
2009, Issue 6, pp. 913-918 (6 pages)
Keywords
Imbalanced Data, Instance Importance, Support Vector Machine, Resampling, Cost-Sensitive Learning