Abstract
In class-imbalanced data, both between-class and within-class imbalance are major causes of degraded classification performance. To improve the performance of classification algorithms on imbalanced data sets, a hybrid sampling algorithm based on probability distribution estimation is proposed. The algorithm resamples each subclass according to its estimated probability distribution so that each class is internally balanced. It also expands the potential decision region of the minority class and removes redundant information from the majority class, thereby improving data balance from both global and local perspectives at the same time. Experimental results show that the proposed algorithm improves the performance of conventional classification algorithms on imbalanced data.
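The abstract describes the method only at a high level, so the following is a minimal Python sketch of the general idea under stated assumptions; it is not the authors' algorithm. All of the following are assumptions: each class has already been partitioned into subclasses (for example by clustering), every class is resampled toward the mean class size, each subclass is modeled as a diagonal Gaussian from which synthetic points are drawn when it is too small, and random undersampling stands in for the paper's removal of redundant majority-class information. The names `hybrid_resample` and `fit_diag_gaussian` are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def fit_diag_gaussian(points: List[Point]) -> Tuple[List[float], List[float]]:
    """Estimate a per-feature mean and standard deviation (diagonal Gaussian)."""
    dims = list(zip(*points))
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]  # pstdev is 0 for a single point
    return means, stds


def hybrid_resample(
    subclasses: Dict[str, List[List[Point]]], seed: int = 0
) -> Dict[str, List[Point]]:
    """Hypothetical hybrid sampler: balance classes globally, subclasses locally.

    `subclasses` maps each class label to a list of non-empty subclasses,
    each subclass being a list of feature tuples.
    """
    rng = random.Random(seed)
    class_sizes = {c: sum(len(s) for s in subs) for c, subs in subclasses.items()}
    # Global (between-class) target: the mean class size (an assumption).
    target_class = round(statistics.fmean(class_sizes.values()))
    result: Dict[str, List[Point]] = {}
    for label, subs in subclasses.items():
        # Local (within-class) target: split the class budget evenly over subclasses.
        per_sub = max(1, target_class // len(subs))
        balanced: List[Point] = []
        for sub in subs:
            if len(sub) >= per_sub:
                # Undersample: random subset (stand-in for redundancy removal).
                balanced.extend(rng.sample(sub, per_sub))
            else:
                # Oversample: keep originals, draw the rest from the fitted Gaussian.
                means, stds = fit_diag_gaussian(sub)
                balanced.extend(sub)
                for _ in range(per_sub - len(sub)):
                    balanced.append(
                        tuple(rng.gauss(m, s) for m, s in zip(means, stds))
                    )
        result[label] = balanced
    return result


# Usage: one majority class with two unequal subclasses, one tiny minority class.
data = {
    "majority": [
        [(float(i), 0.0) for i in range(60)],
        [(100.0 + i, 5.0) for i in range(20)],
    ],
    "minority": [
        [(50.0, 50.0), (51.0, 49.0)],
        [(70.0, 70.0), (71.0, 72.0), (69.0, 71.0), (70.5, 70.5)],
    ],
}
resampled = hybrid_resample(data, seed=1)
counts = {label: len(pts) for label, pts in resampled.items()}
print(counts)  # {'majority': 42, 'minority': 42}
```

Splitting each class budget evenly across its subclasses is what addresses within-class imbalance in this sketch: the smaller majority subclass is topped up even though the majority class as a whole is reduced. Because of integer division, class sizes can land slightly below the mean class size (42 rather than 43 here).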
Source
Control and Decision
EI
CSCD
Peking University Core Journal (北大核心)
2014, No. 5, pp. 815-820 (6 pages)
Funding
National Natural Science Foundation of China (61001047)
Fundamental Research Funds for the Central Universities (N110618001)
Keywords
imbalanced data learning
within-class imbalance
hybrid sampling
probability distribution estimation