Abstract
To address classification on imbalanced data, this paper proposes a resampling algorithm based on differentiated sampling rates (differentiated sampling rate algorithm, DSRA) and, building on DSRA, designs a new ensemble classifier (SVM-Ripper ensemble classifier, SREC). SREC combines a distinctive classifier selection strategy, classifier integration strategy, and classification decision scheme, and achieves relatively high classification accuracy. SREC is also used to study the key issues that affect imbalanced data classification. The results show that the difficulty of imbalanced data classification stems from a combination of factors, including between-class imbalance of positive and negative samples, within-class imbalance, sample size, and the degree of imbalance; only by considering these factors together can the problem be addressed well.
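As a rough illustration of what resampling with class-specific sampling rates can look like, the sketch below undersamples the majority (negative) class and oversamples the minority (positive) class at separately chosen rates, then reports the imbalance degree as the majority-to-minority ratio. It is not the paper's DSRA, whose details the abstract does not give; the function names `imbalance_ratio` and `resample_with_rates`, the toy data, and the rates are all assumptions made for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority class count divided by the minority class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def resample_with_rates(samples, labels, rates, seed=0):
    """Resample each class at its own rate (hypothetical helper, not DSRA).

    rates maps a class label to a sampling rate: a rate below 1
    undersamples that class without replacement, a rate above 1
    oversamples it by drawing extra copies with replacement.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_samples, out_labels = [], []
    for y, items in by_class.items():
        target = max(1, round(len(items) * rates.get(y, 1.0)))
        if target <= len(items):
            chosen = rng.sample(items, target)
        else:
            extra = rng.choices(items, k=target - len(items))
            chosen = items + extra
        out_samples.extend(chosen)
        out_labels.extend([y] * len(chosen))
    return out_samples, out_labels


# Toy data: 90 negative samples (label 0) and 10 positive samples (label 1).
X = list(range(100))
y = [0] * 90 + [1] * 10

# Assumed rates: keep half of the negatives, triple the positives.
X_res, y_res = resample_with_rates(X, y, {0: 0.5, 1: 3.0})
print(imbalance_ratio(y), imbalance_ratio(y_res))
```

Giving each class its own rate lets the imbalance degree be tuned directly, rather than forcing the classes to exact parity; how the rates should be chosen is the part the paper's algorithm would specify.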
Source
《系统工程与电子技术》
EI
CSCD
PKU Core (Peking University Core Journals)
2011, No. 1, pp. 196-201 (6 pages)
Systems Engineering and Electronics
Funding
Supported by the National Natural Science Foundation of China (60675030, 60875029)
Keywords
data mining
classification in imbalanced datasets
ensemble classifier
key issues