Abstract
Random under-sampling discards potentially useful information in the majority classes, and the problem is more pronounced in multi-class classification. This paper proposes EasyEnsemble.M, a class-imbalance learning algorithm for multi-class problems. The algorithm randomly samples the majority classes multiple times so that the majority-class examples ignored by plain random under-sampling are put to use, learns multiple sub-classifiers, and combines them with a hybrid ensemble technique to obtain a stronger final classifier. Experimental results show that, compared with commonly used multi-class class-imbalance learning algorithms, EasyEnsemble.M effectively improves the classifier's G-mean.
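The general idea described in the abstract (repeated random under-sampling of the larger classes, one sub-classifier per sample, evaluation by G-mean) can be illustrated with a minimal sketch. This is not the authors' implementation. Several choices here are assumptions made only for illustration: every class is sampled down to the size of the smallest class, a nearest-centroid classifier stands in for the paper's sub-classifiers, plain majority voting stands in for the hybrid ensemble technique, and G-mean is taken as the geometric mean of the per-class recalls, which is how it is commonly defined for multi-class problems. All function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def balanced_subset(X, y, rng):
    """Randomly under-sample every class down to the smallest class size (assumed rule)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n_min = min(len(rows) for rows in by_class.values())
    subset = []
    for label, rows in by_class.items():
        subset.extend((row, label) for row in rng.sample(rows, n_min))
    return subset


def fit_centroids(subset):
    """Stand-in base learner: one mean vector (centroid) per class."""
    sums, counts = {}, Counter()
    for row, label in subset:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroids(centroids, row):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))


def fit_ensemble(X, y, n_subsets=10, seed=0):
    """Train one sub-classifier per independently drawn balanced subset."""
    rng = random.Random(seed)
    return [fit_centroids(balanced_subset(X, y, rng)) for _ in range(n_subsets)]


def predict_ensemble(models, row):
    """Combine sub-classifiers by majority vote (stand-in for the hybrid ensemble)."""
    votes = Counter(predict_centroids(model, row) for model in models)
    return votes.most_common(1)[0][0]


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls."""
    total, hits = Counter(y_true), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[label] / total[label] for label in total]
    return math.prod(recalls) ** (1.0 / len(recalls))
```

Because each balanced subset draws a different random portion of the larger classes, the ensemble as a whole sees far more majority-class examples than a single under-sampled training set would. G-mean drops to zero whenever any one class is entirely misclassified, which is why it is a stricter measure than accuracy on imbalanced data.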
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2014, No. 2, pp. 187-192 (6 pages)
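Funding
Supported by the Young Scientists Fund of the National Natural Science Foundation of China (No. 61105046)
Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (No. 20110092120029)
Open Project of the State Key Laboratory for Novel Software Technology, Nanjing University (No. KFKT2011B01)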
Keywords
Machine Learning
Class-Imbalance Learning
Under-Sampling
Ensemble