Abstract
Learning from imbalanced data is an active research problem in machine learning. This paper proposes a classification algorithm based on undersampling. The algorithm undersamples the majority class and retains the majority examples that lie near the classification boundary. With the area under the ROC curve (AUC) as the optimization objective, it selects the most appropriate neighborhood radius so that the data set becomes balanced, and then trains a Bayesian classifier on the undersampled examples. Classifier performance is also evaluated by AUC. Experimental results on simulated data and UCI data sets show that the algorithm is effective.
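The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact neighborhood definition, the form of the Bayesian classifier, or the search procedure, so everything below is an assumption made for illustration: a majority example is kept when it lies within `radius` (Euclidean distance) of some minority example, the classifier is a Gaussian naive Bayes, and the radius is picked from a hypothetical candidate list by validation AUC. The function names and the synthetic data are not from the paper.

```python
import math
import random

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def undersample(X, y, radius):
    """Keep every minority (1) example and only the majority (0) examples
    within `radius` of some minority example (assumed neighborhood rule)."""
    minority = [x for x, c in zip(X, y) if c == 1]
    kept = [(x, c) for x, c in zip(X, y)
            if c == 1 or any(math.dist(x, m) <= radius for m in minority)]
    return [x for x, _ in kept], [c for _, c in kept]

def fit_gnb(X, y):
    """Gaussian naive Bayes: per-class prior plus per-feature mean and variance."""
    model = {}
    for c in set(y):
        rows = [x for x, cls in zip(X, y) if cls == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        varis = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6  # smoothing
                 for col, m in zip(zip(*rows), means)]
        model[c] = (len(rows) / len(y), means, varis)
    return model

def score(model, x):
    """Log-odds of the minority class (1) against the majority class (0)."""
    def log_joint(c):
        prior, means, varis = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, varis))
    return log_joint(1) - log_joint(0)

def select_radius(X_tr, y_tr, X_val, y_val, radii):
    """Return (radius, validation AUC, model) for the candidate radius with the highest AUC."""
    best = None
    for r in radii:
        Xs, ys = undersample(X_tr, y_tr, r)
        if 0 not in ys:  # radius too small: no majority example survived
            continue
        model = fit_gnb(Xs, ys)
        a = auc([score(model, x) for x in X_val], y_val)
        if best is None or a > best[1]:
            best = (r, a, model)
    return best

def make_data(rng, n_major, n_minor):
    """Synthetic two-class data (assumed for the demo): two unit-variance 2-D Gaussians."""
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_major)]
    X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(n_minor)]
    return X, [0] * n_major + [1] * n_minor

rng = random.Random(0)
X_train, y_train = make_data(rng, 200, 20)
X_valid, y_valid = make_data(rng, 200, 20)
candidate_radii = [0.5, 1.0, 2.0, 4.0]  # hypothetical search grid
best_r, best_auc, best_model = select_radius(
    X_train, y_train, X_valid, y_valid, candidate_radii)
print(f"selected radius={best_r}, validation AUC={best_auc:.3f}")
```

A small radius keeps only the majority examples closest to the minority class, which pushes the training set toward balance; a large radius keeps nearly all of them. Selecting the radius by AUC rather than accuracy avoids rewarding a classifier that simply predicts the majority class.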
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core Journal (北大核心)
2011, Issue 13, pp. 147-149 (3 pages)
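Funding
National Key Technology R&D Program of China (2006BAK01A33)
Key Scientific Research Fund of the Ministry of Public Security, Category B (20032252001)
Jilin Province Science and Technology Development Program (20070321, 20090704)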
Keywords
machine learning
classification algorithm
imbalanced data
undersampling
neighborhood