Abstract
To address the problem of imbalanced data classification, an AdaBoost v algorithm based on the optimal margin is proposed. The algorithm uses an improved SVM as the base classifier: a margin mean term is introduced into the SVM optimization model, and the margin mean term and the loss function term are weighted according to the data imbalance ratio. The optimization model is solved with the stochastic variance reduced gradient (SVRG) method to speed up convergence. The proposed algorithm introduces a new adaptive cost-sensitive function into the sample weight update formula, assigning higher costs to minority-class samples, misclassified minority-class samples, and minority-class samples near the decision boundary. In addition, a new base classifier weight strategy is derived by combining the new weight formula with an estimate of the optimal margin under a given precision parameter v, further improving classification accuracy. Comparative experiments show that, in both the linear and nonlinear cases, the proposed algorithm achieves higher classification accuracy than other algorithms on imbalanced datasets and obtains a larger minimum margin.
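The abstract states that the adaptive cost-sensitive function assigns higher costs to minority-class samples, misclassified minority-class samples, and minority-class samples near the decision boundary. The paper's exact formula is not given here, so the sketch below only illustrates that stated idea in a single AdaBoost-style weight update; the function name, the additive cost scheme, and the constants `alpha` and `tau` are all assumptions, not the authors' method.

```python
import numpy as np

def cost_sensitive_weight_update(w, y, pred, margin,
                                 minority_label=1, alpha=0.5, tau=0.2):
    """Hedged sketch: one cost-sensitive AdaBoost sample-weight update.

    w      : current sample weights (sums to 1)
    y      : true labels in {-1, +1}
    pred   : base-classifier predictions in {-1, +1}
    margin : functional margins y * f(x) of the base classifier
    """
    cost = np.ones_like(w)
    minority = (y == minority_label)
    cost[minority] += 1.0                            # minority samples cost more
    cost[minority & (pred != y)] += 1.0              # misclassified minority
    cost[minority & (np.abs(margin) < tau)] += 1.0   # minority near the boundary
    # Standard exponential AdaBoost update, scaled by the per-sample cost
    w_new = w * np.exp(-alpha * y * pred) * cost
    return w_new / w_new.sum()                       # renormalize to a distribution
```

Under this scheme a misclassified minority sample near the boundary accumulates the largest cost factor, so its weight grows fastest between boosting rounds, matching the emphasis described in the abstract.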
Authors
LU Shu-xia; ZHANG Zhen-lian (College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071002, China; Hebei Province Key Laboratory of Machine Learning and Computational Intelligence, Baoding, Hebei 071002, China)
Source
Computer Science (《计算机科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2021, No. 11, pp. 184-191 (8 pages)
Funding
National Natural Science Foundation of China (61672205)
Hebei Province Key Research and Development Program (19210310D)