Abstract
[Objective] Traditional AdaBoost regression models are not sufficiently robust, and the improved AdaBoost.RT+ and AdaBoost.RS algorithms still suppress abnormal data only weakly and identify it with low accuracy. Enhancing the robustness of AdaBoost methods therefore has practical value. [Methods] The proposed AdaBoost.R_LOF model first introduces a dual LOF algorithm and an inverse cross-validation algorithm and combines them to characterize the abnormal degree of each data point as a probability. Then, on the basis of the AdaBoost.R2 algorithm, each data point is given a weight coefficient according to its abnormal degree, which suppresses the influence of abnormal data without affecting the iterations on normal data. [Results] The new model is more robust and attains a smaller mean squared prediction error. [Limitations] The method has more hyperparameters to tune, and they must be adjusted to the distribution of the dataset. [Conclusions] Simulations and real-data cases show that, compared with the AdaBoost.R2, AdaBoost.RT+ and AdaBoost.RS algorithms, the method is more robust and gives better estimates on datasets with different proportions of outliers.
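The general idea of turning an outlier score into a damped sample weight can be sketched as follows. This is only an illustration under stated assumptions, not the AdaBoost.R_LOF procedure: the abstract does not specify the dual LOF or the inverse cross-validation step, so the sketch uses the plain Local Outlier Factor, and both the function names (`lof_scores`, `damped_weights`) and the mapping `1 / max(1, LOF)` are hypothetical choices made for this example.

```python
import math

def lof_scores(points, k):
    """Plain Local Outlier Factor of each point (higher = more isolated)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    # Ties at the k-th distance are cut off by sort order (a simplification).
    neigh = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # Distance from each point to its k-th nearest neighbour.
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        lrd.append(k / max(sum(reach), 1e-12))  # guard against duplicates
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]

def damped_weights(scores):
    """Hypothetical mapping: leave LOF <= 1 untouched, shrink the rest."""
    raw = [1.0 / max(1.0, s) for s in scores]
    total = sum(raw)
    return [w / total for w in raw]

# Ten points on the line y = 2x plus one point far above it (index 10).
points = [(float(i), 2.0 * i) for i in range(10)] + [(4.5, 30.0)]
scores = lof_scores(points, k=3)
weights = damped_weights(scores)
```

In this toy data the isolated point receives the largest LOF and therefore the smallest weight. Weights of this kind could serve as the initial sample distribution of an AdaBoost.R2 run, so that a suspected outlier starts with less influence while points with LOF near 1 keep equal weight among themselves.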
Authors
曾凡倍
杨联强
ZENG Fanbei; YANG Lianqiang (School of Big Data and Statistics, Anhui University, Hefei, Anhui 230601, China; School of Artificial Intelligence, Anhui University, Hefei, Anhui 230601, China)
Funding
Natural Science Foundation of Anhui Higher Education Institutions (KJ2021A0049)
Anhui Provincial Natural Science Foundation (2208085MA06).