Abstract
To address the large volume, high dimensionality, class imbalance, and heavy noise of default data, an improved support vector machine method is proposed: a cost-sensitive multi-kernel support vector machine with an L_p-norm constraint, built on the Optuna framework (L_p-Optuna-SVM). The method uses a cost matrix to assign different costs to different types of prediction error, and uses multi-kernel learning to introduce combinations of mixed kernel functions. The Optuna optimization framework automates the tuning of the misclassification costs and of the kernel function parameters and weights. An L_p-norm constraint is imposed on the kernel weights to improve the model's robustness to noise and outliers. Finally, L_p-Optuna-SVM is examined with combinations of four commonly used basic kernel functions and compared with single-kernel support vector machines, K-nearest neighbors, logistic regression, and Gaussian Bayes. The results show that, on the given dataset, L_p-Optuna-SVM achieves higher g-mean and AUC on default data than the other algorithms, and that it remains robust overall on datasets with added noise of different variances.
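As a minimal sketch of two ingredients named in the abstract, the code below combines base kernels with weights rescaled to unit L_p norm and computes the g-mean metric. It is not the authors' implementation: the choice of an RBF and a polynomial kernel, the function names, and all parameter values are assumptions made for illustration, and the Optuna search over costs, kernel parameters, and weights is left out.

```python
import numpy as np


def lp_normalize(weights, p):
    """Clip weights to be non-negative and rescale them to unit L_p norm."""
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    norm = np.sum(w ** p) ** (1.0 / p)
    if norm == 0.0:
        raise ValueError("at least one kernel weight must be positive")
    return w / norm


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def poly_kernel(A, B, degree, coef0=1.0):
    """Polynomial kernel matrix between the rows of A and the rows of B."""
    return (A @ B.T + coef0) ** degree


def mixed_kernel(A, B, weights, p=2.0, gamma=0.5, degree=2):
    """Weighted sum of two base kernels; the weights satisfy ||d||_p = 1.

    The two base kernels are illustrative choices, not the paper's.
    """
    d = lp_normalize(weights, p)
    return d[0] * rbf_kernel(A, B, gamma) + d[1] * poly_kernel(A, B, degree)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (positive class = 1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == 1
    neg = ~pos
    sensitivity = np.mean(y_pred[pos] == 1)
    specificity = np.mean(y_pred[neg] != 1)
    return float(np.sqrt(sensitivity * specificity))
```

The resulting Gram matrix could be passed to any solver that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed", class_weight=...)`, where the per-class weights would play the role of misclassification costs. The weights, `p`, `gamma`, `degree`, and the costs are the kind of quantities the abstract says are tuned with Optuna.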
Authors
郑怡昕
王重仁
ZHENG Yixin; WANG Chongren (Shandong University of Finance and Economics, Jinan 250002, China)
Source
《现代电子技术》
Peking University Core Journal
2024, No. 6, pp. 147-153 (7 pages)
Modern Electronics Technique
Funding
Shandong Province Science and Technology SME Innovation Capability Improvement Project (2023TSGC0208).