Credit risk assessment is an important risk-management task for financial institutions. Machine learning-based approaches have made promising progress in credit risk assessment by treating it as an imbalanced binary classification task. However, few efforts have addressed class imbalance and the class overlap problem that accompanies it simultaneously. To this end, this study proposes a Tomek link and genetic algorithm (GA)-based under-sampling framework (TEUS) that addresses both class imbalance and class overlap in binary credit classification by eliminating majority class instances while considering multi-perspective factors. First, TEUS identifies borderline majority instances with Tomek links. It then takes the distance from each majority instance to its nearest boundary as a radius and assigns the density of opposite-class (minority) samples within that radius as the overlap potential of that majority instance. Second, TEUS weights each non-borderline majority instance by its information contribution to estimating class labels. After partitioning the non-borderline majority instances into subgroups according to overlap potential and information contribution, TEUS applies a GA to select samples from the subgroups and merges them with the minority samples to form a new training set. A key innovation is that the design of the GA fitness function and the grouping of the non-borderline majority instances not only trade off the multi-perspective characteristics of instances but also help reduce the computational complexity of the sampling optimization search. Numerical experiments on real-world credit data sets demonstrate the effectiveness of the proposed TEUS.
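The first TEUS step, which finds borderline majority instances with Tomek links and then scores the remaining majority instances by overlap potential, can be sketched as follows. This is a minimal illustration that depends on assumptions the abstract leaves open. It uses Euclidean distance. It reads "nearest boundary" as the nearest majority instance that belongs to a Tomek link. It reads "density" as the share of minority points among all points inside the radius. The function names `distance_matrix`, `tomek_majority` and `overlap_potential` are hypothetical, and this is not the authors' implementation. The information-contribution weighting and the GA search are not shown because the abstract does not specify them.

```python
import numpy as np


def distance_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances (assumed metric; the abstract names none)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # an instance is never its own neighbour
    return D


def tomek_majority(D: np.ndarray, y: np.ndarray, majority: int = 0) -> np.ndarray:
    """Mask of majority instances that sit in a Tomek link.

    A Tomek link is a pair of opposite-class instances that are each
    other's nearest neighbour.
    """
    nn = D.argmin(axis=1)                 # nearest neighbour of every instance
    mutual = nn[nn] == np.arange(len(y))  # i -> nn[i] -> i
    opposite = y != y[nn]                 # the pair carries different labels
    return mutual & opposite & (y == majority)


def overlap_potential(X: np.ndarray, y: np.ndarray, majority: int = 0):
    """Hypothetical overlap potential for non-borderline majority instances.

    Radius: distance to the nearest borderline (Tomek-link) majority instance.
    Potential: share of minority points among all points inside that radius.
    """
    D = distance_matrix(X)
    border = tomek_majority(D, y, majority)
    potential = {}
    if not border.any():              # no Tomek links: the radius is undefined
        return border, potential
    for i in np.flatnonzero((y == majority) & ~border):
        r = D[i, border].min()        # radius to the nearest borderline instance
        inside = D[i] <= r            # includes that borderline instance itself
        minority_inside = inside & (y != majority)
        potential[int(i)] = float(minority_inside.sum() / inside.sum())
    return border, potential


# Toy 1-D data embedded in 2-D: label 0 = majority, label 1 = minority.
X_demo = np.array([[0.0, 0.0], [4.1, 0.0], [5.0, 0.0], [5.3, 0.0], [3.0, 0.0]])
y_demo = np.array([0, 0, 0, 1, 1])
border, potential = overlap_potential(X_demo, y_demo)
print(border.tolist())  # [False, False, True, False, False]
print(potential)        # {0: 0.3333333333333333, 1: 0.0}
```

On the toy data, majority instance 2 and minority instance 3 are mutual nearest neighbours. Instance 2 is therefore the only borderline majority instance. Instance 0 has a radius of 5.0 and finds one minority point among the three points inside it, so its potential is 1/3. Instance 1 has a radius of about 0.9 and sees no minority point inside it, so its potential is 0.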
Funding: Supported by the National Key R&D Program of China under Grant No. 2019YFB1404600, Beijing Natural Science Funds under Grant No. 9162003, and Beijing's "High-grade, Precision and Advanced Discipline Construction (Municipal)-Business Administration" project under Grant No. 19008022065.