Abstract
User data from online lending in Internet finance is class-imbalanced, which seriously degrades the performance of traditional classifiers. The random balanced sampling algorithm treats all samples equally when resampling the original data set. This paper instead takes the characteristics of each sample point into account during balanced sampling and divides the samples into three types: safe, boundary, and noisy. A corresponding sampling method is applied to each type to obtain a new, balanced data set, and a Bagging ensemble is then built on that data set to improve the generalization performance of the algorithm. The results show that the proposed Improved Random Balanced Sampling (IRBS) Bagging algorithm classifies online loan users well.
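The abstract does not give the rules used to label samples or the sampling method applied to each type, so the following is only a minimal sketch of the general idea, not the authors' IRBS procedure. It assumes a k-nearest-neighbour rule for tagging samples as safe, boundary, or noisy, drops noisy samples, draws boundary samples with a higher weight, and oversamples each class to the size of the largest class. The function names (`categorize`, `balanced_bag`), the thresholds, and the `boundary_weight` parameter are all hypothetical.

```python
import random
from collections import Counter

def knn_labels(X, y, i, k):
    """Labels of the k nearest neighbours of sample i (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
        for j in range(len(X)) if j != i
    )
    return [y[j] for _, j in dists[:k]]

def categorize(X, y, k=5):
    """Tag each sample as 'safe', 'boundary' or 'noisy' (assumed thresholds)."""
    tags = []
    for i in range(len(X)):
        same = sum(1 for lab in knn_labels(X, y, i, k) if lab == y[i])
        if same == 0:
            tags.append("noisy")      # every neighbour belongs to another class
        elif same > k / 2:
            tags.append("safe")       # most neighbours share this sample's label
        else:
            tags.append("boundary")   # the neighbourhood is mixed
    return tags

def balanced_bag(X, y, tags, rng, boundary_weight=2.0):
    """Draw one class-balanced bootstrap sample of indices, skipping noisy points."""
    per_class = max(Counter(y).values())  # assumption: match the largest class
    bag = []
    for label in sorted(set(y)):
        members = [i for i in range(len(y)) if y[i] == label]
        pool = [i for i in members if tags[i] != "noisy"] or members
        weights = [boundary_weight if tags[i] == "boundary" else 1.0 for i in pool]
        bag.extend(rng.choices(pool, weights=weights, k=per_class))
    return bag

# Toy data: six majority points (label 0), three minority points (label 1),
# one of which lies inside the majority cluster.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.3), (0.1, 0.1),
     (2.0, 2.0), (2.1, 2.2), (0.15, 0.15)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

tags = categorize(X, y, k=3)
# One balanced bag per ensemble member; each bag would train one base classifier.
bags = [balanced_bag(X, y, tags, random.Random(seed)) for seed in range(5)]
```

In a Bagging ensemble, each index list in `bags` would be used to fit one base classifier, and the ensemble prediction would be a majority vote over those classifiers. The base learner is left out here because the abstract does not name one.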
Authors
GUO Bing-nan (郭冰楠)
WU Guang-chao (吴广潮)
School of Mathematics, South China University of Technology, Guangzhou 510641, China
Source
Computer and Modernization (《计算机与现代化》)
2019, No. 4, pp. 11-16 (6 pages)