Abstract
Personal credit data usually mix several data types, and traditional K-means clustering makes it hard to choose the initial cluster centers and the number of clusters. To address both problems, an ensemble classification model for personal credit that combines an improved K-means clustering with rough sets is proposed. First, the degree to which sample points aggregate is measured by their density in the sample space, which determines the initial cluster centers. An improved adaptive scheme then dynamically adjusts the number of cluster centers during K-means clustering, so that continuous data are discretized. Second, rough sets are used for attribute reduction to obtain a feature subset. Finally, a cost-sensitive ensemble model is built on five base models: L1-regularized logistic regression, elastic-net logistic regression, Bayes, decision tree and neural network. This ensemble classifies imbalanced personal credit data effectively. Experimental results on UCI datasets show that, compared with existing models, the proposed ensemble classification model improves G-means by about 2.96% on average (up to about 5.35%) and F-value by about 3.42% on average (up to about 6.83%).
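The abstract does not give the exact density formula or the adaptive rule for the number of clusters. The following is therefore only a minimal sketch of the general idea behind the discretization step for one continuous attribute, under stated assumptions. Local density is assumed to be the count of points within a radius r, where r is the mean pairwise distance. Each further center is assumed to be the point that maximizes density times distance to the nearest center already chosen. The number of clusters k is fixed by the caller rather than adjusted adaptively. The function names `density_init_centers`, `kmeans_1d` and `discretize` are hypothetical.

```python
from typing import List


def density_init_centers(xs: List[float], k: int) -> List[float]:
    """Choose k initial centers for 1-D data by local density and separation.

    Assumed heuristic (not the paper's exact formula): the density of a point is
    the number of points within radius r, with r = mean pairwise distance.
    """
    n = len(xs)
    if n < 2 or not 1 <= k <= n:
        raise ValueError("need at least 2 points and 1 <= k <= len(xs)")
    pair_d = [abs(a - b) for i, a in enumerate(xs) for b in xs[i + 1:]]
    r = sum(pair_d) / len(pair_d)
    density = [sum(1 for b in xs if abs(a - b) <= r) for a in xs]
    # First center: the densest point (max() keeps the first one on ties).
    centers = [xs[max(range(n), key=lambda i: density[i])]]
    while len(centers) < k:
        # Next center: dense AND far from every center chosen so far.
        best = max(range(n),
                   key=lambda i: density[i] * min(abs(xs[i] - c) for c in centers))
        centers.append(xs[best])
    return centers


def kmeans_1d(xs: List[float], centers: List[float], iters: int = 100) -> List[float]:
    """Plain Lloyd iterations on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in centers]
        for x in xs:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        new = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers


def discretize(xs: List[float], k: int) -> List[int]:
    """Map each value to the rank (0..k-1) of its nearest final center."""
    centers = sorted(kmeans_1d(xs, density_init_centers(xs, k)))
    return [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]


print(discretize([1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 10.0, 10.1], 3))
# → [0, 0, 0, 1, 1, 1, 2, 2]
```

Sorting the final centers makes the labels ordinal, so label 0 always marks the lowest-valued interval. The resulting labels could then serve as discrete attribute values for rough-set attribute reduction. An adaptive choice of k, as described in the abstract, would wrap this routine and is not shown here.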
Authors
ZHANG Yi (张怡), XIE Xiao-jin (谢晓金)
School of Mathematics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
Source
Software Guide (《软件导刊》)
2023, Issue 2, pp. 142-147 (6 pages)
Funding
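Pudong New Area Science and Technology Development Fund, Special Fund for Industry-University-Research Cooperation (Artificial Intelligence) (PKX2020-R02).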