Abstract
Cost-sensitive learning assigns a different misclassification cost to each class so that minority-class samples are misclassified as rarely as possible, and it is one of the main approaches to the class imbalance problem. Its drawback is that it ignores how the samples are actually distributed in the feature space. To address this, the paper builds on the weighted extreme learning machine, incorporates the idea of fuzzy weighting, and proposes a more robust concept called relative density information. The relative density of each training sample is computed with a K-nearest-neighbor probability density estimation strategy, which avoids computing probability densities directly in a high-dimensional space. A membership function is then designed to fuzzify and individualize the weight of each sample, and the resulting weight matrix replaces the weight matrix in the weighted extreme learning machine. This yields two algorithms: a fuzzy cost-sensitive extreme learning machine based on intra-class relative density information and a fuzzy cost-sensitive extreme learning machine based on inter-class relative density information. The effectiveness and feasibility of both algorithms are verified on 20 binary imbalanced data sets randomly selected from the KEEL repository. The experimental results show that, compared with popular class imbalance learning algorithms, the proposed algorithms perform better on G-mean and other evaluation metrics, and therefore the prediction models they build have better predictive performance.
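The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the density formula or the membership function, so the sketch assumes that relative density is the reciprocal of the distance to the k-th nearest neighbor within the same class, and that a sample's weight is its class cost (1 / class size) multiplied by its density normalized within the class. The function names, the sigmoid hidden layer, and the values of `k`, `n_hidden`, and `C` are likewise illustrative choices. Only the intra-class variant is sketched. The output weights use the standard weighted extreme learning machine solution, beta = (I/C + H^T W H)^(-1) H^T W t.

```python
import numpy as np

def knn_relative_density(X, k):
    """Assumed relative density: reciprocal of the distance to the k-th nearest neighbour."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # After sorting, column 0 is the sample itself (distance 0), so column k is the k-th neighbour.
    kth = np.sort(dist, axis=1)[:, k]
    return 1.0 / (kth + 1e-12)

def fuzzy_weights(X, y, k=3):
    """Hypothetical membership: (1 / class size) * intra-class density scaled to (0, 1]."""
    w = np.empty(len(y), dtype=float)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        dens = knn_relative_density(X[idx], min(k, len(idx) - 1))
        w[idx] = (dens / dens.max()) / len(idx)
    return w

def hidden_output(X, A, b):
    """Sigmoid hidden layer with fixed random parameters A and b."""
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

def train_weighted_elm(X, y, w, n_hidden=20, C=1000.0, seed=0):
    """Solve beta = (I/C + H^T W H)^-1 H^T W t with W = diag(w)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = hidden_output(X, A, b)
    t = np.where(y == 1, 1.0, -1.0)   # minority class (label 1) -> +1, majority -> -1
    HtW = H.T * w                     # equals H^T diag(w) via broadcasting
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ t)
    return A, b, beta

def predict(model, X):
    A, b, beta = model
    return np.where(hidden_output(X, A, b) @ beta >= 0, 1, 0)

def g_mean(y_true, y_pred):
    """Geometric mean of the true positive rate and the true negative rate."""
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return np.sqrt(tpr * tnr)

# Toy imbalanced data (synthetic, for illustration only): 95 majority vs 5 minority samples.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(95, 2)), rng.normal(3.0, 1.0, size=(5, 2))])
y = np.array([0] * 95 + [1] * 5)
model = train_weighted_elm(X, y, fuzzy_weights(X, y))
print("training G-mean:", g_mean(y, predict(model, X)))
```

Under these assumptions, a sample lying in a sparse region of its own class (a likely outlier or noise point) receives a smaller weight than a sample in a dense region, while the 1 / class size factor keeps the total influence of the minority class comparable to that of the majority class.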
Authors
刘毅鹏
高尚
LIU Yipeng; GAO Shang (Jiangsu University of Science and Technology, Zhenjiang 212000)
Source
《计算机与数字工程》
2023, No. 8, pp. 1800-1805 (6 pages)
Computer & Digital Engineering
Keywords
class imbalance learning
relative density information
fuzzy set
cost-sensitive
extreme learning machine