Abstract
Class imbalance is common in real-world data: one class has markedly more instances than the other, and this skewed class distribution prevents traditional classification methods from handling such problems well. This paper combines k-means with the logistic regression model and proposes an algorithm named ILKL (Imbalanced Learning based on K-means and Logistic Regression) for the class-imbalance problem. ILKL first applies clustering to divide the majority class into sub-clusters, which rebalances the dataset, and then learns the logistic regression model on the relatively balanced data. Experimental results on UCI datasets show that, compared with traditional methods, the proposed method performs better on recall, g-mean, and f-measure.
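The two steps described in the abstract (cluster the majority class, then learn logistic regression on the rebalanced data) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' ILKL implementation: the abstract does not say how the number of sub-clusters is chosen or how the sub-clusters are combined, so the sketch assumes the cluster count equals the rounded imbalance ratio, trains one binary model per sub-cluster against the full minority class, and averages the predicted probabilities. The names `fit_ilkl_sketch`, `predict`, and `blob`, as well as the synthetic data, are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2
                      for a, b in zip(p, centers[c]))) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp stays in range
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x):
    """Linear score; the last entry of w is the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

def train_logreg(X, y, lr=0.1, epochs=300):
    """Binary logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = sigmoid(score(w, x)) - t
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def fit_ilkl_sketch(X_maj, X_min, seed=0):
    # Assumption: the number of sub-clusters is the rounded imbalance ratio.
    k = max(1, round(len(X_maj) / len(X_min)))
    labels = kmeans(X_maj, k, seed=seed)
    models = []
    for c in range(k):
        cluster = [p for p, l in zip(X_maj, labels) if l == c]
        if cluster:
            # One model per sub-cluster: sub-cluster (0) vs. minority (1).
            models.append(train_logreg(cluster + X_min,
                                       [0] * len(cluster) + [1] * len(X_min)))
    return models

def predict(models, x):
    # Assumption: average the sub-model probabilities, threshold at 0.5.
    p = sum(sigmoid(score(w, x)) for w in models) / len(models)
    return 1 if p >= 0.5 else 0

def blob(rng, cx, cy, n):
    """Synthetic 2-D Gaussian blob, used only for this demonstration."""
    return [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for _ in range(n)]

rng = random.Random(1)
X_maj = blob(rng, -3, -3, 20) + blob(rng, -3, 0, 20) + blob(rng, 0, -3, 20)
X_min = blob(rng, 3, 3, 20)
models = fit_ilkl_sketch(X_maj, X_min)
recall = sum(predict(models, x) for x in X_min) / len(X_min)
specificity = sum(1 - predict(models, x) for x in X_maj) / len(X_maj)
g_mean = math.sqrt(recall * specificity)
print(f"recall={recall:.2f} g-mean={g_mean:.2f}")
```

Here recall is the fraction of minority instances predicted as minority, and g-mean is the geometric mean of recall and specificity, two of the measures the abstract reports. The numbers printed for this toy data say nothing about the results in the paper.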
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD; Peking University Core Journals
2017, Issue 9, pp. 2119-2124 (6 pages)
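Funding
Supported by the National Natural Science Foundation of China (61501393)
Supported by the Science and Technology Program of the Henan Provincial Department of Science and Technology (162102210310)
Supported by the Key Science and Technology Research Project of the Henan Provincial Department of Education (15A520026)
Supported by the Key Project of the Graduate Research Innovation Fund of Xinyang Normal University (2015KYJJ39)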