An Oversampling Method for Imbalanced Data Based on Learning Complexity of Samples

Cited by: 2
Abstract: Imbalanced data are common in everyday applications, and identifying the minority class of interest remains a challenging problem. Building on the idea of sample learning complexity introduced by the ADASYN algorithm, this paper designs a new oversampling method, LDSMOTE. In this method, the learning complexity of a minority-class main sample depends on how that sample is distributed relative to both the minority-class and the majority-class sample spaces. ADASYN uses only the distribution of majority-class samples in the neighborhood, whereas LDSMOTE fuses the local average distance to minority-class samples with the local number of majority-class samples. Unlike in ADASYN, where complexity takes discrete values, the complexity here is continuous, which better captures the differences between main samples and the diversity of their complexity. Using a support vector machine as the classifier, experiments were run on 19 data sets from the KEEL imbalanced data repository. The results show that on more than half of the data sets, LDSMOTE outperforms SMOTE, Borderline-SMOTE, and ADASYN in Recall, G-mean, and AUC.
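The two local quantities named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula LDSMOTE uses to fuse them, so the code below only computes the ingredients separately and does not reproduce the authors' method. The first is the ADASYN-style ratio of majority-class samples among a main sample's k nearest neighbors, which can only take the discrete values 0, 1/k, ..., 1. The second is the mean distance from the main sample to its k nearest minority-class neighbors, which is continuous. The function names, the toy data, and the choice k = 3 are all assumptions made for illustration.

```python
import math


def adasyn_ratio(i, minority, majority, k):
    """Fraction of majority-class points among the k nearest neighbours of
    minority[i] (ADASYN-style ratio; values are multiples of 1/k)."""
    x = minority[i]
    # Label every other point: 0 = minority, 1 = majority.
    others = [(p, 0) for j, p in enumerate(minority) if j != i]
    others += [(p, 1) for p in majority]
    nearest = sorted(others, key=lambda item: math.dist(x, item[0]))[:k]
    return sum(label for _, label in nearest) / k


def local_minority_mean_distance(i, minority, k):
    """Mean Euclidean distance from minority[i] to its k nearest
    minority-class neighbours (a continuous quantity)."""
    x = minority[i]
    dists = sorted(math.dist(x, p) for j, p in enumerate(minority) if j != i)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


# Hypothetical toy data: three minority points clustered near the origin,
# one minority point sitting next to the majority cluster.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
majority = [(3.0, 4.0), (4.0, 3.0), (4.0, 4.0), (5.0, 5.0)]
k = 3  # assumed neighbourhood size

for i in range(len(minority)):
    ratio = adasyn_ratio(i, minority, majority, k)
    mean_dist = local_minority_mean_distance(i, minority, k)
    print(minority[i], ratio, round(mean_dist, 3))
```

In this toy data the point (3.0, 3.0) has only majority-class neighbors, so its ratio is 1.0, while the point (0.0, 0.0) has a ratio of 0.0. The mean minority distance still distinguishes main samples that share the same discrete ratio, which is the kind of extra information the abstract says LDSMOTE draws on.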
Authors: XU Hao (许皓); SUN Tingkai (孙廷凯) (School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2020, No. 8, pp. 1846-1851, 1857 (7 pages)
Keywords: oversampling; imbalanced data; main sample; learning complexity; sample distribution