Abstract
To address the asymmetry of class-distribution information in imbalanced data sets, a new oversampling algorithm, DB_SMOTE (Distance-based Synthetic Minority Over-sampling Technique), is proposed. It alleviates the shortage of minority-class samples by synthesizing new ones. Seed samples are extracted according to the distance between each sample and the class centre, combined with the degree of class aggregation. Following the idea of SMOTE (Synthetic Minority Over-sampling Technique), new minority samples are synthesized from these seeds, and a distribution function for the new samples is constructed from the distance between each seed and the centre of the minority class. Classification experiments on several data sets using this sampling algorithm show that DB_SMOTE is feasible.
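The abstract describes DB_SMOTE only at a high level: seed selection from distances to class centres combined with class aggregation, SMOTE-style interpolation on the seeds, and a distribution function built from each seed's distance to the minority-class centre. The Python sketch below illustrates that pipeline under explicitly hypothetical choices that are not taken from the paper. Class aggregation is approximated by the mean distance of a class's samples to its centre. A minority sample is kept as a seed when its spread-normalised distance to the minority centre does not exceed its spread-normalised distance to the majority centre. Synthetic samples are allocated in proportion to 1/(1 + d), where d is the seed's distance to the minority centre. Only the interpolation step follows standard SMOTE. All function names (`select_seeds`, `allocate`, `db_smote_sketch`) are illustrative.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centre(points):
    """Component-wise mean of a list of points (the class centre)."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def spread(points, c):
    """Mean distance to the centre: a HYPOTHETICAL stand-in for 'class aggregation'."""
    return max(sum(dist(p, c) for p in points) / len(points), 1e-12)


def select_seeds(minority, majority):
    """HYPOTHETICAL seed rule: keep minority sample i when its distance to the
    minority centre, scaled by the minority spread, is no larger than its
    distance to the majority centre scaled by the majority spread."""
    c_min, c_maj = centre(minority), centre(majority)
    r_min, r_maj = spread(minority, c_min), spread(majority, c_maj)
    seeds = [i for i, x in enumerate(minority)
             if dist(x, c_min) / r_min <= dist(x, c_maj) / r_maj]
    return seeds, c_min


def allocate(seed_points, c_min, n_new):
    """HYPOTHETICAL distribution function: weight 1 / (1 + d) per seed, where d is
    the distance to the minority centre; largest-remainder rounding makes the
    integer counts sum exactly to n_new."""
    w = [1.0 / (1.0 + dist(s, c_min)) for s in seed_points]
    total = sum(w)
    raw = [n_new * wi / total for wi in w]
    counts = [int(r) for r in raw]
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[: n_new - sum(counts)]:
        counts[i] += 1
    return counts


def db_smote_sketch(minority, majority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples (requires >= 2 minority samples)."""
    rng = random.Random(seed)
    seeds, c_min = select_seeds(minority, majority)
    if not seeds:  # fall back to using every minority sample as a seed
        seeds = list(range(len(minority)))
    counts = allocate([minority[i] for i in seeds], c_min, n_new)
    synthetic = []
    for i, cnt in zip(seeds, counts):
        x = minority[i]
        # k nearest minority neighbours of the seed, excluding the seed itself
        nbrs = sorted((j for j in range(len(minority)) if j != i),
                      key=lambda j: dist(x, minority[j]))[:k]
        for _ in range(cnt):
            # Standard SMOTE step: x + gap * (neighbour - x), gap ~ U[0, 1)
            nn = minority[rng.choice(nbrs)]
            gap = rng.random()
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [3.0, 3.0]]
    majority = [[4.0, 4.0], [4.2, 3.8], [3.9, 4.1], [4.1, 4.3], [5.0, 5.0], [4.5, 4.4]]
    new_samples = db_smote_sketch(minority, majority, n_new=6, k=2)
    print(len(new_samples))  # 6
```

Because each synthetic point is a convex combination of a seed and one of its minority neighbours, it always lies on the segment between them. Substituting the paper's actual seed criterion and distribution function would only change `select_seeds` and `allocate`; the SMOTE interpolation loop stays the same.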
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
2014, No. 6, pp. 92-95 (4 pages)
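Funding
National Natural Science Foundation of China (No. 61300170, No. 71371012)
Humanities and Social Sciences Foundation of the Ministry of Education of China (No. 13YJA630098)
Key Project of the Natural Science Foundation of Anhui Province (No. KJ2013A040)
Key Project of the Provincial Outstanding Young Talent Fund for Higher Education Institutions (No. 2013SQRL034ZD)
University Youth Fund (No. 2013YQ31, No. 2012YQ32)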
Keywords
imbalanced data learning
oversampling
data classification