Abstract
For classification on imbalanced datasets, an oversampling method based on the Lévy distribution is proposed. Its core idea is to use the Lévy distribution to construct the density distribution of synthetic samples according to the distribution of the original dataset. Owing to the properties of the Lévy distribution, the density of new samples synthesized from borderline samples is the highest, the density of those synthesized from samples closer to the majority class is the second highest, and the density of those synthesized from samples closer to the minority class is the lowest. The method can therefore strengthen the classification boundary while reducing the generation of noise. Experiments on multiple datasets show that the proposed method effectively improves classification performance on imbalanced data.
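The density ordering described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It only shows how the asymmetric shape of the Lévy density (zero at the origin, a sharp peak, then a heavy tail) yields a "border > near majority > near minority" ranking. The names `levy_pdf` and `allocate_synthetic_counts`, the use of the majority-neighbor ratio as the input variable, and the scale `c = 1.5` (which places the mode at `c / 3 = 0.5`) are all assumptions made for illustration.

```python
import math


def levy_pdf(x, mu=0.0, c=1.5):
    """Levy density: sqrt(c / (2*pi)) * exp(-c / (2*(x - mu))) / (x - mu)**1.5 for x > mu.

    The density is 0 for x <= mu and has its mode at mu + c / 3.
    """
    if x <= mu:
        return 0.0
    d = x - mu
    return math.sqrt(c / (2.0 * math.pi)) * math.exp(-c / (2.0 * d)) / d ** 1.5


def allocate_synthetic_counts(majority_ratios, n_new, c=1.5):
    """Split n_new synthetic samples across minority samples (hypothetical helper).

    majority_ratios[i] is assumed to be the fraction of majority-class points
    among the k nearest neighbors of minority sample i (0 = deep in the
    minority class, about 0.5 = on the border, 1 = surrounded by the majority).
    Counts are rounded independently, so they may not sum exactly to n_new.
    """
    weights = [levy_pdf(r, c=c) for r in majority_ratios]
    total = sum(weights)
    if total == 0.0:
        return [0] * len(majority_ratios)
    return [round(n_new * w / total) for w in weights]


# Three minority samples: near the minority core, on the border, near the majority.
counts = allocate_synthetic_counts([0.1, 0.5, 1.0], n_new=100)
print(counts)
```

With this choice of scale, the borderline sample receives the largest share, the sample near the majority class the next largest, and the sample deep in the minority class the smallest, which matches the ordering stated in the abstract. How the paper itself maps samples onto the Lévy density is not given here.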
Authors
张扬帆
张海鹏
孙俊
ZHANG Yangfan; ZHANG Haipeng; SUN Jun (School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China)
Source
《计算机工程与应用》
CSCD
PKU Core (北大核心)
2019, Issue 16, pp. 150-156 (7 pages)
Computer Engineering and Applications
Funding
National Natural Science Foundation of China (No. 61672263)