Lévy-Based Oversampling Technique for Imbalanced Datasets

Cited by: 1
Abstract: For classification problems on imbalanced datasets, a Lévy-based oversampling technique is proposed. Its core idea is to use the Lévy distribution to construct the density distribution of synthetic samples according to the distribution of the original dataset. Owing to the properties of the Lévy distribution, the density of new samples synthesized from borderline samples is the largest, the density of those synthesized from samples closer to the majority class is the second largest, and the density of those synthesized from samples closer to the minority class is the smallest. As a result, the approach can strengthen the decision boundary while reducing the generation of noise. Experiments on multiple datasets show that the proposed approach effectively improves classification performance on imbalanced datasets.
Authors: ZHANG Yangfan (张扬帆), ZHANG Haipeng (张海鹏), SUN Jun (孙俊) (School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), 2019, No. 16, pp. 150-156 (7 pages). Indexed in CSCD; Peking University Core Journal.
Funding: National Natural Science Foundation of China (No. 61672263)
Keywords: imbalanced classification; Lévy distribution; oversampling; Synthetic Minority Oversampling Technique (SMOTE)
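The abstract does not give the algorithm's formulas, so the Python sketch below is only one hypothetical reading of the idea, not the authors' method. It assumes that (1) each minority sample is characterized by the share r of majority-class samples among its k nearest neighbours; (2) the number of synthetic samples drawn from each seed is proportional to a Lévy density evaluated at r, with illustrative location mu = -0.1 and scale c = 1.8 so that the mode mu + c/3 falls at the borderline value r = 0.5; and (3) new points are produced by SMOTE-style linear interpolation toward a minority neighbour. Because the Lévy density drops off faster to the left of its mode than to the right, borderline seeds get the most samples and, at equal distance from r = 0.5, seeds closer to the majority class get more than seeds closer to the minority class, which mirrors the ordering stated in the abstract. Treating seeds whose neighbours are all majority as noise is a convention borrowed from Borderline-SMOTE. The names `levy_pdf` and `levy_oversample` and all parameter values are hypothetical.

```python
import numpy as np


def levy_pdf(x, mu=-0.1, c=1.8):
    """Lévy density with location mu and scale c; zero where x <= mu.

    The mode sits at mu + c / 3, so the (hypothetical) defaults put the peak at x = 0.5.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    mask = x > mu
    z = x[mask] - mu
    out[mask] = np.sqrt(c / (2 * np.pi)) * np.exp(-c / (2 * z)) / z ** 1.5
    return out


def levy_oversample(X, y, minority_label, n_new, k=5, mu=-0.1, c=1.8, seed=0):
    """Hypothetical Lévy-weighted SMOTE variant (illustration only, not the paper's algorithm).

    Returns the synthetic samples and the per-seed sampling probabilities.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    idx_min = np.flatnonzero(y == minority_label)
    if len(idx_min) < 2:
        raise ValueError("need at least two minority samples")
    X_min = X[idx_min]

    # k nearest neighbours of each minority seed in the whole dataset (self excluded).
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(len(idx_min)), idx_min] = np.inf
    nn_all = np.argsort(d_all, axis=1)[:, :k]
    r = (y[nn_all] != minority_label).mean(axis=1)  # share of majority neighbours

    # Assumed Lévy weighting: peak at r = 0.5, slow decay toward r = 1, fast decay
    # toward r = 0. Seeds whose neighbours are all majority are treated as noise.
    w = levy_pdf(r, mu, c)
    w[r == 1.0] = 0.0
    if w.sum() == 0:
        raise ValueError("every minority seed was classified as noise")
    p = w / w.sum()
    seeds = rng.choice(len(idx_min), size=n_new, p=p)

    # SMOTE-style interpolation toward a random minority-class neighbour of each seed.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k_min = min(k, len(idx_min) - 1)
    nn_min = np.argsort(d_min, axis=1)[:, :k_min]
    partners = nn_min[seeds, rng.integers(0, k_min, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[seeds] + gap * (X_min[partners] - X_min[seeds])
    return X_new, p


if __name__ == "__main__":
    # Toy data: a majority blob, a minority blob, and one isolated minority point.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
                   rng.normal(3.0, 0.5, (15, 2)),
                   [[0.0, 0.0]]])
    y = np.array([0] * 50 + [1] * 16)
    X_new, p = levy_oversample(X, y, minority_label=1, n_new=40)
    print(X_new.shape)
    print(np.round(levy_pdf(np.array([0.0, 0.2, 0.4, 0.6, 0.8])), 4))
```

In this sketch the three-level ordering comes entirely from placing the mode at the borderline value and exploiting the asymmetry of the Lévy density; changing `mu` and `c` moves the peak and changes how much weight each tail receives.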