Research on the Applicability of Unbalanced Data Algorithm Based on Logistic Regression
(基于逻辑回归的不平衡数据算法适用性研究)

Cited by: 1
Abstract: Logistic regression models are easily affected by imbalanced data. This paper examines how well three resampling methods for imbalanced data suit logistic regression: random under-sampling, Borderline-SMOTE (BLS) oversampling, and Adaptive Synthetic (ADASYN) oversampling. A logistic regression model was fitted to the data balanced by each of the three methods. BLS oversampling gave the best results on every evaluation metric and ADASYN oversampling gave the worst. The authors therefore conclude that, of the three, BLS oversampling is the most suitable for handling imbalanced data sets with logistic regression.
Authors: 李超杰, 温磊
Source: Computer Science and Application (《计算机科学与应用》), 2020, No. 11, pp. 2049-2057 (9 pages)
Keywords: logistic regression; random under-sampling; Borderline-SMOTE (BLS) oversampling; ADASYN oversampling
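
Of the three resampling methods compared in the abstract, random under-sampling is the simplest to illustrate. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `random_under_sample`, the toy data, and the seed are all hypothetical and chosen only for illustration. Borderline-SMOTE and ADASYN work differently (they synthesize new minority-class samples rather than discarding majority-class ones) and are not shown here.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Randomly discard samples from larger classes until every class
    has as many samples as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)

    # Size of the smallest class: every class is reduced to this size.
    n_min = min(len(indices) for indices in by_class.values())

    # Draw n_min indices without replacement from each class.
    keep = []
    for indices in by_class.values():
        keep.extend(rng.sample(indices, n_min))
    keep.sort()  # preserve the original sample order

    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced data set (an assumption): 95 majority, 5 minority samples.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

X_bal, y_bal = random_under_sample(X, y)
print(Counter(y), "->", Counter(y_bal))
```

The balanced lists `X_bal` and `y_bal` could then be passed to any logistic regression fitting routine. The trade-off is that under-sampling throws away most of the majority-class data, whereas the two oversampling methods keep all of it and add synthetic minority samples.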