
Key Factor Selection Methods for Imbalanced Data (cited by: 3)

Research on Variable Selection Methods of Imbalanced Data
Abstract: Data imbalance biases a model toward assigning test samples to the majority class, so the minority class is classified poorly. The problem can be addressed at the data level or at the algorithm level. This study focuses on handling imbalance during key-factor (variable) selection. At the data level, it uses Group Lasso logistic regression based on SMOTE sampling. At the algorithm level, it uses Group Lasso with an adjusted prediction threshold, in two variants: tuning the parameters step by step and tuning them simultaneously. The three methods were then used to build a diagnostic model for "liver depression and spleen deficiency" from questionnaire data on 307 sub-health patients. In the results, the SMOTE-based method and the simultaneous-tuning method gave the better predictive performance in terms of sensitivity and specificity.
Authors: Jia Pingping (贾萍萍); Li Yang (李扬) (Center for Applied Statistics of Renmin University of China, Beijing 100872, China; School of Statistics, Renmin University of China, Beijing 100872, China)
Source: Modernization of Traditional Chinese Medicine and Materia Medica-World Science and Technology (《世界科学技术-中医药现代化》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 3, pp. 389-394 (6 pages)
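Funding: Major Project of the Key Research Bases of Humanities and Social Sciences, Ministry of Education of China (16JJD910002), "Biostatistical analysis methods for precision medicine based on big data and their applications", principal investigator: Xu Wangli (许王莉); Young Scientists Fund of the National Natural Science Foundation of China (11401013), "Joint statistical modeling based on functional data analysis: theory and applications", principal investigator: Huang Hui (黄辉); Renmin University of China 2017 special guidance funds for building world-class universities (disciplines) and characteristic development at central universities, principal investigator: Zhao Yanyun (赵彦云)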
Keywords: imbalanced data; SMOTE sampling; prediction threshold; Group Lasso
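
The abstract names its building blocks (SMOTE oversampling, Group Lasso logistic regression, and a tuned prediction threshold) but gives no implementation details, so the following is only a minimal NumPy sketch of how those pieces might fit together. All function names, the proximal-gradient solver, the synthetic data, and the choice of "sensitivity + specificity" as the threshold criterion are assumptions made for illustration, not the authors' procedure.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    k = min(k, n - 1)
    # Pairwise squared distances inside the minority class.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def group_lasso_logistic(X, y, groups, lam, step=0.1, n_iter=500):
    """Logistic regression with a group-lasso penalty, fitted by proximal
    gradient descent (an assumed solver; the intercept is not penalised)."""
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    for _ in range(n_iter):
        resid = sigmoid(X @ beta + b0) - y
        b0 -= step * resid.mean()
        z = beta - step * (X.T @ resid) / n
        # Group soft-thresholding: shrink each group's norm, possibly to zero.
        for g in np.unique(groups):
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            shrink = step * lam * np.sqrt(idx.sum())
            z[idx] *= max(0.0, 1.0 - shrink / norm) if norm > 0 else 0.0
        beta = z
    return b0, beta

def sens_spec(y, prob, t):
    """Sensitivity and specificity when predicting class 1 for prob >= t."""
    pred = prob >= t
    return np.mean(pred[y == 1]), np.mean(~pred[y == 0])

def best_threshold(y, prob, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the threshold that maximises sensitivity + specificity."""
    scores = [sum(sens_spec(y, prob, t)) for t in grid]
    return grid[int(np.argmax(scores))]

# Synthetic illustration: 255 majority vs 45 minority samples, 6 features in
# 3 groups of 2; only the first group separates the classes.
rng = np.random.default_rng(42)
X_maj = rng.normal(size=(255, 6))
X_min = rng.normal(size=(45, 6))
X_min[:, :2] += 1.0
X = np.vstack([X_maj, X_min])
y = np.r_[np.zeros(255), np.ones(45)]
groups = np.array([0, 0, 1, 1, 2, 2])

# Data-level route: balance the classes with SMOTE, then fit Group Lasso.
X_syn = smote_oversample(X_min, n_new=210)
X_bal = np.vstack([X, X_syn])
y_bal = np.r_[y, np.ones(210)]
b0_s, beta_s = group_lasso_logistic(X_bal, y_bal, groups, lam=0.05)

# Algorithm-level route: fit on the raw data, then move the threshold.
b0_t, beta_t = group_lasso_logistic(X, y, groups, lam=0.05)
t = best_threshold(y, sigmoid(X @ beta_t + b0_t))

print("SMOTE route, group norms:",
      [float(np.linalg.norm(beta_s[groups == g])) for g in range(3)])
print("Threshold route, chosen threshold:", float(t))
```

In this sketch the threshold is chosen after the penalty `lam` has been fixed, which loosely corresponds to tuning step by step; a simultaneous variant could instead search over pairs of `lam` and threshold with the same criterion. For brevity the threshold is evaluated on the training data, whereas a real analysis would use held-out data or cross-validation.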
