
BOS: A Borderline Over-Sampling Method for Imbalanced Data Learning
Cited by: 3
Abstract: Imbalanced data are pervasive in many important real-world domains, and standard classification learning algorithms perform markedly worse on imbalanced problems. To address this problem, the authors propose BOS, a new borderline synthetic over-sampling method for the minority class. BOS uses a newly defined concept, K generalized Tomek links (K links for short), to locate borderline instances, and then adaptively synthesizes minority borderline instances based on the distribution of K links over the minority class. Experimental results show that BOS achieves better area under the ROC curve (AUC), F-measure, and geometric mean (G-mean) values than several existing typical over-sampling methods.
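The abstract does not give the formal definition of a K link or the exact allocation rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that a minority instance and a majority instance form a K link when each appears in the other's K-nearest-neighbour list (a natural generalization of the classic Tomek link, which is the K = 1 case), that the number of synthetic points seeded at a borderline instance is proportional to its link count, and that new points are produced by SMOTE-style linear interpolation toward a randomly chosen minority instance. The function names `k_link_counts` and `borderline_oversample` are hypothetical.

```python
import numpy as np

def k_nearest(X, k):
    """Indices of the k nearest other points for every row of X (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]

def k_link_counts(X, y, k, minority=1):
    """ASSUMED rule: minority i and majority j are linked when each is in
    the other's k-nearest-neighbour list. Returns the link count per point
    (always 0 for majority points)."""
    nn = k_nearest(X, k)
    counts = np.zeros(len(X), dtype=int)
    for i in np.flatnonzero(y == minority):
        for j in nn[i]:
            if y[j] != minority and i in nn[j]:
                counts[i] += 1
    return counts

def borderline_oversample(X, y, n_new, k=3, minority=1, seed=0):
    """Create n_new synthetic minority points near the class border.
    ASSUMPTION: seeds are drawn with probability proportional to link count."""
    rng = np.random.default_rng(seed)
    counts = k_link_counts(X, y, k, minority)
    border = np.flatnonzero(counts > 0)  # minority points with >= 1 link
    if len(border) == 0 or n_new == 0:
        return np.empty((0, X.shape[1]))
    X_min = X[y == minority]
    weights = counts[border] / counts[border].sum()
    seeds = rng.choice(border, size=n_new, p=weights)
    mates = rng.integers(0, len(X_min), size=n_new)  # random minority partner
    gaps = rng.random((n_new, 1))                    # interpolation factor in [0, 1)
    return X[seeds] + gaps * (X_min[mates] - X[seeds])

# Toy data: four majority points (label 0) and four minority points (label 1).
X = np.array([[0.0, 0.0], [0.0, 1.1], [1.2, 0.0], [1.0, 1.0],
              [1.5, 1.5], [1.7, 1.6], [5.0, 5.0], [5.2, 5.1]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
counts = k_link_counts(X, y, k=3)
new_points = borderline_oversample(X, y, n_new=10, k=3)
print(counts, new_points.shape)
```

In this toy set only the two minority points adjacent to the majority cluster receive links, so all synthetic points are seeded there, while the two distant minority points are left alone. The pairwise distance matrix keeps the sketch short but costs O(n²) memory, so a real data set would call for a proper neighbour index.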
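Source: Journal of Sichuan University (Natural Science Edition), indexed in CAS, CSCD, and the Peking University Core Journals list; 2012, Issue 3, pp. 553-559 (7 pages)
Funding: Exploration of chemical and bioinformatics methods for assessing antibiotic drug residues in food (21175095); Study of hierarchical networks based on interactions between anticancer drugs and their target proteins (20972103)
Keywords: imbalanced problem; K generalized Tomek links; minority class borderline synthetic oversampling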
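The three evaluation measures named in the abstract have standard textbook definitions, sketched below for a binary problem with the minority class treated as positive. The record does not state which F-measure weighting the paper uses, so `beta = 1` is an assumption, and the function names `imbalance_metrics` and `auc` are hypothetical. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
import math

def imbalance_metrics(tp, fn, fp, tn, beta=1.0):
    """F-measure and G-mean from confusion-matrix counts (minority = positive)."""
    recall = tp / (tp + fn) if tp + fn else 0.0       # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = beta ** 2 * precision + recall
    f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return {"f_measure": f_measure, "g_mean": g_mean}

def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of classifier scores."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Illustrative numbers only, not results from the paper.
metrics = imbalance_metrics(tp=8, fn=2, fp=4, tn=86)
area = auc([0.9, 0.8, 0.4], [0.7, 0.3])
print(metrics, area)
```

Unlike plain accuracy, G-mean collapses to zero when either class is entirely misclassified, which is why it is commonly reported alongside AUC and F-measure for imbalanced problems.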

