
A hierarchical classification model for class-imbalanced data

Cited by: 4
Abstract: Traditional machine learning methods classify poorly on class-imbalanced data. To address this, a hierarchical classification model for class-imbalanced data is proposed. The model uses AdaBoost as its base classifier, builds a mathematical model from the classifiers' false positive rates and features, and shows that the model's parameters can be computed. First, the model is built on a hierarchical classification tree. Next, the classification cost of this tree structure is calculated, yielding a quantitative mathematical description of the relationship between the model's cost and the features of each layer. The classification cost is then converted into an optimization problem, the procedure for solving it is given, and the computed results of the hierarchical classification model are presented. Extensive experiments on UCI datasets, using AUC and F-measure as evaluation criteria, show that the hierarchical classification model achieves better classification performance than existing class-imbalanced learning methods.
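The two evaluation criteria named in the abstract, AUC and F-measure, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0/1 label encoding (1 = minority class), and the toy data are assumptions made for illustration only.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """F-measure of the positive (minority) class; labels are 0 or 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy set: 2 minority examples, 8 majority examples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

On this toy set the accuracy is 0.8 even though only one of the two minority examples is detected, while the F-measure of the minority class is 0.5. This gap is why imbalanced-learning work typically reports AUC and F-measure rather than accuracy alone.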
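Source: Journal of University of Science and Technology of China (JUSTC), 2015, No. 1, pp. 61-68 (8 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Key Technology R&D Program of China (2012BAH17B03); Natural Science Foundation of Anhui Province (1408085MF131); Natural Science Project of Anhui Higher Education Institutions (KJ2013B212); Open Project of the 魂芯 (Hunxin) DSP Industrialization Research Institute, Hefei Normal University.
Keywords: machine learning; class-imbalanced; hierarchical classification; feature; evaluation criteria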

References (25)

  • [1] Phua C, Alahakoon D, Lee V. Minority report in fraud detection: classification of skewed data[J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 50-59.
  • [2] Sun A X, Lim E P, Liu Y. On strategies for imbalanced text classification using SVM: A comparative study[J]. Decision Support Systems, 2009, 48(1): 191-201.
  • [3] Turney P D. Learning algorithms for keyphrase extraction[J]. Information Retrieval, 2000, 2(4): 303-336.
  • [4] Burez J, van den Poel D. Handling class imbalance in customer churn prediction[J]. Expert Systems with Applications, 2009, 36(3): 4626-4636.
  • [5] Brekke C, Solberg A H S. Oil spill detection by satellite remote sensing[J]. Remote Sensing of Environment, 2005, 95(1): 1-13.
  • [6] Plant C, Bohm C, Tilg B, et al. Enhancing instance-based classification with local density: a new algorithm for classifying unbalanced biomedical data[J]. Bioinformatics, 2006, 22(8): 981-988.
  • [7] Branch J W, Giannella C, Szymanski B, et al. In-network outlier detection in wireless sensor networks[J]. Knowledge and Information Systems, 2013, 34(1): 23-54.
  • [8] Sahbi H, Geman D. A hierarchy of support vector machines for pattern detection[J]. Journal of Machine Learning Research, 2006, 7: 2087-2123.
  • [9] Blake C, Keogh E, Merz C J. UCI repository of machine learning databases[EB/OL]. http://www.ics.uci.edu/~mlearn/MLRepository.html.
  • [10] Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: Synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.

Co-cited documents: 19

Citing documents: 4

Second-level citing documents: 13
