Abstract
Traditional machine learning methods classify poorly on class-imbalanced data. To address this, a hierarchical classification model for class-imbalanced data is proposed. The model uses AdaBoost as its base classifier, builds a mathematical model from the false positive rate and the features of each classifier, and it is proved that the model's parameters can be computed. First, the model is built with a hierarchical classification tree as its structure. Next, the classification cost of this tree structure is computed, yielding a quantitative mathematical description of the relationship between the model's cost and the features of each layer. The classification cost is then converted into an optimization problem, and its solution procedure is given, together with the computed results of the hierarchical classification model. Extensive experiments on UCI datasets, with AUC and F-measure as the evaluation criteria, show that the hierarchical classification model achieves better classification performance than existing class-imbalanced learning methods.
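Funding
National Key Technology R&D Program of China (2012BAH17B03)
Natural Science Foundation of Anhui Province (1408085MF131)
Natural Science Research Project of Anhui Higher Education Institutions (KJ2013B212)
Open Project of the Hunxin (魂芯) DSP Industrialization Research Institute, Hefei Normal University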
Keywords
machine learning
class imbalance
hierarchical classification
feature
evaluation criteria