Abstract
To address the multi-label classification problem, in which a target can belong to several classes at once, a multi-label classification algorithm based on a combination of floating threshold classifiers is proposed. First, the principle and error-rate estimation of the AdaBoost algorithm with floating threshold classifiers (AdaBoost.FT) are analyzed, and it is proved that AdaBoost.FT overcomes the instability of fixed segmentation threshold classifiers when classifying points near the classification boundary, thereby improving the accuracy of the single-label classification algorithm. Then, the Binary Relevance (BR) method is used to apply this single-label learning algorithm to the multi-label classification problem, yielding a multi-label classification method based on a combination of floating threshold classifiers, namely multi-label AdaBoost.FT. The experimental results show that on the Emotions dataset the average precision of multi-label AdaBoost.FT is higher than that of AdaBoost.MH (the multiclass, multi-label version of AdaBoost based on Hamming loss), ML-kNN (Multi-Label k-Nearest Neighbor) and Rank-SVM (Ranking Support Vector Machine) by about 4%, 8% and 11% respectively, while on the Scene and Yeast datasets it is only about 3% and 1% lower than Rank-SVM respectively. The experimental analysis indicates that multi-label AdaBoost.FT obtains good classification results on datasets that have a small number of labels or whose labels are essentially uncorrelated.
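The Binary Relevance (BR) decomposition mentioned in the abstract can be illustrated with a minimal Python sketch: one independent binary classifier is trained per label, and the per-label outputs are concatenated into the predicted label vector. The abstract does not give the details of AdaBoost.FT, so the base learner here (`train_stump`, a single-feature threshold rule) is only a hypothetical stand-in, and all function names and the toy data are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
BinaryModel = Callable[[Vector], int]


def train_stump(X: List[Vector], y: List[int]) -> BinaryModel:
    """Stand-in base learner (NOT AdaBoost.FT): the single-feature
    threshold rule with the fewest training errors."""
    best = (len(y) + 1, 0, 0.0, 1)  # (errors, feature, threshold, polarity)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                preds = [1 if pol * (x[f] - t) >= 0 else 0 for x in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if errors < best[0]:
                    best = (errors, f, t, pol)
    _, f, t, pol = best
    return lambda x: 1 if pol * (x[f] - t) >= 0 else 0


def train_binary_relevance(
    X: List[Vector],
    Y: List[List[int]],
    base_learner: Callable[[List[Vector], List[int]], BinaryModel] = train_stump,
) -> List[BinaryModel]:
    """Binary Relevance: fit one binary model per label column of Y."""
    n_labels = len(Y[0])
    return [base_learner(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict_binary_relevance(models: List[BinaryModel], x: Vector) -> List[int]:
    """Concatenate the independent per-label decisions."""
    return [model(x) for model in models]


# Toy data (made up): two features, two labels.
X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.3], [0.9, 0.1]]
Y = [[0, 1], [0, 1], [1, 0], [1, 0]]

models = train_binary_relevance(X, Y)
prediction = predict_binary_relevance(models, [0.85, 0.2])
print(prediction)
```

Because BR treats each label independently, any single-label learner can be passed as `base_learner`; this independence is also consistent with the abstract's observation that the method works well when labels are essentially uncorrelated.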
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journal (北大核心)
2015, No. 1, pp. 147-151 (5 pages)
Funding
Sichuan Provincial Science and Technology Support Program (2011GZ0171, 2012GZ0106)
Keywords
real AdaBoost
floating threshold
maximum likelihood principle
multi-label classification
ensemble learning
Binary Relevance (BR) method