Abstract
Hierarchical multilabel classification (HMC) organizes labels into a hierarchy according to the correlations among them and uses that hierarchy as supervised information to better solve the multilabel classification problem. Two kinds of methods are commonly used in HMC. The first, called loss-independent methods, do not use any loss function in model training or prediction. The second are called loss-sensitive methods. For loss-sensitive methods, a frequently used loss function is HMC-loss, which assigns different weights to false positive and false negative errors and incorporates hierarchical information into the loss according to each label's location in the hierarchy. When HMC-loss is used for prediction, the loss value obtained is ideal, but the number of predicted positive labels is far larger than the number of true labels. In addition, introducing hierarchical information into HMC-loss can adversely affect the decision order of the label nodes. To address these problems, we first propose an improved loss function, IMH-loss (Improved Hierarchical loss), which removes the hierarchical information so that the decision order of the nodes is guaranteed. Then, using Bayesian decision theory, we propose a hierarchical multilabel classification method whose Bayes risk can change during the decision process. Experimental results on real-world data sets show that the proposed method improves the prediction accuracy of labels while maintaining the recall rate, and that its predictions are closer to the true labels.
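To make the idea of weighting the two error types concrete, the following is a minimal sketch in Python. It is not the paper's HMC-loss or IMH-loss, whose exact definitions are not given in this abstract. It assumes a flat loss (no hierarchy term) with a hypothetical weight `alpha` on false positives and `beta` on false negatives, and it assumes each label is decided independently from an estimated posterior probability. Under those assumptions, the Bayes decision rule predicts a label as positive when the expected cost of missing it exceeds the expected cost of wrongly predicting it. The names `weighted_error_loss` and `bayes_decision` are illustrative only.

```python
from typing import List, Sequence


def weighted_error_loss(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    alpha: float = 1.0,
    beta: float = 2.0,
) -> float:
    """Flat weighted loss: alpha per false positive, beta per false negative.

    Assumption: no hierarchical term is included; alpha and beta are
    hypothetical weights chosen for illustration.
    """
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 1)
    return alpha * false_pos + beta * false_neg


def bayes_decision(
    probs: Sequence[float],
    alpha: float = 1.0,
    beta: float = 2.0,
) -> List[int]:
    """Per-label decision that minimizes expected loss.

    For a label with posterior probability p of being positive:
      expected cost of predicting 1 is alpha * (1 - p)
      expected cost of predicting 0 is beta * p
    so the label is predicted positive when beta * p > alpha * (1 - p).
    Assumption: labels are treated independently and hierarchy
    constraints are ignored.
    """
    return [1 if beta * p > alpha * (1.0 - p) else 0 for p in probs]


if __name__ == "__main__":
    print(weighted_error_loss([1, 0, 1, 0], [1, 1, 0, 0]))
    print(bayes_decision([0.2, 0.5, 0.4]))
```

With `beta` larger than `alpha`, the decision threshold `alpha / (alpha + beta)` falls below 0.5, so more labels are predicted positive. This is one plausible way to see how a loss that penalizes false negatives heavily can lead to predicting more labels than are truly present.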
Source
Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》)
CAS
CSCD
PKU Core (北大核心)
2017, No. 6, pp. 1023-1032 (10 pages)
Funding
National Natural Science Foundation of China (61632011, 61272095, 61432011, U1435212, 61573231, 61672331)
Keywords
hierarchical classification, multilabel classification, variable Bayes risk, Bayes decision theory