Abstract
Traditional multi-label learning is supervised and requires training samples with complete class labels. When the data set is large and the number of classes is high, obtaining a training set with complete labels is very difficult. A semi-supervised multi-label learning algorithm based on Tri-training (SMLT) is therefore proposed within the framework of semi-supervised co-training. In the learning stage, SMLT introduces a virtual class label and then, for each pair of class labels, uses the Tri-training co-training mechanism to train the corresponding classifier. In the prediction stage, a new sample is passed to the classifiers obtained above; the votes received by each class label turn the multi-label learning problem into a label ranking problem, and the number of votes received by the virtual label serves as the threshold that partitions the ranking. Comparative experiments on four commonly used multi-label data sets from UCI show that SMLT outperforms the compared algorithms in most cases on four evaluation metrics, which verifies the effectiveness of the algorithm.
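The prediction stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The name `VIRTUAL`, the function `predict_labels`, and the toy score-based classifiers are all hypothetical; in SMLT each pairwise classifier would come from Tri-training, which is not reproduced here. The sketch also assumes that a label counts as relevant only when it receives strictly more votes than the virtual label, a tie-breaking detail the abstract does not specify.

```python
from itertools import combinations

VIRTUAL = "__virtual__"  # hypothetical name for the virtual class label


def predict_labels(x, labels, pairwise):
    """Rank labels by pairwise votes and cut the ranking at the virtual label.

    `pairwise` maps each label pair (a, b) to a callable that returns the
    winning label for sample x. These callables are stand-ins for the
    classifiers that Tri-training would produce.
    """
    all_labels = list(labels) + [VIRTUAL]
    votes = {lab: 0 for lab in all_labels}
    for a, b in combinations(all_labels, 2):
        winner = pairwise[(a, b)](x)
        votes[winner] += 1
    # Label ranking: real labels sorted by vote count, highest first.
    ranking = sorted(labels, key=lambda lab: votes[lab], reverse=True)
    # Assumption: relevant means strictly more votes than the virtual label.
    relevant = [lab for lab in ranking if votes[lab] > votes[VIRTUAL]]
    return ranking, relevant, votes


def make_toy_classifier(a, b):
    """Toy pairwise classifier: the label with the higher score in x wins.

    The virtual label has no entry in x, so it gets a fixed score of 0.5.
    """
    def classify(x):
        score_a = x.get(a, 0.5)
        score_b = x.get(b, 0.5)
        return a if score_a >= score_b else b
    return classify


labels = ["sports", "politics", "music"]
pairwise = {
    (a, b): make_toy_classifier(a, b)
    for a, b in combinations(labels + [VIRTUAL], 2)
}
x = {"sports": 0.9, "politics": 0.2, "music": 0.7}
ranking, relevant, votes = predict_labels(x, labels, pairwise)
print(ranking, relevant)
```

With q real labels, this scheme evaluates q(q+1)/2 pairwise classifiers per sample, since the virtual label is paired with every real label as well.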
Source
CAAI Transactions on Intelligent Systems (《智能系统学报》)
CSCD
PKU Core Journal
2013, No. 5, pp. 439-445 (7 pages)
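Funding
National "973" Program Preliminary Research Special Project (2011CB311805)
Shanxi Province Science and Technology Key Project (20110321027-01)
Shanxi Province Science and Technology Infrastructure Platform Construction Project (2012091002-0101)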