Abstract
Multi-label classification amounts to predicting, for a given instance, the set of labels associated with it. Typical methods fall into two classes: problem transformation and algorithm adaptation. This paper focuses on problem transformation methods based on the label powerset. Existing label powerset methods find it difficult to discover, and may even ignore, important label sets hidden in the training data, so they tend to underutilize multi-label information. This paper therefore proposes a label powerset method based on label clustering. First, it uses an improved balanced k-means clustering to identify potentially important label sets in the training data (unseen multilabels). Then, it uses these label sets to form new training data for multi-label classification. Experimental results show that the proposed method outperforms the original label powerset method on multiple evaluation metrics.
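The abstract describes two steps: clustering the labels with a balanced k-means variant, then using the resulting label groups to form new label powerset training data. The paper's improved balanced k-means is not specified here, so the following Python sketch only illustrates the general idea and is not the authors' algorithm. Its assumptions are: each label is represented by its 0/1 indicator column over the training instances; clusters are capped at ⌈L/k⌉ labels by a greedy capacity-constrained assignment; centroids are initialised farthest-first; and each label cluster gets its own label powerset (LP) target. The function names `label_powerset`, `balanced_kmeans_labels` and `clustered_lp_targets` are hypothetical.

```python
import math


def label_powerset(Y):
    """Label powerset (LP): map each distinct label set to one class id."""
    class_of = {}
    y_lp = []
    for row in Y:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)  # unseen label set -> new class
        y_lp.append(class_of[key])
    return y_lp, class_of


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_kmeans_labels(Y, k, iters=10):
    """Group the label columns of Y into k clusters of at most ceil(L/k) labels.

    Hypothetical greedy variant of balanced k-means, not the paper's method.
    """
    n_labels = len(Y[0])
    assert 1 <= k <= n_labels
    # Each label is represented by its 0/1 indicator vector over the instances.
    vecs = [[row[j] for row in Y] for j in range(n_labels)]
    cap = math.ceil(n_labels / k)
    # Deterministic farthest-first initialisation (an assumption of this sketch).
    chosen = [0]
    while len(chosen) < k:
        chosen.append(max(
            (j for j in range(n_labels) if j not in chosen),
            key=lambda j: min(sq_dist(vecs[j], vecs[c]) for c in chosen),
        ))
    centroids = [list(vecs[j]) for j in chosen]
    assign = [None] * n_labels
    for _ in range(iters):
        # Greedy capacity-constrained assignment: visit (label, cluster) pairs
        # from nearest to farthest, skipping assigned labels and full clusters.
        pairs = sorted(
            (sq_dist(vecs[j], centroids[c]), j, c)
            for j in range(n_labels) for c in range(k)
        )
        sizes = [0] * k
        new_assign = [None] * n_labels
        for _, j, c in pairs:
            if new_assign[j] is None and sizes[c] < cap:
                new_assign[j] = c
                sizes[c] += 1
        if new_assign == assign:
            break  # assignment is stable: converged
        assign = new_assign
        # Move each centroid to the mean of its member label vectors.
        for c in range(k):
            members = [vecs[j] for j in range(n_labels) if assign[j] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [[j for j in range(n_labels) if assign[j] == c] for c in range(k)]


def clustered_lp_targets(Y, clusters):
    """Build one LP target vector (a new training set) per label cluster."""
    return [(labels, *label_powerset([[row[j] for j in labels] for row in Y]))
            for labels in clusters]


Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
clusters = balanced_kmeans_labels(Y, k=2)
print(clusters)  # [[0, 1], [2, 3]]
for labels, y_lp, class_of in clustered_lp_targets(Y, clusters):
    print(labels, y_lp)
```

On this toy data, labels that co-occur (0 with 1, 2 with 3) end up in the same cluster, and each cluster's LP target has only three classes. The greedy assignment keeps the clusters balanced but, like ordinary k-means, it can settle in a local optimum that depends on the initialisation.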
Source
Software (《软件》)
2014, No. 8, pp. 16-21 (6 pages)
Funding
Supported by the Beijing Natural Science Foundation (4142042)
Keywords
Multi-label
Classifier
Label clustering
Label set