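Abstract
In multi-label classification, the correlation among labels is an important factor. To exploit it, this paper proposes a multi-label classification algorithm based on correlation information entropy, which is used to measure the strength of the correlation among labels. The algorithm first finds the set of k-label combinations with the largest correlation information entropy, then trains an LP (Label Powerset) classifier on each label combination. Experiments on seven datasets show that the proposed algorithm outperforms the other compared classification algorithms on most of the datasets, whereas the compared algorithms outperform the proposed algorithm on only one dataset.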
In our opinion, the LP (label powerset) classifier may put uncorrelated labels into the same label set and train it as a single label. To solve this problem, multi-label classification needs to make use of the correlations among labels. We therefore propose a multi-label classification algorithm using correlation information entropy (MLCACIE), in which correlation information entropy measures the strength of label correlation. Its core consists of two steps: (1) given the number of classifiers (CN) to be trained, we find the CN subsets of k labels with the strongest correlation; (2) we train these k-label subsets one by one with CN LP classifiers. Finally, we use seven experimental datasets, with the decision tree as the base classifier, to evaluate MLCACIE and compare it with other classification algorithms. The experimental results, given in Table 3, and their analysis show preliminarily that: (1) our MLCACIE outperforms the other classification algorithms on most datasets because it makes use of the correlations among labels, while the other classification algorithms outperform our MLCACIE on only one of the seven datasets; (2) the use of the correlations among labels can enhance multi-label classification performance.
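The two steps described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `top_k_label_subsets`, `label_powerset_targets`, and `agreement_score` are hypothetical helpers, and `agreement_score` is only a stand-in scoring function, not the paper's correlation information entropy, whose formula is not given in this record. The sketch ranks k-label subsets by a pluggable score, keeps the CN best, and converts each subset into a single-label target in the label powerset style.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

LabelMatrix = Sequence[Sequence[int]]  # rows = samples, columns = binary labels
Subset = Tuple[int, ...]


def top_k_label_subsets(
    Y: LabelMatrix, k: int, cn: int, score: Callable[[LabelMatrix, Subset], float]
) -> List[Subset]:
    """Return the cn subsets of k label indices with the highest score.

    `score` is a placeholder for a correlation measure; the paper uses
    correlation information entropy, which is NOT implemented here.
    """
    n_labels = len(Y[0])
    candidates = combinations(range(n_labels), k)
    ranked = sorted(candidates, key=lambda s: score(Y, s), reverse=True)
    return ranked[:cn]


def label_powerset_targets(
    Y: LabelMatrix, subset: Subset
) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """Label powerset transform restricted to `subset`.

    Each distinct combination of label values on the subset becomes one
    class id, so a single-label classifier can be trained on the result.
    """
    classes: Dict[Tuple[int, ...], int] = {}
    targets: List[int] = []
    for row in Y:
        combo = tuple(row[j] for j in subset)
        targets.append(classes.setdefault(combo, len(classes)))
    return targets, classes


def agreement_score(Y: LabelMatrix, subset: Subset) -> float:
    """Hypothetical stand-in score: fraction of samples on which all
    labels in the subset take the same value."""
    same = sum(1 for row in Y if len({row[j] for j in subset}) == 1)
    return same / len(Y)


# Toy label matrix (made up for illustration): 4 samples, 3 labels.
Y = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
for subset in top_k_label_subsets(Y, k=2, cn=2, score=agreement_score):
    targets, classes = label_powerset_targets(Y, subset)
    print(subset, targets, classes)
```

Each `targets` list would then be passed, together with the feature matrix, to any single-label learner; the abstract reports using a decision tree as the base classifier. Enumerating all k-subsets grows combinatorially with the number of labels, so this exhaustive ranking is only practical for small label sets.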
Source
《西北工业大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2012, No. 6, pp. 968-973 (6 pages)
Journal of Northwestern Polytechnical University
Funding
Supported by the National Science and Technology Major Project (2012ZX03005007)
Keywords
multi-label classification
data processing
correlation information entropy
correlation
algorithms, classification (of information), correlation theory, data processing, decision trees, entropy, information theory, labels
correlation information entropy, multi-label classification