Abstract
Multi-label classification has a wide range of real-world applications and is an active research topic in machine learning. Binary Relevance (BR), the representative multi-label classification algorithm, has been improved in many ways, but most of these improvements address only one of two aspects, label correlation or feature selection, so the existing improved algorithms still leave room for better performance. This paper therefore proposes a multi-label classification algorithm based on both feature selection and label correlation. The algorithm first uses information gain to select the features relevant to each label, then accounts for label correlation through a new stacking structure, and finally trains a binary classifier for each label on the resulting feature set. Experiments on six benchmark datasets show that the algorithm outperforms other typical improved BR algorithms under five different evaluation metrics.
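The abstract names the steps but not their details, so the following is only a minimal sketch of the first step (per-label feature selection by information gain), with a helper showing one possible way a stacking-style feature set could be assembled. It assumes discrete feature values. The function names, the top-`k` cutoff, and the way first-level predictions are appended are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_column, label_column):
    """IG(label; feature) = H(label) - H(label | feature) for discrete data."""
    total = len(label_column)
    groups = {}
    for f, y in zip(feature_column, label_column):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(ys) / total * entropy(ys) for ys in groups.values())
    return entropy(label_column) - conditional


def select_features_per_label(X, Y, k):
    """Return, for each label, the indices of its k highest-gain features.

    The fixed top-k cutoff is an assumption; the abstract does not say how
    many features are kept per label.
    """
    n_features = len(X[0])
    n_labels = len(Y[0])
    selected = []
    for j in range(n_labels):
        label_column = [row[j] for row in Y]
        gains = [
            information_gain([row[i] for row in X], label_column)
            for i in range(n_features)
        ]
        ranked = sorted(range(n_features), key=lambda i: gains[i], reverse=True)
        selected.append(sorted(ranked[:k]))
    return selected


def stacked_features(x, first_level_predictions, feature_idx, label_j):
    """Selected features of x plus first-level predictions for the other labels.

    Hypothetical stacking-style augmentation; the paper's own structure is
    not described in the abstract.
    """
    others = [p for m, p in enumerate(first_level_predictions) if m != label_j]
    return [x[i] for i in feature_idx] + others


# Toy data: 4 instances, 3 binary features, 2 labels.
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0]]
Y = [[1, 1], [1, 0], [0, 1], [0, 0]]
selected = select_features_per_label(X, Y, k=1)
augmented = stacked_features(X[0], [1, 0], selected[0], label_j=0)
```

In this toy data, feature 0 matches label 0 exactly and feature 2 matches label 1 exactly, so each label keeps a different feature. A binary classifier for each label would then be trained on vectors like `augmented`.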
Authors
蔡剑
牟甲鹏
余孟池
徐建
CAI Jian; MU Jiapeng; YU Mengchi; XU Jian (School of Computer Science & Engineering, Nanjing University of Science and Technology, Nanjing 210094)
Source
《计算机与数字工程》
2021, No. 10, pp. 1967-1972, 1997 (7 pages in total)
Computer & Digital Engineering
Keywords
classification
multi-label learning
feature selection
label correlation