A Multi-Label Classification Algorithm Using Correlation Information Entropy (一种基于相关信息熵的多标签分类算法)

Cited by: 3


Abstract: In multi-label classification, the correlation among labels is an important factor. To exploit it, this paper proposes a multi-label classification algorithm based on correlation information entropy, which uses correlation information entropy to measure how strongly labels are correlated. The algorithm first finds the set of k-label combinations with the largest correlation information entropy and then trains an LP (Label Powerset) classifier on each combination. Experiments on seven datasets show that the proposed algorithm outperforms the compared classification algorithms on most of the datasets, whereas the compared algorithms outperform it on only one of the datasets.

In our opinion, the LP (label powerset) classifier may put uncorrelated labels into the same label set and train it as a single label. To solve this problem, it is necessary to exploit the correlations among multiple labels when carrying out multi-label classification. We therefore propose a multi-label classification algorithm using correlation information entropy (MLCACIE), in which correlation information entropy measures the strength of label correlation. Its core consists of: (1) given the number of classifiers (CN) to be trained, we find the CN subsets of k labels with the strongest correlation; (2) we train these k-label subsets one by one with CN LP classifiers. Finally, we use seven experimental datasets, with the decision tree as the base classifier, to evaluate MLCACIE and compare it with other classification algorithms. The experimental results, given in Table 3, and their analysis show preliminarily that: (1) MLCACIE outperforms the other classification algorithms on most datasets because it makes use of the correlations among multiple labels, while the other classification algorithms outperform MLCACIE on only one of the seven datasets; (2) using the correlations among multiple labels can enhance multi-label classification performance.
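The two steps in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the formula for correlation information entropy, so the sketch assumes the commonly used eigenvalue-based form, H = 1 + Σ (λ_i / k) log_k (λ_i / k), where λ_i are the eigenvalues of the k × k Pearson correlation matrix of the label columns. It also assumes that candidate k-label subsets are enumerated exhaustively, which is only practical for small label counts. All function names are hypothetical, and training the base classifier (a decision tree in the paper) on the transformed targets is left out.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for i in range(n):  # apply the rotation to columns p and q
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p], a[i][q] = c * aip - s * aiq, s * aip + c * aiq
                for i in range(n):  # apply the rotation to rows p and q
                    api, aqi = a[p][i], a[q][i]
                    a[p][i], a[q][i] = c * api - s * aqi, s * api + c * aqi
    return [a[i][i] for i in range(n)]


def correlation_information_entropy(columns):
    """Assumed definition: 1 + sum((lam/k) * log_k(lam/k)); requires k >= 2.

    Returns about 0 for uncorrelated labels and about 1 for fully correlated ones.
    """
    k = len(columns)
    r = [[1.0 if i == j else pearson(columns[i], columns[j]) for j in range(k)]
         for i in range(k)]
    total = 0.0
    for lam in symmetric_eigenvalues(r):
        p = lam / k
        if p > 1e-12:  # treat 0 * log(0) as 0
            total += p * math.log(p, k)
    return 1.0 + total


def top_labelsets(Y, k, cn):
    """Step 1: the cn k-label subsets with the highest correlation information entropy."""
    n_labels = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(n_labels)]
    scored = [(correlation_information_entropy([cols[j] for j in subset]), subset)
              for subset in itertools.combinations(range(n_labels), k)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [subset for _, subset in scored[:cn]]


def label_powerset_targets(Y, subset):
    """Step 2 (LP transformation): one class id per distinct value combination on subset."""
    class_ids = {}
    targets = [class_ids.setdefault(tuple(row[j] for j in subset), len(class_ids))
               for row in Y]
    return targets, class_ids


# Toy label matrix (rows = samples, columns = labels); labels 0 and 1 are identical.
Y = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
best = top_labelsets(Y, k=2, cn=1)
targets, class_ids = label_powerset_targets(Y, best[0])
print(best, targets)
```

In this toy matrix the first two label columns are identical, so their pair scores highest and is the one selected. Each selected subset would then get its own single-label classifier trained on `targets`. How the CN classifiers' predictions are combined into a final label set is not described in the abstract and is not sketched here.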

Authors: Zhang Zhenhai (张振海), Li Shining (李士宁), Li Zhigang (李志刚)

Affiliation: School of Computer Science, Northwestern Polytechnical University (西北工业大学计算机学院)

Source: Journal of Northwestern Polytechnical University (《西北工业大学学报》), 2012, No. 6, pp. 968-973 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Funding: Supported by the National Science and Technology Major Project (2012ZX03005007)

Keywords: multi-label classification, data processing, correlation information entropy, correlation; algorithms, classification (of information), decision trees, entropy, information theory, labels

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]



