Research on Dimension Reduction and Classification Algorithm Based on Multi-Label Data (original title: 基于多标签数据的降维与分类算法的研究)

Abstract: Single-label classification is the familiar case, and traditional supervised learning methods are mainly applied to single-label data. As data grows richer, however, a single label can no longer fully describe a sample. A sample now often corresponds to several labels, so multi-label classification has gradually become an important research direction in data mining. Although multiple labels describe a sample better, multi-label data usually has a very large number of features, which makes it hard to process directly, and such high-dimensional data often suffers from the curse of dimensionality. Reducing the dimensionality of multi-label data before classification therefore has a significant effect on the final classification result. This paper proposes selecting the feature subset with conditional mutual information (the minimum-redundancy maximum-dependency criterion, MDMR) to remove useless features, and then classifying the data with an improved KNN algorithm. Experiments show that this approach raises average recall by 2.5%.
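The two-stage pipeline described in the abstract (information-based feature selection followed by nearest-neighbour classification) can be illustrated with a minimal sketch. The abstract gives neither the exact MDMR scoring formula nor the details of the KNN improvement, so the code below makes its own assumptions: it scores features with a simplified relevance-minus-redundancy rule built on plain mutual information, and it uses an unmodified majority-vote KNN. The function names `mutual_information`, `select_features`, and `knn_predict` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def select_features(X, Y, k):
    """Greedy selection of k feature indices.

    Assumption: score = total relevance to all labels minus mean redundancy
    with already selected features. This is a simplified stand-in, not the
    paper's exact MDMR criterion.
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    labels = [[row[l] for row in Y] for l in range(n_labels)]
    relevance = [sum(mutual_information(c, lab) for lab in labels) for c in cols]
    selected = []
    while len(selected) < min(k, n_features):
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_information(cols[j], cols[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)
        candidates = [j for j in range(n_features) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


def knn_predict(X_train, Y_train, x, k, features):
    """Plain multi-label KNN on the selected features (no 'improved' variant).

    A label is predicted when more than half of the k nearest neighbours carry it.
    """
    def dist(a, b):
        return sum((a[j] - b[j]) ** 2 for j in features)
    neighbours = sorted(range(len(X_train)), key=lambda i: dist(X_train[i], x))[:k]
    n_labels = len(Y_train[0])
    return [1 if 2 * sum(Y_train[i][l] for i in neighbours) > k else 0
            for l in range(n_labels)]


# Toy data: feature 0 determines label 0, feature 1 determines label 1,
# and feature 2 is an exact copy of feature 0 (fully redundant).
base = [(0, 0), (0, 1), (1, 0), (1, 1)]
X = [(a, b, a) for a, b in base] * 2
Y = [[a, b] for a, b in base] * 2

selected = select_features(X, Y, 2)
prediction = knn_predict(X, Y, (1, 0, 1), 3, selected)
```

On this toy data the redundant copy of feature 0 is passed over in favour of feature 1, which is the behaviour a redundancy penalty is meant to produce. Real multi-label data with continuous features would first need discretisation or a different mutual-information estimator.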

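Authors: Tang Wenwei, Yu Weiwei

Affiliation: Shanghai Maritime University

Source: Modern Computer (现代计算机, mid-month edition), 2016, No. 5, pp. 3-9 (7 pages)

Keywords: single-label; multi-label; conditional mutual information; feature extraction; KNN algorithm

Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]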
