Hadoop框架下的多标签传播算法 (Multi-Label Propagation under the Hadoop Framework); cited by: 1

A Label Propagation Algorithm for Multi-Label Classification Using Hadoop Technology

Abstract: Label propagation algorithms are graph-based semi-supervised learning methods that use the label information of labeled data to predict the labels of unlabeled data. Traditional label propagation algorithms, however, do not consider the posterior probability and do not treat the transfer information between labeled and unlabeled data differently during propagation, so they converge slowly and their performance suffers. To address this shortcoming, a label propagation algorithm with differentiated weights is proposed, which assigns weights according to the importance of the label information. After the problem of multiplying large-scale feature matrices is solved, the proposed algorithm is applied within the Hadoop framework using distributed computation, yielding HSML, a multi-label classification algorithm that can handle large-scale data and cope with the exponentially sized output space of multi-label learning. HSML is compared with well-established multi-label learning algorithms. The experimental results show that HSML is effective on the multi-label evaluation metrics and in execution speed, that its performance is superior to that of the compared algorithms, and that the larger the test set, the faster HSML runs.
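As an illustration of the baseline idea the abstract starts from, the following is a minimal sketch of classic graph-based label propagation adapted to multi-label scores. It is not the paper's HSML algorithm: the abstract does not give the differentiated weighting formula or the Hadoop matrix-multiplication scheme, so neither is reproduced here. The function names, the RBF affinity, the `sigma` bandwidth, and the 0.5 decision threshold are all assumptions made for this example.

```python
import math


def rbf_affinity(points, sigma=1.0):
    """Dense affinity matrix w[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    The RBF kernel and sigma are assumptions; the abstract does not specify them.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def row_normalize(w):
    """Scale each row to sum to 1, giving a transition matrix."""
    out = []
    for row in w:
        s = sum(row)  # s >= 1 because the diagonal entry is exp(0) = 1
        out.append([v / s for v in row])
    return out


def propagate(points, labels, n_labels, sigma=1.0, max_iter=1000, tol=1e-9):
    """Iterate y <- t * y, re-clamping labeled rows after every step.

    labels[i] is a 0/1 list of length n_labels for a labeled point,
    or None for an unlabeled point. Returns one score row per point.
    """
    n = len(points)
    t = row_normalize(rbf_affinity(points, sigma))
    y = [[float(v) for v in lab] if lab is not None else [0.0] * n_labels
         for lab in labels]
    for _ in range(max_iter):
        # This matrix product is the step a distributed version would parallelize.
        new_y = [[sum(t[i][k] * y[k][c] for k in range(n))
                  for c in range(n_labels)] for i in range(n)]
        # Clamp: labeled points always keep their known labels.
        for i, lab in enumerate(labels):
            if lab is not None:
                new_y[i] = [float(v) for v in lab]
        delta = max(abs(new_y[i][c] - y[i][c])
                    for i in range(n) for c in range(n_labels))
        y = new_y
        if delta < tol:
            break
    return y


def predict(scores, threshold=0.5):
    """Turn per-label scores into a 0/1 label set (threshold is an assumption)."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


# Toy data: two clusters on a line, one labeled point in each cluster.
points = [[0.0], [0.1], [5.0], [5.1]]
labels = [[1, 1, 0], None, [0, 1, 1], None]
scores = propagate(points, labels, n_labels=3)
predicted = predict(scores)
```

In this toy run each unlabeled point sits next to exactly one labeled point, so it inherits that neighbor's label set. Every labeled and unlabeled row is treated with the same transition weights here, which is the uniform treatment the abstract identifies as the weakness of traditional propagation.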

Authors: SUN Xia (孙霞), ZHANG Minchao (张敏超), FENG Jun (冯筠), ZHANG Lei (张蕾), HE Feijuan (何绯娟)

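Affiliations: School of Information Science and Technology, Northwest University (西北大学信息科学与技术学院); Xi'an Jiaotong University City College (西安交通大学城市学院)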

Source: Journal of Xi'an Jiaotong University (《西安交通大学学报》), 2015, No. 5, pp. 134-139 (6 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.

Funding: National Natural Science Foundation of China (61202184, 61100166); Shaanxi Provincial Department of Education project (2013JK1152)

Keywords: Hadoop; multi-label classification; label propagation algorithm

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]


