
A Fast Partial Label Learning Algorithm Based on Max-loss Function (Cited by: 2)

Abstract: Learning under weakly supervised information has become a hot research topic in machine learning in the age of big data. Partial label learning, a recently proposed and important weakly-supervised learning framework, addresses the problem of learning when all that is known is that the ground-truth label of each training sample lies in a given set of candidate labels, and it has wide applications in many fields. The max-loss function can accurately capture the relationship between a sample and its candidate labels in partial label learning, but the model it yields is usually a non-smooth function that is hard to optimize, so no partial label learning algorithm based on this loss function has been established. Moreover, existing partial label learning algorithms can only handle problems of relatively small sample size; no algorithm oriented to big data has been reported. To address these two problems, this paper first uses the aggregate function to approximate the max(·) operator in the max-loss function, which turns the objective into a smooth concave function, and then solves it with a stochastic quasi-Newton method, yielding a fast partial label learning algorithm based on the max-loss function. Simulation results show that the proposed algorithm not only achieves better classification accuracy than traditional algorithms based on average-loss functions, but also runs far faster: problems with millions of samples can be handled within a few minutes.
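The smoothing step described in the abstract can be illustrated with a small sketch. In the optimization literature, the aggregate function usually refers to the log-sum-exp smoothing f_p(z) = (1/p) · ln Σ_i exp(p·z_i) of max(·); the paper's exact formulation and smoothing parameter are not reproduced on this record page, so the Python snippet below (the function name aggregate_max, the parameter p, and the toy scores are illustrative assumptions) only shows the general idea of replacing the non-smooth maximum over candidate-label scores with a differentiable surrogate.

```python
import numpy as np

def aggregate_max(z, p=10.0):
    """Smooth surrogate for max(z_1, ..., z_k) via the aggregate
    (log-sum-exp) function: f_p(z) = (1/p) * ln(sum_i exp(p * z_i)).

    f_p(z) >= max(z) and f_p(z) - max(z) <= ln(k) / p, so a larger p
    gives a tighter (but less smooth) approximation.
    """
    z = np.asarray(z, dtype=float)
    m = z.max()                       # shift for numerical stability
    return m + np.log(np.exp(p * (z - m)).sum()) / p

# Toy candidate-label scores for one partially labeled sample (illustrative):
scores = [0.2, 1.5, 0.9]
print(max(scores))                    # 1.5  (non-smooth max term)
print(aggregate_max(scores, p=10.0))  # ~1.5002 (smooth, differentiable surrogate)
```

Once every max(·) term in the objective is replaced by such a smooth surrogate, the whole objective becomes differentiable, which is what makes the stochastic quasi-Newton solver mentioned in the abstract applicable.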
Source: Journal of Computer Research and Development (《计算机研究与发展》), EI, CSCD, Peking University Core Journal, 2016, No. 5, pp. 1053-1062 (10 pages)
Funding: National Natural Science Foundation of China (61503058, 61374170, 61502074, U1560102); Specialized Research Fund for the Doctoral Program of Higher Education (20120041110008); Fundamental Research Funds for the Central Universities (DC201501055, DC201501060201)
Keywords: partial label learning; max-loss function; aggregate function; weakly-supervised learning; classification accuracy
