y-Tuning: an efficient tuning paradigm for large-scale pre-trained models via label representation learning


Abstract: With the current success of large-scale pre-trained models (PTMs), how to efficiently adapt PTMs to downstream tasks has attracted tremendous attention, especially for PTMs with billions of parameters. Previous work focuses on designing parameter-efficient tuning paradigms but still needs to save and compute the gradients of the whole computational graph. In this paper, we propose y-Tuning, an efficient yet effective paradigm for adapting frozen large-scale PTMs to specific downstream tasks. y-Tuning learns dense representations for the labels y defined in a given task and aligns them to fixed feature representations. Because it does not compute the gradients of the text encoder during the training phase, y-Tuning is not only parameter-efficient but also training-efficient. Experimental results show that for DeBERTa-XXL with 1.6 billion parameters, y-Tuning achieves more than 96% of the performance of full fine-tuning on the GLUE benchmark with only 2% tunable parameters and much lower training cost.
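The core idea in the abstract, learning label representations against features from a frozen encoder, can be illustrated with a deliberately reduced sketch. Everything below is an assumption for illustration: the synthetic features stand in for the outputs of a frozen PTM, the dot-product scoring and softmax cross-entropy loss are placeholders (in this reduced form the setup amounts to a linear probe), and the names `make_dataset`, `train_label_embeddings`, and `predict` are hypothetical. The abstract does not specify the paper's architecture or loss, which may well differ.

```python
import math
import random


def make_dataset(n=200, dim=8, seed=0):
    """Synthetic (feature, label) pairs. The features play the role of
    fixed outputs from a frozen encoder; they are never updated."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        feat = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        label = 1 if feat[0] > 0 else 0  # toy labeling rule (assumption)
        data.append((feat, label))
    return data


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score(feat, emb):
    """Alignment score between a fixed feature and a label embedding."""
    return sum(f * e for f, e in zip(feat, emb))


def train_label_embeddings(data, num_labels=2, lr=0.5, epochs=50, seed=1):
    """Learn one dense vector per label. Gradients flow only into the
    label embeddings; the features are treated as constants."""
    dim = len(data[0][0])
    rng = random.Random(seed)
    label_emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                 for _ in range(num_labels)]
    for _ in range(epochs):
        for feat, gold in data:
            probs = softmax([score(feat, emb) for emb in label_emb])
            for k in range(num_labels):
                # Gradient of cross-entropy w.r.t. label embedding k.
                coeff = probs[k] - (1.0 if k == gold else 0.0)
                for d in range(dim):
                    label_emb[k][d] -= lr * coeff * feat[d]
    return label_emb


def predict(feat, label_emb):
    """Pick the label whose embedding aligns best with the feature."""
    scores = [score(feat, emb) for emb in label_emb]
    return max(range(len(scores)), key=scores.__getitem__)


data = make_dataset()
label_emb = train_label_embeddings(data)
accuracy = sum(predict(f, label_emb) == y for f, y in data) / len(data)
print(f"training accuracy: {accuracy:.3f}")
```

The only trainable state here is `label_emb`, which is why no gradient ever has to be propagated through the encoder. That is the property the abstract calls training-efficient.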

Authors: Yitao LIU, Chenxin AN, Xipeng QIU

Affiliation: School of Computer Science

Source: Frontiers of Computer Science (中国计算机科学前沿（英文版）), indexed in SCIE, EI, and CSCD; 2024, Issue 4, pp. 107-116 (10 pages)

Funding: National Key R&D Program of China (No. 2020AAA0108702); National Natural Science Foundation of China (Grant No. 62022027).

Keywords: pre-trained model; lightweight fine-tuning paradigms; label representation

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

