Knowledge Distillation of Large Language Models Based on Chain of Thought

Abstract: Chain-of-thought (CoT) prompting enables large language models to handle complex tasks by following explicit reasoning steps, giving them stronger capabilities in commonsense reasoning, mathematical and logical reasoning, and interpretability. The main drawback of CoT methods, however, is their reliance on massive language models, which typically have tens of billions of parameters and are difficult to deploy at scale. To address this issue, this paper proposes a CoT-based knowledge distillation method for large language models. Its main goal is to exploit the reasoning ability of large language models and, through knowledge distillation, guide small models in solving complex tasks. The large model serves as the teacher and the small model as the student, and the student is fine-tuned on reasoning data obtained from the teacher. A series of carefully designed techniques, including a modified data generation procedure, clustering-based sampling of question-answer exemplars, heuristic correction of exemplars, and adaptive answer generation, makes the teacher's generation process more efficient and yields reasoning data of higher quality and in larger quantity. The student model can therefore be fine-tuned more effectively, acquiring strong reasoning ability and achieving efficient knowledge distillation. The proposed framework aims to establish an effective knowledge transfer mechanism in which the deep reasoning of large models guides small models, providing more intelligent and efficient solutions to complex tasks. In this way, we hope to overcome the challenges of deploying large models and promote the real-world application and advancement of language models.
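
To make the distillation workflow concrete, below is a minimal Python sketch (not the authors' released code) of the clustering-based exemplar sampling and rationale-filtering steps described in the abstract. It assumes scikit-learn and NumPy are available, uses TF-IDF as a stand-in for whatever question embedding the paper actually employs, and hides the teacher model behind a hypothetical teacher_generate(prompt) function that returns a rationale and a final answer.

    # A minimal sketch of clustering-based exemplar sampling and rationale
    # filtering for CoT distillation; teacher_generate() is a hypothetical
    # wrapper around the teacher LLM, not an API defined by the paper.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    def sample_diverse_exemplars(questions, k=8):
        """Cluster the training questions and return the one closest to each
        centroid, so the few-shot prompt covers varied problem types."""
        k = min(k, len(questions))
        vecs = TfidfVectorizer().fit_transform(questions)  # stand-in embedding
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vecs)
        exemplars = []
        for c in range(k):
            members = np.where(km.labels_ == c)[0]
            dists = np.linalg.norm(vecs[members].toarray() - km.cluster_centers_[c], axis=1)
            exemplars.append(questions[members[int(np.argmin(dists))]])
        return exemplars

    def build_distillation_set(questions, gold_answers, teacher_generate, k=8):
        """Ask the teacher for CoT rationales and keep only those whose final
        answer matches the gold label (one plausible reading of the paper's
        heuristic correction of exemplars)."""
        prefix = "\n\n".join(f"Q: {q}\nA: Let's think step by step."
                             for q in sample_diverse_exemplars(questions, k))
        corpus = []
        for q, gold in zip(questions, gold_answers):
            rationale, answer = teacher_generate(f"{prefix}\n\nQ: {q}\nA: Let's think step by step.")
            if answer.strip() == str(gold).strip():  # discard rationales ending in a wrong answer
                corpus.append({"question": q, "rationale": rationale, "answer": answer})
        return corpus  # question-rationale-answer triples for fine-tuning the student

Selecting the centroid-nearest question from each cluster keeps the demonstration set diverse, which is the intuition behind clustering-based sampling; the answer-matching filter shown here is only a simple approximation of the correction and adaptive-generation steps the paper describes.
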
Authors: 李荣涵, 浦荣成, 沈佳楠, 李栋栋, 苗启广 (LI Ronghan, PU Rongcheng, SHEN Jianan, LI Dongdong, MIAO Qiguang; School of Computer Science and Technology, Xidian University, Xi'an 710000, China; Key Laboratory of Counter-Terrorism Command & Information Engineering of Ministry of Education, Engineering University of PAP, Xi'an 710086, China)
Source: Journal of Data Acquisition and Processing (《数据采集与处理》), CSCD, Peking University Core Journal, 2024, No. 3, pp. 547-558 (12 pages)
Keywords: chain of thought; logical reasoning; knowledge distillation; fine-tuning