Abstract
In recent years, with the extensive application of deep learning in natural language processing tasks, language models have grown ever larger. However, large-scale models suffer from slow inference and high resource costs, which makes them difficult to apply in industry, while small-scale models trained directly struggle to match the performance of large-scale ones. This paper therefore proposes a knowledge distillation model based on the teacher-student framework: the pre-trained model BERT serves as the teacher model, a small model such as a bidirectional long short-term memory network (BiLSTM) serves as the student model, and the knowledge learned by the teacher is transferred to the student through teacher-guided learning. Experimental results show that the distilled model reduces inference time to 1/725 of that of the teacher model and improves the student model's short-text classification accuracy by 3.16%.
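To make the teacher-student transfer concrete, the following is a minimal sketch of a standard soft-label distillation loss of the kind the abstract describes (BERT teacher guiding a BiLSTM student). The temperature T and weight alpha are hypothetical placeholders; the paper's actual loss formulation and hyperparameters are not reproduced here.

```python
# Sketch of a teacher-student distillation loss: the student (e.g., BiLSTM)
# learns from both the teacher's (e.g., BERT's) softened predictions and the
# ground-truth labels. T and alpha below are illustrative, not the paper's values.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: student mimics the teacher's temperature-softened distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy on the true class labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```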
Authors
SUN Hong; HUANG Ou-yan (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source
Software Guide (《软件导刊》), 2021, Issue 6, pp. 23-27 (5 pages)
Funding
National Natural Science Foundation of China (61472256, 61170277, 61703277)
Hujiang Foundation of China (C14002)
Keywords
knowledge distillation
text classification
bidirectional model
natural language processing