
Application of pre-trained language model BERT in downstream tasks (Cited by: 6)
Abstract: BERT is a new language model implemented as a bidirectional Transformer encoder that is first pre-trained and then adapted to downstream tasks by fine-tuning. In practice, only an additional task-specific output layer needs to be added for BERT to solve a particular task, which overcomes the drawback of traditional word-embedding models that a different network architecture must be designed for each task. To better understand the BERT model and its effectiveness, this article first reviews the principle of the model and its pre-training strategy, then describes how BERT is applied to three downstream tasks: text classification, machine reading comprehension, and text summarization, and demonstrates the advantages of BERT through comparative experiments. Finally, directions for future research are discussed.
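To make the pre-train/fine-tune recipe in the abstract concrete, the sketch below adds the single task-specific output layer to a pre-trained BERT encoder for text classification, the first of the three downstream tasks. This is a minimal illustration assuming the Hugging Face transformers library, PyTorch, and hypothetical two-class sentiment labels; it is not the authors' experimental setup.

    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    # Load the pre-trained encoder; num_labels attaches the extra
    # task-specific classification layer described in the abstract.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # Convert raw text into BERT's input format (token ids + attention mask).
    texts = ["a thoughtful, well-acted film", "the service was slow and rude"]
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    # One fine-tuning step: the encoder weights and the new output layer
    # are updated jointly on the downstream labels (hypothetical here).
    labels = torch.tensor([1, 0])
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    loss = model(**inputs, labels=labels).loss  # cross-entropy over 2 classes
    loss.backward()
    optimizer.step()

    # Inference: the classification head yields per-class logits.
    model.eval()
    with torch.no_grad():
        predictions = model(**inputs).logits.argmax(dim=-1)

Because only the thin output layer changes per task, the same pattern extends to the paper's other two tasks; for example, a span-prediction head (BertForQuestionAnswering in the same library) serves machine reading comprehension.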
Authors: DUAN Ruixue; CHAO Wenyu; ZHANG Yangsen (Computer School, Beijing Information Science & Technology University, Beijing 100192, China; School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China; Beijing Laboratory of National Economic Security Early Warning Project, Beijing 100044, China)
Source: Journal of Beijing Information Science and Technology University (Natural Science Edition), 2020, No. 6, pp. 77-83 (7 pages).
Funding: Beijing Natural Science Foundation Youth Project (4204100); Research Fund of Beijing Information Science and Technology University (1825023); Beijing Information Science and Technology University 2020 Undergraduate Research Training Project for promoting the connotative development of universities (5102010805); Beijing Information Science and Technology University 2019 "实培计划" (practical training) program.
Keywords: pre-training; machine reading comprehension; text classification; text summarization

