Abstract
With the rapid development of artificial intelligence technology, large language models have been widely applied in numerous fields. However, large language models may generate inaccurate, misleading, or even harmful content, which has raised concerns about their reliability, and adopting alignment techniques to ensure that the behavior of large language models is consistent with human values has become an urgent issue. Recent research progress on alignment techniques for large language models was surveyed. Common methods for collecting instruction data and human preference datasets were introduced, research on supervised tuning and alignment tuning was reviewed, datasets and methods commonly used for model evaluation were discussed, and future research directions were summarized and outlined.
Authors
刘昆麟
屈新纪
谭芳
康红辉
赵少伟
施嵘
LIU Kunlin; QU Xinji; TAN Fang; KANG Honghui; ZHAO Shaowei; SHI Rong (ZTE Corporation, Shenzhen 518057, China)
Source
《电信科学》
Peking University Core Journal (北大核心)
2024, No. 6, pp. 173-194 (22 pages)
Telecommunications Science
Keywords
large language model
alignment technique
tuning
reinforcement learning