Abstract
To improve the value and performance of artificial intelligence (AI) large language models across their full lifecycle and to promote their adoption across industries, the principles and techniques of data-centric AI need to run through that entire lifecycle. Building on an analysis of the connotation and characteristics, necessity, particularities, and key content of data governance for large language models, this paper defines, stage by stage, the framework, objects, key tasks, and technical strategies of data governance for the critical lifecycle phases: planning and design, pre-training, evaluation, deployment and inference, operations and monitoring, and retirement (iteration). The aim is to provide a panoramic logical framework and an end-to-end technical reference for data governance of large language models.
Authors
刘鑫
毕超
LIU Xin; BI Chao (School of Environment and Energy, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Agricultural Development Bank of China, Beijing 100045, China)
Source
《信息安全与通信保密》
2024, No. 6, pp. 45-55 (11 pages)
Information Security and Communications Privacy
Funding
National Key Research and Development Program of China (2019YFB1404601).
Keywords
artificial intelligence large language model (人工智能大模型)
entire lifecycle (全生命周期)
data governance (数据治理)
task (任务)
strategy (策略)