Abstract
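Over the past six months, the application of large-scale generative language models such as 柴语生 (ChatGPT) has prompted attention and reflection across society. Such large models should be viewed squarely as tools: the benefits of their technological development should be recognized, while their risks are avoided as far as possible. Their governance should reduce intervention in the technology itself and instead target the language resources on which large models are developed and the use of the models after release. For the governance of language resources in large-model development, efforts should focus on breaking down Chinese data silos: developing distributed model-building technologies represented by federated learning, establishing a mechanism for opening national knowledge data, and promptly putting in place an open, efficient language data exchange market. The expression of world knowledge in Chinese should also be promoted to support the development of Chinese large models: opening the best Chinese knowledge resources to the web as soon as possible, improving Chinese concept and terminology resources, and making domain-specific Chinese resources larger and more complete. As for the governance of large-model use, because a large model is itself an important language resource, its status as a foundational resource should be emphasized, and governance should proceed from the perspectives of standardization, evaluation, and ethical regulation.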
Over the past six months, the application of large language models such as ChatGPT has drawn worldwide attention and sparked critical reflection. This paper argues that these large language models should be viewed as tools whose technological development brings benefits and whose application carries risks. Consequently, their governance should focus less on technological intervention and more on the language resources vital to their development and on their application. Regarding the governance of language resources in large language model development, efforts should be made to break down the data silos of Chinese language resources, develop distributed model construction technologies such as federated learning, establish open-access mechanisms for national knowledge data, and build open and efficient language data exchange markets. These efforts aim to promote the Chinese expression of world knowledge and facilitate the development of Chinese large language models. Since large language models are themselves an important language resource, their status as a fundamental resource should be emphasized in their application, and their governance should proceed from the perspectives of standardization, evaluation, and ethical regulation.
Source
《语言战略研究》
Peking University Core Journal (北大核心)
2023, No. 4, pp. 19-29 (11 pages)
Chinese Journal of Language Policy and Planning
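Funding
Ministry of Education Humanities and Social Sciences Youth Project "A Quantitative Study of Vocabulary Use in Chinese-Language Newspapers and Periodicals Since the Late Qing" (20YJC740050)
Beijing Language and Culture University Wutong Innovation Platform (21PT04).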
Keywords
柴语生 (ChatGPT)
语言资源 (language resources)
大规模语言模型 (large language model)
语言治理 (language governance)