
Data Scarcity and Large Language Model Data Value Asymmetry (cited by: 3)
Abstract: With the rapid development of the large language model (LLM) industry, market competition has driven model scale to expand quickly. At the same time, the supply of data available for training is relatively insufficient and is becoming increasingly scarce; high-quality data in particular cannot meet the exponentially growing demands of LLM computation. As institutional constraints on data grow ever tighter, the operating mechanism of LLMs exhibits natural-monopoly characteristics. Differences among major economies in their approaches to data governance, differences in technical conditions on the international segment, and factors such as algorithmic discrimination all continue to widen the value asymmetry between the supply and demand sides, affect how LLM data value is distributed, and thereby reinforce the data monopoly of LLM owners. Although China's LLM industry faces a series of technical constraints on the international segment, it has an advantage in data endowment, with great potential in both quantity and quality. To better accumulate returns from data value, China will need to strengthen its self-supporting platforms, evaluation indicators, and international rules, and to emphasize policy guidance for the LLM industry.
Authors: Wang Xiang (王翔); Zhou Hui (周辉); Li Zhipeng (李志鹏); Xing Yun (邢云) (The National Information Center of China Customs, Beijing 100005; Laboratory of International Trade IT Standards, General Administration of Customs, Beijing 100005; Institute of Law, China Academy of Social Sciences, Beijing 100720; China CUSLINK Co., Beijing 100023; China E-Port Data Center, Beijing 100088)
Source: Journal of Information Security Research (《信息安全研究》), CSCD, 2023, No. 7, pp. 637-642 (6 pages)
Funding: Research projects of the General Administration of Customs (2019HK018, 2020HK281, 2020HK300, 2022HK053).
Keywords: data scarcity; data value asymmetry; data monopoly; artificial intelligence generated content (AIGC); large language model (LLM); cross-border data chain