Abstract
In the era of generative artificial intelligence, understanding the nature and origin of machines' knowledge of language has become a critical question across philosophy, linguistics, and artificial intelligence. This paper compares the processes and mechanisms by which human children and machines acquire language, in order to identify the fundamental differences between advanced large language models and children along the core dimensions of language learning. Children acquire language through an innate language acquisition device and universal grammar, supplemented by relatively limited postnatal language input together with rich social interaction and embodied experience. Machine learning, in contrast, rests on very large datasets, advanced deep learning algorithms, and powerful computing resources. Although the latest large language models perform well on language generation tasks, their grasp of language is usually confined to pattern recognition and lacks deeper cognitive understanding. At the same time, language models face a challenge similar to the one children face: how to extract the deep structure of language from imperfect input data.
Authors
李金彩
李鸾
陶亮
LI Jincai; LI Luan; TAO Liang (School of Foreign Languages, Shanghai Jiao Tong University, Shanghai, 200240; National Research Centre for Language and Well-being, Shanghai, 200240; Faculty of Business Information, Shanghai Business School, Shanghai, 201400)
Source
《自然辩证法通讯》
CSSCI
Peking University Core Journals (北大核心)
2024, Issue 11, pp. 1-11 (11 pages in total)
Journal of Dialectics of Nature
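Funding
National Social Science Fund of China, Youth Project "Constructing the Reference Mechanism of Proper Names from the Perspective of Experimental Philosophy of Language" (Grant No. 21CZX065)
National Social Science Fund of China, Youth Project "Research on Children's Semantic Development and Database Based on Lexical Semantic Networks" (Grant No. 23CYY040)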
Keywords
Generative artificial intelligence
Machine language
Child language
Knowledge of language
Plato's problem