Research on a Linguistic Knowledge-Driven Evaluation Dataset for Spatial Semantic Understanding

SpaCE: A Linguistic Knowledge-Driven Benchmark for Spatial Cognition Evaluation
Abstract Over the past two decades, deep learning has markedly improved machines' natural language processing capabilities, bringing them close to or even beyond human performance on many tasks. Machine learning no longer learns directly from the outcomes of human linguistic research (knowledge), but rather from human language materials (data). This situation deserves significant attention from linguists. Now that large language models, driven by data and computational power, have nearly built a modern Tower of Babel, every linguistic researcher faces the question of what value remains in the linguistic knowledge that linguists distill from in-depth study of specific and subtle language phenomena. This paper proposes a knowledge-to-data research approach that generates text data from linguistic knowledge in order to evaluate machine understanding of spatial semantics. Over the past four years, we have organized four consecutive Chinese Spatial Cognition Evaluation (SpaCE) competitions, from SpaCE2021 to SpaCE2024, covering six sub-tasks: Determination of Spatial Information Validity, Detection of Spatial Anomalies, Recovery of Spatial References, Identification of Spatial Semantic Roles, Recognition of Spatial Equivalences, and Spatial Position Reasoning. Taking the construction of a Chinese benchmark for spatial semantic understanding as an example, the paper introduces the design philosophy of the datasets, the dataset creation process, an overview of the datasets, and the performance of machines on the SpaCE tasks. Overall, large language models participating in the SpaCE competitions tend to perform well on tasks that rely on surface distributional features (formal cues), but poorly on tasks that depend on deep semantic understanding (cognitive abilities). In the current era of rapid AI development, in which linguistic knowledge is being passively marginalized in computer information processing, the value of linguistic knowledge therefore needs to be extended: it should be used to guide the production of small, high-quality language data that improves the effectiveness and efficiency of machine learning. For computational applications, grammatical research should pursue a more challenging goal, adequate generation, beyond the objectives of adequate observation, adequate description, and adequate explanation.
Authors Zhan Weidong (詹卫东); Sun Chunhui (孙春晖); Xiao Liming (肖力铭)
Affiliation Department of Chinese Language and Literature, Peking University
Source Chinese Journal of Language Policy and Planning (《语言战略研究》), PKU Core Journal, 2024, No. 5, pp. 7-21 (15 pages)
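Funding Major Project of the Key Research Institutes of Humanities and Social Sciences, Ministry of Education: "Research on a Comprehensive Language Knowledge Base for Evaluating Machine Language Ability" (22JJD740004).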
Keywords artificial intelligence; large language models; linguistic knowledge; spatial semantic understanding; data synthesis