Abstract
Event extraction aims to identify event information in unstructured natural language text and extract it in structured form. Traditional methods operate at the sentence level and depend on large amounts of labeled data, so they perform poorly on document-level extraction and in low-resource target domains. Existing research applies prompt learning to document-level event extraction by filling slots in a template; however, conventional template slots classify argument roles with low accuracy, which readily leads to argument role extraction errors. To address these problems, this paper proposes a document-level event extraction method based on slot-semantics-enhanced prompt learning. Building on prompt learning, the method integrates the argument role semantics of the traditional event extraction paradigm into the slots of the prompt template, which constrains the argument type during slot prediction and generation and improves the accuracy of document-level event extraction. Keeping the upstream and downstream tasks of the pretrained language model consistent improves the model's generalization ability and enables knowledge transfer at low cost, which improves performance in low-resource event extraction scenarios. Experimental results show that, compared with the next-best traditional baseline, the method improves the F1 score by 2.6, 2.9, and 4.0 percentage points on an English event extraction dataset with 59 argument types, a Chinese dataset with 92 argument types, and a low-resource data setting, respectively.
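The slot idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual template format, so everything below is an assumption made for illustration: the event template, the role names, the role descriptions, and the helper `build_prompt` are all hypothetical. The sketch only contrasts a conventional prompt, whose slots carry no role information, with a prompt whose slots embed the argument role and its semantic type.

```python
from typing import Dict

# Hypothetical argument roles and semantic descriptions for one event type.
# These are illustrative assumptions, not the label set used in the paper.
ROLE_SEMANTICS: Dict[str, str] = {
    "Attacker": "person or organization",
    "Target": "person or facility",
    "Place": "location",
}

# Hypothetical template for an "attack" event; each {Role} marks a slot.
TEMPLATE = "{Attacker} attacked {Target} at {Place}"


def build_prompt(template: str, role_semantics: Dict[str, str], enhance: bool) -> str:
    """Replace each {Role} placeholder with a slot token.

    With enhance=False the slots are anonymous (<arg1>, <arg2>, ...).
    With enhance=True each slot carries its role name and semantic type,
    which is the kind of constraint the abstract describes.
    """
    slots = {}
    for index, (role, description) in enumerate(role_semantics.items(), start=1):
        if enhance:
            slots[role] = f"<{role}: {description}>"
        else:
            slots[role] = f"<arg{index}>"
    return template.format(**slots)


print(build_prompt(TEMPLATE, ROLE_SEMANTICS, enhance=False))
print(build_prompt(TEMPLATE, ROLE_SEMANTICS, enhance=True))
```

In a full system, a generative pretrained language model would receive the document together with such a prompt and fill each slot with a text span; that step is omitted here because the abstract does not specify the model or the decoding procedure.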
Authors
LI Hongpeng (李鸿鹏)
MA Bo (马博)
YANG Yating (杨雅婷)
WANG Lei (王磊)
WANG Zhen (王震)
LI Xiao (李晓)
Affiliations: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
Source
Computer Engineering (《计算机工程》), 2023, Issue 9, pp. 23-31 (9 pages)
CAS
CSCD
Peking University Core Journals (北大核心)
Funding
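Youth Innovation Promotion Association of the Chinese Academy of Sciences (科发人函字[2019]26号)
Chinese Academy of Sciences Specialized Scientific Database Construction Project (CASWX2021SF031)
Xinjiang Tianshan Innovation Team Project (2020D14045)
Key Program of the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01D81, 2022D01D04)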
Keywords
event extraction (事件抽取)
prompt learning (提示学习)
information extraction (信息抽取)
natural language processing (自然语言处理)
pretrained language model (预训练语言模型)