Abstract
In the Internet era, unstructured text in newly emerging domains contains a vast amount of information. Research on event extraction for new domains makes it possible to build domain knowledge bases quickly, which in turn support downstream knowledge-based applications. However, existing event extraction systems are strongly domain-specific. Building one from scratch in a new domain depends heavily on the quality and scale of the event schema and the annotated data, and requires substantial human effort and expert knowledge to design schemas and annotate corpora. Moreover, datasets often contain multiple associated event instances within the same context, which greatly hinders both event extraction and factuality prediction. This paper surveys the emerging research field of event extraction in new domains. It reviews the state of research in three directions: event schema induction, collective event extraction, and event factuality prediction. It also discusses the key open problems and difficulties and points out the research work to be carried out next.
Authors
黄河燕
刘啸
HUANG Heyan; LIU Xiao (School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Beijing Engineering Research Center of High-Volume Language Information Processing and Cloud Computing Applications, Beijing 100081, China; Southeast Academy of Information Technology, Beijing Institute of Technology, Putian 351100, China)
Source
《智能系统学报》
CSCD
PKU Core Journals (北大核心)
2022, No. 1, pp. 201-212 (12 pages)
CAAI Transactions on Intelligent Systems
Funding
National Natural Science Foundation of China (Grant No. U19B2020).
Keywords
event extraction (事件抽取)
new domains (新领域)
information extraction (信息抽取)
event schema induction (事件模板推导)
collective extraction (联合抽取)
event factuality prediction (事件真实性检测)
natural language processing (自然语言处理)
knowledge base (知识库)