Abstract
Event extraction is one of the key information extraction tasks in natural language processing. Event detection is the first step of event extraction. It aims to identify the trigger words of events and classify them by event type. Existing Chinese event detection methods depend on word segmentation, and segmentation errors propagate to later stages, which makes trigger extraction inaccurate. This paper treats Chinese event detection as a sequence labeling task and proposes an event detection model that combines a pre-trained model with a conditional random field (CRF). First, the data are annotated with the BIO tagging scheme. Then, the training data are passed through the pre-trained model BERT to obtain trigger features from dynamic character embeddings that capture long-distance context. Finally, a CRF classifies the trigger words. Experiments on the ACE2005 Chinese dataset show that the proposed model achieves higher precision, recall, and F1 score than existing event detection models.
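The two ends of the pipeline described above (character-level BIO labeling and CRF decoding) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that trigger spans are given as character offsets `(start, end_exclusive, event_type)`, that the per-character emission scores stand in for what a BERT encoder would output, and that the transition scores would be learned by the CRF during training (start and end transitions are omitted for brevity). The function names `bio_tags` and `viterbi`, the example sentence, and the event type label `Arrest-Jail` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def bio_tags(sentence: str, triggers: Sequence[Tuple[int, int, str]]) -> List[str]:
    """Label each character with B-<type>, I-<type>, or O.

    Working on characters rather than segmented words avoids
    segmentation errors leaking into trigger boundaries.
    """
    tags = ["O"] * len(sentence)
    for start, end, event_type in triggers:
        tags[start] = f"B-{event_type}"          # first character of the trigger
        for i in range(start + 1, end):
            tags[i] = f"I-{event_type}"          # remaining characters of the trigger
    return tags


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][k]: score of label k at position t (e.g. from BERT).
    transitions[j][k]: score of moving from label j to label k.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])                   # best score ending in each label
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for k in range(n_labels):
            best_j = max(range(n_labels), key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_j] + transitions[best_j][k] + emissions[t][k])
            ptr.append(best_j)
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final label.
    best = max(range(n_labels), key=lambda k: score[k])
    path = [best]
    for ptr in reversed(backpointers):
        best = ptr[best]
        path.append(best)
    return path[::-1]


if __name__ == "__main__":
    # "他昨天被逮捕了" (He was arrested yesterday); trigger 逮捕 at characters 4-5.
    print(bio_tags("他昨天被逮捕了", [(4, 6, "Arrest-Jail")]))

    # Labels: 0 = O, 1 = B-X, 2 = I-X. A large negative O -> I transition
    # encodes the BIO constraint that I must follow B or I.
    trans = [[0.0, 0.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    emis = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [0.0, 0.0, 1.0]]
    print(viterbi(emis, trans))                  # greedy argmax would pick [0, 2, 2]
```

Per-position argmax over these emissions would produce the invalid sequence O, I, I. The transition scores let the CRF choose the valid sequence O, B, I instead, which is the usual motivation for placing a CRF layer on top of BERT in sequence labeling.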
Authors
TIAN Zihan (田梓函), LI Xin (李欣)
College of Information Network Security, People's Public Security University of China, Beijing 100038, China
Source
Computer Engineering and Applications (计算机工程与应用), 2021, No. 11, pp. 135-139 (5 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
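Funding
National Key Research and Development Program of China (2017YFC0803700)
Fundamental Research Funds of People's Public Security University of China, 2019 (2019JKF424)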