Abstract
Chinese event detection suffers from domain isolation: domains are independent of one another, data cannot be shared across them, and each domain requires a large amount of separately annotated data. Building on previous studies, this paper proposes an open Chinese event detection method based on transfer learning. The method rests on two hypotheses about associations between trigger words. First, trigger words of the same event type are strongly related to one another in semantic space. Second, trigger words of different event types are also related, but more weakly than trigger words of the same event type. Based on these hypotheses, and with the help of external dictionaries, the method constructs relation features between candidate words and seed trigger words, together with contextual features of the candidate words. A convolutional neural network is then used to build a base model and a transfer model for event detection. As a result, event detection in a new domain requires only a very small amount of annotated data. On the ACE2005 Chinese event dataset, the method surpasses current mainstream methods on trigger word identification while using only 20% of the data.
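The two trigger-word association hypotheses can be illustrated with a minimal sketch. It is not the paper's implementation: the 3-dimensional vectors, the event type names `Attack` and `Transport`, the helper `relation_features`, and the choice of cosine similarity with max/mean pooling are all assumptions made for illustration. The abstract only states that relation features between candidate words and seed trigger words are built with the help of external dictionaries.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relation_features(candidate_vec, seeds_by_type):
    """Hypothetical helper: for each event type, return the (max, mean)
    similarity between the candidate word and that type's seed triggers."""
    features = {}
    for event_type, seed_vecs in seeds_by_type.items():
        sims = [cosine(candidate_vec, s) for s in seed_vecs]
        features[event_type] = (max(sims), sum(sims) / len(sims))
    return features


# Toy embeddings, invented for illustration only (not taken from the paper).
SEEDS = {
    "Attack": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "Transport": [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1]],
}
candidate = [0.85, 0.15, 0.05]

feats = relation_features(candidate, SEEDS)
# Pick the event type whose seed triggers are closest to the candidate.
best_type = max(feats, key=lambda t: feats[t][0])
```

In this toy setup the candidate lies close to the `Attack` seeds (hypothesis one) while keeping a smaller, non-zero similarity to the `Transport` seeds (hypothesis two). In a pipeline like the one the abstract describes, such relation features would be combined with contextual features and fed to the convolutional network.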
Authors
YAN Hao (严浩)
XU Hongbo (许洪波)
SHEN Yinghan (沈英汉)
CHENG Xueqi (程学旗)
Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
Source
Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》)
CAS
PKU Core Journal (北大核心)
2020, Issue 2, pp. 64-71 (8 pages)
Funding
National Key Research and Development Program of China (2016QY03D0504).
Keywords
event detection
transfer learning
trigger word
convolutional neural network
event extraction
seed