Abstract
Event recognition, which comprises trigger identification and trigger classification, is a fundamental problem in event extraction. To exploit the relatively rich and mature English event corpora for Chinese event extraction, a cross-lingual event recognition method based on joint learning is proposed, in which annotated source-language data are used to recognize events in target-language test data. Machine translation and word alignment are used to keep the source and target sides consistent in both language and annotation. With a suitable combination of features, maximum entropy classifiers are trained separately for trigger identification and trigger classification. A joint model based on integer linear programming then combines the two, applying local and global constraints to optimize the results. Experiments show that the method performs well when trained on a bilingual corpus that stacks the source-language corpus with its translation.
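The joint inference step can be illustrated with a minimal sketch. The abstract does not state the actual objective or constraints, so everything below is an assumption made for illustration: the function name `joint_decode`, the log-probability objective, the local constraint (a token receives an event type only if it is selected as a trigger, and then exactly one type), and the global constraint (identical word forms in a document receive the same label). Exhaustive enumeration stands in for an integer linear programming solver here; it finds the same optimum on toy inputs but does not scale.

```python
from itertools import product
from math import log


def joint_decode(tokens, p_trigger, p_type):
    """Pick one label per token (None = not a trigger) maximizing a joint score.

    tokens    : list of word forms
    p_trigger : p_trigger[i] = identifier's probability that token i is a trigger
    p_type    : p_type[i][t] = classifier's probability of event type t for token i
    All probabilities are assumed to lie strictly between 0 and 1, and every
    p_type[i] is assumed to contain every event type.
    """
    types = sorted({t for dist in p_type for t in dist})
    labels = [None] + types  # one label per token encodes the local constraint
    best_score, best = float("-inf"), None
    for assign in product(labels, repeat=len(tokens)):
        # Assumed global constraint: identical word forms share one label.
        seen = {}
        if any(seen.setdefault(tok, lab) != lab for tok, lab in zip(tokens, assign)):
            continue
        score = 0.0
        for i, lab in enumerate(assign):
            if lab is None:
                score += log(1.0 - p_trigger[i])
            else:
                score += log(p_trigger[i]) + log(p_type[i][lab])
        if score > best_score:
            best_score, best = score, assign
    return list(best)


# Hypothetical scores: decoded independently, the third token would be
# rejected as a trigger (0.4 < 0.5); the global constraint recovers it.
tokens = ["attack", "killed", "attack"]
p_trigger = [0.9, 0.8, 0.4]
p_type = [
    {"Attack": 0.7, "Die": 0.3},
    {"Attack": 0.2, "Die": 0.8},
    {"Attack": 0.6, "Die": 0.4},
]
print(joint_decode(tokens, p_trigger, p_type))  # ['Attack', 'Die', 'Attack']
```

In a realistic setting the same objective and constraints would be handed to an ILP solver as 0-1 variables, and the probabilities would come from the two maximum entropy classifiers.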
Source
《郑州大学学报(理学版)》
CAS
Peking University Core Journals (北大核心)
2017, No. 2, pp. 60-65 (6 pages)
Journal of Zhengzhou University (Natural Science Edition)
Funding
Key Program of the National Natural Science Foundation of China (61331011)
National Natural Science Foundation of China (61375073, 61273320)
Keywords
event recognition
cross-lingual
joint learning (joint modeling)
integer linear programming