Abstract
With the rapid growth of online information, finding key information has become increasingly important. Event detection focuses on extracting event triggers from unstructured natural language text. Deep learning has achieved notable success on event detection tasks, but the models rely heavily on large amounts of labeled data. Because events carry structured information and a rich label representation, annotations are expensive and hard to obtain at scale. To improve annotation efficiency and reduce the number of labeled samples needed for training, this paper proposes an event detection model that combines active learning with a pre-trained model (EDPAL). To handle the cold-start problem of active learning, a sample selection strategy based on fused uncertainty is designed to estimate each sample's potential contribution to fine-tuning the downstream event detection task. On the one hand, using the rich semantic information that the pre-trained model carries over from its original task avoids redesigning the network structure or training from scratch. On the other hand, selecting informative samples through active learning allows the pre-trained model to be fine-tuned more effectively while reducing annotation cost. Experimental results on the ACE 2005 corpus demonstrate the effectiveness of the proposed EDPAL.
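The abstract does not say how the uncertainty measures are fused, so the following is only an illustrative sketch of uncertainty-based sample selection in general, not the EDPAL strategy itself. It assumes a hypothetical fusion: an equally weighted sum of normalized entropy and a margin score over a model's predicted label distribution. All function names, the `weight` parameter, and the toy probabilities are assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted label distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def margin_uncertainty(probs):
    """One minus the gap between the two most likely labels (higher = less certain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def fused_uncertainty(probs, weight=0.5):
    """Hypothetical fusion (an assumption, not the paper's formula):
    weighted sum of entropy normalized to [0, 1] and the margin score."""
    max_entropy = math.log(len(probs))
    normalized_entropy = entropy(probs) / max_entropy
    return weight * normalized_entropy + (1.0 - weight) * margin_uncertainty(probs)


def select_for_annotation(pool_probs, budget):
    """Return the indices of the `budget` most uncertain unlabeled samples."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: fused_uncertainty(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Toy unlabeled pool: each row is a made-up predicted distribution
# over three event types for one candidate sample.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform
    [0.50, 0.45, 0.05],  # two close candidates
    [0.80, 0.15, 0.05],  # fairly confident
]
chosen = select_for_annotation(pool, budget=2)
```

In a full active learning loop, the selected samples would be sent for annotation, added to the labeled set, and used to fine-tune the pre-trained model again before the next selection round.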
Authors
冯琳慧
乔林波
阚志刚
Feng Linhui; Qiao Linbo; Kan Zhigang (National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China)
Source
《南京师范大学学报(工程技术版)》
CAS
2022, No. 2, pp. 41-47 (7 pages)
Journal of Nanjing Normal University (Engineering and Technology Edition)
Keywords
active learning
event detection
pre-trained model
sample selection strategy
fine-tuning