Abstract
Information extraction identifies information of high interest in unstructured text data. Event extraction is a challenging research direction within this field: it aims to extract the key elements that describe an event from unstructured text and present them in structured form. Here, event extraction is treated as a sequence labeling task. First, the ALBERT pre-trained model is used to learn features. Next, a conditional random field (CRF) is introduced to improve sequence labeling performance. Finally, event types and event arguments are identified and classified. Experimental results on the ACE2005 standard corpus show that, compared with existing models, the ALBERT-CRF model improves recall and F-score on the trigger word identification and classification tasks.
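The role of the CRF layer in a sequence labeling pipeline like the one described above can be illustrated with a small decoding sketch. This is not the authors' implementation: the BIO tag names, the emission scores (standing in for per-token encoder outputs such as those from ALBERT), and the transition scores are all hypothetical values chosen for illustration. The sketch shows how Viterbi decoding over a linear-chain CRF picks the best whole tag sequence rather than the best tag per token.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]   : score of tag j at token t (stand-in for encoder output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    # Best score of any path ending in each tag at the first token.
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for a path that ends in tag j at token t.
            best_prev = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j] + emissions[t][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path]


# Hypothetical BIO tag set for a single event type (illustrative only).
tags = ["O", "B-Attack", "I-Attack"]

# Hypothetical per-token scores for the tokens "Troops", "opened", "fire".
emissions = [
    [2.0, 0.1, 0.0],
    [0.5, 1.0, 1.2],  # taken alone, this token would be tagged I-Attack
    [0.3, 0.2, 1.5],
]

# Hypothetical transition scores; O -> I-Attack is effectively forbidden.
transitions = [
    [0.0, 0.0, -1e4],  # from O
    [0.0, 0.0, 0.5],   # from B-Attack
    [0.0, 0.0, 0.5],   # from I-Attack
]

decoded = viterbi_decode(emissions, transitions, tags)
print(decoded)
```

With these toy scores, choosing the top tag for each token independently would give O, I-Attack, I-Attack, which is not a valid BIO sequence. The transition scores steer the decoder to O, B-Attack, I-Attack instead, which is the kind of consistency a CRF layer is typically added to provide.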
Authors
DU Jie (杜洁)
LUO Li-ming (骆力明)
SUN Zhong (孙众)
College of Information Engineering, Capital Normal University, Beijing 100048, China
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journals (北大核心)
2023, Issue 4, pp. 711-717 (7 pages)
Funding
National Natural Science Foundation of China (61977048).