Abstract
In big data analysis for public security, procuratorial and judicial organs and for discipline inspection and supervision, structured data and unstructured text are often the main data sources. Business analysis on such data must focus on extracting the implicit associations behind the data, and event extraction is the core foundation for association analysis of this kind of text. Previous event extraction approaches perform event trigger recognition and event element recognition separately: the triggers and event types produced by trigger recognition are then used for event element recognition, so errors propagate from the first stage to the second. In addition, the word vectors built by earlier representation-based methods are weak at capturing sentence-level features. This paper proposes RBBLC, a joint extraction model that performs event recognition and event element recognition simultaneously as a sequence labeling task. RBBLC builds word vectors with richer contextual information on top of RoBERTa, then applies a BiLSTM-CNN network structure to capture associations within the sentence and predict trigger and argument labels as well as event types. Extraction experiments and an inductive analysis were carried out on the CEC corpus. Compared with the baseline method, the F1 score, accuracy, and recall of the proposed method improve by 16%, 28%, and 24% respectively, effectively improving event extraction performance.
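As a rough illustration of what "joint extraction by sequence labeling" means in practice, the sketch below decodes one tag per token into trigger and argument spans in a single pass. The BIO tag scheme, the label names (`Trigger`, `Participant`, `Location`), and the helper `decode_spans` are assumptions made for this example. They are not taken from the paper, whose actual tag set and decoding procedure are not given in the abstract.

```python
from typing import List, Tuple


def decode_spans(tokens: List[str], tags: List[str], sep: str = "") -> List[Tuple[str, str]]:
    """Collect (label, text) spans from BIO tags such as 'B-Trigger' / 'I-Trigger'.

    Hypothetical helper: the tag scheme is an assumption, not the paper's.
    """
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span that is currently open, if any.
        if current_label is not None:
            spans.append((current_label, sep.join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span begins; close the previous one first.
            flush()
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the open span with the same label.
            current_tokens.append(token)
        else:
            # 'O' or an inconsistent 'I-' tag: close the open span and skip the token.
            flush()
            current_label = None
            current_tokens = []
    flush()
    return spans


# Usage: triggers and arguments come out of the same tag sequence.
example = decode_spans(
    ["Two", "trucks", "collided", "in", "Hohhot"],
    ["B-Participant", "I-Participant", "B-Trigger", "O", "B-Location"],
    sep=" ",
)
```

Because triggers and arguments are read from the same predicted tag sequence, there is no separate trigger stage whose mistakes feed into a later argument stage, which is the error propagation problem the abstract describes.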
Authors
Yang Denghui (杨登辉)
Liu Jing (刘靖)
College of Computer Science, Inner Mongolia University, Hohhot 010021, China
Source
Journal of Nanjing Normal University (Engineering and Technology Edition) (《南京师范大学学报(工程技术版)》)
CAS
2022, No. 3, pp. 38-44, 82 (8 pages in total)
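Funding
National Natural Science Foundation of China (61662051)
Inner Mongolia Science and Technology Plan Project (2019GG372)
Open Project of the Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory (IMDBD202005)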