Abstract
News element extraction identifies information such as person names, location names, and organization names in news texts; it is a basic task in natural language processing and plays a crucial role in intelligent question answering, information retrieval, and knowledge graph construction, as well as in quickly understanding news texts. Taking the extraction of element information from case-related news as an example, this paper proposes a method based on the Gated Graph Neural Network (GGNN) that incorporates a case-related dictionary. The message-passing mechanism between nodes and edges in the graph neural network injects external lexical knowledge into the news text, mining latent semantic features to improve element extraction performance. First, domain-related vocabulary is selected according to the characteristics of the news text to build a case-related dictionary. Next, a character-granularity combination graph is constructed from the news text and the dictionary, and the GGNN model encodes this graph to obtain representations of character-word combination relations. Finally, a Bi-LSTM-CRF model (bidirectional Long Short-Term Memory network with a Conditional Random Field layer) decodes the element information sequence. Experimental results on an annotated case-related news element dataset show that, compared with commonly used baseline models, the GGNN-based method with dictionary information improves the F1 value by 2.12% to 5.34% and achieves more stable performance.
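The character-word combination graph described above can be sketched as follows. This is a minimal illustration, not the paper's exact construction: it assumes each character of the sentence is a node linked to its successor, and each dictionary word matched in the sentence becomes an extra node connected to the characters it spans. The function name and edge scheme are this sketch's own assumptions.

```python
def build_char_word_graph(sentence, lexicon):
    """Build a char-granularity combination graph from a sentence and a dictionary.

    Returns (nodes, edges): nodes are the sentence's characters followed by any
    matched dictionary words; edges are index pairs into the node list.
    """
    nodes = list(sentence)                                   # one node per character
    edges = [(i, i + 1) for i in range(len(sentence) - 1)]   # sequential char-char edges
    # Scan every substring; matched multi-character dictionary words become word nodes.
    for start in range(len(sentence)):
        for end in range(start + 1, len(sentence) + 1):
            word = sentence[start:end]
            if len(word) > 1 and word in lexicon:
                word_idx = len(nodes)                        # new node for the matched word
                nodes.append(word)
                for char_idx in range(start, end):
                    edges.append((word_idx, char_idx))       # word-to-char edges
    return nodes, edges
```

A GGNN would then propagate messages along these edges so each character representation absorbs the lexical knowledge of the words covering it, before the Bi-LSTM-CRF decoder labels the character sequence.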
Authors
DANG Xueyun
WANG Jian
DANG Xueyun; WANG Jian (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming 650504, China)
Source
《电视技术》
2022, No. 5, pp. 24-29 (6 pages)
Video Engineering
Funding
National Key Research and Development Program of China (No. 2018YFC0830105, No. 2018YFC0830101, No. 2018YFC0830100).