Abstract
With the rapid iteration of Internet information technology, the volume of news text is growing exponentially. Faced with this mass of news text, automatically extracting the relations between elements in long news articles has become a research focus. Discourse-level news element relation extraction identifies relations between elements across sentences in a multi-sentence news text, which helps readers grasp the thread of the whole article more quickly. Taking public opinion news texts as an example, this paper proposes a multi-feature discourse-level news element relation extraction method. It uses a heterogeneous graph model to fuse several features: adjacency relations between sentences, affiliation relations, syntactic dependency relations, and multi-hop relations between elements, so as to fully exploit the latent contextual information in the text. Experiments on a discourse-level public opinion news element relation dataset constructed for this work show that each incorporated feature clearly improves element relation extraction, raising the F1 score by up to 4.09%, and that the method outperforms current mainstream approaches.
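The abstract does not say how the heterogeneous graph is assembled, so the following is only a minimal sketch under our own assumptions, not the authors' method. It assumes two node types (sentence and element) and one edge type per feature named above. The input format (`mentions` as element/sentence pairs), the `dep_pairs` list (assumed to come from an external dependency parser), the function name `build_graph`, and the rule that links two elements by a "multi_hop" edge when their shortest path has length 2 to `max_hops` are all hypothetical. It builds only the typed edge sets; the graph neural network that would consume them is omitted.

```python
from collections import defaultdict, deque


def sent(i):
    # Sentence node id (hypothetical naming scheme).
    return ("S", i)


def elem(name):
    # Element node id (hypothetical naming scheme).
    return ("E", name)


def build_graph(num_sentences, mentions, dep_pairs, max_hops=3):
    """Build typed, undirected edge sets for a document-level graph.

    num_sentences: number of sentences in the document.
    mentions: list of (element_name, sentence_index) pairs (assumed format).
    dep_pairs: list of (element_a, element_b) pairs linked by a syntactic
        dependency path inside one sentence (assumed external parser output).
    Returns a dict mapping edge type -> set of frozenset({u, v}) node pairs.
    """
    edges = defaultdict(set)

    def add(etype, u, v):
        if u != v:
            edges[etype].add(frozenset((u, v)))

    # 1) Adjacency: link consecutive sentences.
    for i in range(num_sentences - 1):
        add("adjacent", sent(i), sent(i + 1))

    # 2) Affiliation: link each element to every sentence that mentions it.
    for name, i in mentions:
        add("affiliation", elem(name), sent(i))

    # 3) Syntactic dependency between elements (taken as given input).
    for a, b in dep_pairs:
        add("dependency", elem(a), elem(b))

    # Undirected adjacency list over all edges built so far.
    adj = defaultdict(set)
    for pairs in edges.values():
        for pair in pairs:
            u, v = tuple(pair)
            adj[u].add(v)
            adj[v].add(u)

    # 4) Multi-hop: BFS from each element, bounded by max_hops; link it to
    #    every other element whose shortest-path distance is >= 2 (i.e. not
    #    already a direct neighbour) and <= max_hops.
    for src in sorted({elem(name) for name, _ in mentions}):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hops:
                continue
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst, d in dist.items():
            if dst[0] == "E" and d >= 2:
                add("multi_hop", src, dst)

    return edges
```

For example, with three sentences, mentions `("police", 0), ("suspect", 0), ("suspect", 1), ("court", 2)` and one dependency pair `("police", "suspect")`, the default `max_hops=3` yields a single multi-hop edge between "suspect" and "court" (element, sentence 1, sentence 2, element), while "police" and "court" are four hops apart and stay unlinked. Keeping each feature as a separate edge type is what lets a heterogeneous graph model learn a different transformation per relation.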
Authors
DANG Xueyun
WANG Jian
(Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming 650500, China)
Source
Video Engineering (《电视技术》), 2022, No. 6, pp. 73-78 (6 pages)
Funding
National Key Research and Development Program of China (No. 2018YFC0830105, No. 2018YFC0830101, No. 2018YFC0830100).
Keywords
public opinion news text information
discourse-level element relation extraction
heterogeneous graph model