Abstract
Document-level relation extraction aims to extract the relations between entity pairs from unstructured text. Existing document-level methods do not fully exploit document semantics and struggle with noise in the document. To address this, this study proposes a relation extraction model based on dual-granularity document graphs, with a new graph construction approach and a noise reduction method designed at both the inter-sentence and intra-sentence levels. First, at the inter-sentence level, rhetorical discourse relations and entity mention relations are used to build a rhetorical discourse relation graph (RST-graph), and asynchronous noise reduction is applied to generate a Coarse-Grained Document graph (CGD-graph). This mitigates the structural mispruning that arises because the inter-sentence relation paths between an entity pair are longer than the intra-sentence paths. Then, at the intra-sentence level, dependency syntax relations are used to parse the sentences of the document into Dependency Syntax Trees (DST), which enrich intra-sentence semantic information. Finally, the DSTs are connected to the CGD-graph at their common anchor points to form a Fine-Grained Document graph (FGD-graph). Experimental results show that, compared with the Denoising Graph Inference (DGI) model, the proposed model improves Ign F1 and F1 by 0.40 and 0.51 percentage points, respectively, and that its gains on entity pairs with multi-label relations become more pronounced as the number of labels increases.
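The coarse-to-fine construction described in the abstract can be illustrated with a toy sketch. The abstract does not say how the asynchronous noise reduction works or how anchor points are defined, so everything below is an assumption made for illustration: the noise reduction is replaced by a simple shortest-path pruning applied to the coarse graph before any intra-sentence nodes are added, anchors are taken to be the token nodes of entity mentions, and all names (`prune_to_shortest_paths`, `build_fine_graph`, the toy sentences and edges) are hypothetical.

```python
from collections import deque


def add_edge(graph, u, v):
    """Add an undirected edge to an adjacency-set graph."""
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def bfs_dist(graph, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def prune_to_shortest_paths(graph, sources, targets):
    """Toy stand-in for denoising: keep nodes on a shortest source-target path."""
    keep = set()
    for s in sources:
        ds = bfs_dist(graph, s)
        for t in targets:
            if t not in ds:
                continue  # target unreachable from this source
            dt = bfs_dist(graph, t)
            keep |= {n for n in graph
                     if n in ds and n in dt and ds[n] + dt[n] == ds[t]}
    return {u: graph[u] & keep for u in keep}


def build_fine_graph(coarse, dep_trees, anchors):
    """Attach dependency edges of surviving sentences via mention anchors."""
    fine = {u: set(vs) for u, vs in coarse.items()}
    for sentence, edges in dep_trees.items():
        if sentence in coarse:  # pruned sentences contribute no tokens
            for head, dependent in edges:
                add_edge(fine, head, dependent)
    for mention, token in anchors.items():
        if mention in coarse:
            add_edge(fine, mention, token)
    return fine


# Coarse level: sentence nodes linked by (hypothetical) discourse relations,
# plus entity-mention nodes linked to the sentence that contains them.
rst_graph = {}
for u, v in [("S0", "S1"), ("S1", "S2"), ("S1", "S3")]:
    add_edge(rst_graph, u, v)
for mention, sentence in [("m:Alice@S0", "S0"), ("m:Acme@S2", "S2"),
                          ("m:Paris@S3", "S3")]:
    add_edge(rst_graph, mention, sentence)

# Prune first, at sentence granularity, for the entity pair (Alice, Acme).
cgd = prune_to_shortest_paths(rst_graph, ["m:Alice@S0"], ["m:Acme@S2"])

# Fine level: hand-written dependency edges and mention-token anchors.
dep_trees = {
    "S0": [("S0:joined", "S0:Alice"), ("S0:joined", "S0:team")],
    "S2": [("S2:hired", "S2:Acme"), ("S2:hired", "S2:her")],
    "S3": [("S3:rained", "S3:Paris")],
}
anchors = {"m:Alice@S0": "S0:Alice", "m:Acme@S2": "S2:Acme",
           "m:Paris@S3": "S3:Paris"}
fgd = build_fine_graph(cgd, dep_trees, anchors)

print(sorted(cgd))
print(sorted(fgd))
```

Pruning before the token-level nodes exist means path lengths are compared only between sentence-level nodes, which is one plausible reading of why the ordering matters; the model's actual procedure may differ.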
Authors
廖涛
张国畅
张顺香
LIAO Tao; ZHANG Guochang; ZHANG Shunxiang (School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, Anhui, China; Artificial Intelligence Research Institute of Hefei Comprehensive National Science Center, Hefei 230000, Anhui, China)
Source
《计算机工程》
CAS
CSCD
Peking University Core Journal
2024, Issue 10, pp. 164-173 (10 pages)
Computer Engineering
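Funding
National Natural Science Foundation of China, General Program (62076006)
Anhui Provincial University Collaborative Innovation Project (GXXT-2021-008).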
Keywords
document-level
relation extraction
dual-granularity document graph
asynchronous noise reduction
rhetorical discourse relation
dependency syntax relation