Abstract
Traditional document-level relation extraction methods are limited in the effectiveness of their feature representations and in noise elimination, so they cannot accurately identify evidence sentences or the relations between entity pairs. To further improve the accuracy of document-level relation extraction and evidence sentence extraction, this paper proposes a method that applies wavelet transform to extract features from, clean, and denoise the text vectors generated by a pre-trained language model. First, the document is encoded by a pre-trained language model, and wavelet transform is applied to the resulting initial text vectors to obtain more precise features. Next, a multi-head attention mechanism is introduced to weight the wavelet-transformed data, highlighting the important features relevant to the relations between entity pairs. To make full use of both the original and the cleaned data, a residual connection is used to fuse them. Experiments on the DocRED dataset show that the proposed model extracts the relations between entity pairs more effectively.
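The pipeline described in the abstract (denoise the encoder output with a wavelet transform, weight the result with multi-head attention, then fuse it with the original vectors through a residual connection) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-level Haar wavelet, the soft-thresholding rule, the identity query/key/value projections (a trained model would use learned ones), and the hypothetical function names `haar_denoise`, `multi_head_self_attention`, and `denoise_attend_fuse`. The input `tokens` stands in for the vectors that a pre-trained language model would produce.

```python
import math


def haar_denoise(vec, threshold):
    """One-level Haar transform, soft-threshold the detail coefficients, invert."""
    assert len(vec) % 2 == 0, "Haar transform needs an even-length vector"
    s = math.sqrt(2.0)
    approx = [(vec[i] + vec[i + 1]) / s for i in range(0, len(vec), 2)]
    detail = [(vec[i] - vec[i + 1]) / s for i in range(0, len(vec), 2)]
    # Soft thresholding: shrink detail coefficients toward zero (assumed noise).
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])  # inverse Haar step
    return out


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(tokens, num_heads):
    """Scaled dot-product self-attention per head, with identity projections."""
    dim = len(tokens[0])
    assert dim % num_heads == 0, "dimension must be divisible by num_heads"
    hd = dim // num_heads
    out = [[] for _ in tokens]
    for h in range(num_heads):
        heads = [t[h * hd:(h + 1) * hd] for t in tokens]  # slice for this head
        for i, q in enumerate(heads):
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(hd)
                      for k in heads]
            weights = softmax(scores)
            ctx = [sum(w * v[j] for w, v in zip(weights, heads))
                   for j in range(hd)]
            out[i].extend(ctx)  # concatenate head outputs
    return out


def denoise_attend_fuse(tokens, threshold=0.1, num_heads=2):
    """Wavelet-denoise each token vector, attend, then add the original back."""
    cleaned = [haar_denoise(t, threshold) for t in tokens]
    attended = multi_head_self_attention(cleaned, num_heads)
    # Residual connection: fuse original and cleaned/attended representations.
    return [[x + y for x, y in zip(orig, att)]
            for orig, att in zip(tokens, attended)]
```

The residual sum keeps the original encoder signal available even when thresholding removes too much detail, which is the motivation the abstract gives for fusing both representations.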
Authors
杨肖
肖蓉
YANG Xiao; XIAO Rong (School of Computer Science and Information Engineering, Hubei University, Wuhan, Hubei 430062, China)
Source
《中文信息学报》
CSCD
Peking University Core Journal (北大核心)
2024, Issue 2, pp. 109-120, 131 (13 pages in total)
Journal of Chinese Information Processing
Funding
Natural Science Foundation of Hubei Province (E1KF291005)
Natural Science Foundation of Yunnan Province (2022KZ00125)
Keywords
document-level relation extraction
wavelet transform
multi-head attention mechanism