
Triplet extraction method for mine electromechanical equipment field
Abstract: The electromechanical equipment domain has little available corpus data, its relation-type features are not mined thoroughly, and its texts contain overlapping triplets. To address these problems, a triplet extraction method named TBPA (Triplet extraction Based on Prompt and Antagonistic training) was proposed. TBPA fuses prompt learning and prior knowledge through iterative adversarial training. Firstly, the BERT (Bidirectional Encoder Representations from Transformers) model was fine-tuned on a self-constructed corpus to obtain feature vectors of the input text. Secondly, iterative adversarial training with the Projected Gradient Descent (PGD) method was performed at the embedding layer to improve the model's resistance to perturbed samples and its generalization to real samples. Thirdly, a single-layer head-tail pointer network was used to identify head entities. A prompt learning template was then used to obtain the domain prior features of each head entity, combining the character vectors with the prompt vectors predicted from the prompt template. Finally, within a hierarchical tagging framework, another single-layer head-tail pointer network identified, one relation at a time, the tail entities corresponding to every predefined relation type. Compared with the baseline model CasRel, TBPA improves precision, recall and F1 score by 3.10, 6.12 and 4.88 percentage points respectively. Experimental results show that TBPA has certain advantages in triplet extraction for the coal mine electromechanical equipment domain.
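The PGD step described above can be sketched in plain Python. This is a minimal illustration, not the authors' code. A hypothetical logistic scorer `sigmoid(w · e)` stands in for the BERT-based network, and `epsilon`, `alpha` and `steps` are assumed values. Each step moves the embedding perturbation along the L2-normalized gradient of the loss and then projects it back onto an L2 ball of radius `epsilon`. This is the usual PGD formulation for embedding-level adversarial training.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(v: Vector) -> float:
    return math.sqrt(dot(v, v))


def loss_and_grad(w: Vector, e: Vector, y: int):
    """Binary cross-entropy of the toy scorer sigmoid(w . e) and its gradient
    with respect to the embedding e: dL/de = (p - y) * w."""
    p = sigmoid(dot(w, e))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def pgd_perturb(w: Vector, e: Vector, y: int,
                epsilon: float = 1.0, alpha: float = 0.3, steps: int = 3) -> Vector:
    """Iteratively build a perturbation delta (||delta||_2 <= epsilon) that
    increases the loss: gradient-ascent step, then projection onto the ball."""
    delta = [0.0] * len(e)
    for _ in range(steps):
        _, g = loss_and_grad(w, [ei + di for ei, di in zip(e, delta)], y)
        g_norm = norm(g)
        if g_norm == 0.0:
            break
        # Ascent step along the L2-normalized gradient.
        delta = [di + alpha * gi / g_norm for di, gi in zip(delta, g)]
        # Projection: rescale delta back onto the epsilon-ball if it left it.
        d_norm = norm(delta)
        if d_norm > epsilon:
            delta = [di * epsilon / d_norm for di in delta]
    return delta


# Hypothetical toy values for a single embedding vector.
w = [0.5, -1.0, 2.0]
e = [0.2, 0.1, 0.3]
y = 1
clean_loss, _ = loss_and_grad(w, e, y)
delta = pgd_perturb(w, e, y)
adv_loss, _ = loss_and_grad(w, [ei + di for ei, di in zip(e, delta)], y)
print(f"clean loss {clean_loss:.4f}, adversarial loss {adv_loss:.4f}")
```

In a full training loop, the loss on the perturbed embeddings would typically be added to the clean loss before the optimizer step, so that the model learns from both. The abstract does not state how TBPA weights the two losses.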
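The hierarchical (cascade) tagging described above can be sketched as follows. Head entities are found first, and tail entities are then tagged separately for each relation type. The sketch assumes the CasRel convention: start and end probabilities are thresholded at 0.5, and each start is paired with the nearest end at or after it. The sentence, the relation names (`has_component`, `installed_in`) and the probability values are hypothetical toy data, not outputs of TBPA.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices
Probs = Sequence[float]


def decode_spans(start_probs: Probs, end_probs: Probs,
                 threshold: float = 0.5) -> List[Span]:
    """Head-tail pointer decoding: pair every start position whose probability
    exceeds the threshold with the nearest end position at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        later_ends = [e for e in ends if e >= s]
        if later_ends:
            spans.append((s, later_ends[0]))
    return spans


def extract_triplets(tokens: List[str],
                     head_probs: Tuple[Probs, Probs],
                     tail_probs_for: Callable[[Span], Dict[str, Tuple[Probs, Probs]]],
                     relations: List[str],
                     threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    """Cascade tagging: decode head entities first, then, for each head and each
    predefined relation, decode that relation's tail entities."""
    def text(span: Span) -> str:
        return " ".join(tokens[span[0]:span[1] + 1])

    triplets = []
    for head in decode_spans(*head_probs, threshold):
        per_relation = tail_probs_for(head)  # relation -> (start, end) probabilities
        for rel in relations:
            for tail in decode_spans(*per_relation[rel], threshold):
                triplets.append((text(head), rel, text(tail)))
    return triplets


# Hypothetical toy example (probabilities are made up, not model outputs).
tokens = ["shearer", "has", "cutting", "drum", "and", "haulage", "unit"]
relations = ["has_component", "installed_in"]
head_probs = ([0.9, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0])


def toy_tail_probs(head: Span) -> Dict[str, Tuple[Probs, Probs]]:
    # In a real model these would be computed from head-conditioned features.
    zeros = [0.0] * len(tokens)
    return {
        "has_component": ([0.0, 0.0, 0.9, 0.0, 0.0, 0.7, 0.0],
                          [0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.9]),
        "installed_in": (zeros, zeros),
    }


triplets = extract_triplets(tokens, head_probs, toy_tail_probs, relations)
print(triplets)
# → [('shearer', 'has_component', 'cutting drum'), ('shearer', 'has_component', 'haulage unit')]
```

Tail entities are tagged per head entity and per relation. One head can therefore take part in several triplets, which is how a cascade framework of this kind accommodates overlapping triplets.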
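The abstract does not specify how the character vectors and the prompt vectors are combined. The sketch below assumes simple element-wise addition. This follows CasRel's practice of adding the averaged start and end vectors of the head entity to every character vector. `prompt_vec` is a hypothetical stand-in for the vector predicted from the prompt template, and all function names are illustrative only.

```python
from typing import List

Vector = List[float]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(components) for components in zip(*vectors)]


def head_entity_vector(char_vecs: List[Vector], start: int, end: int) -> Vector:
    """CasRel-style head representation: mean of the vectors at the head
    entity's start and end characters."""
    return [(a + b) / 2 for a, b in zip(char_vecs[start], char_vecs[end])]


def fuse_with_prompt(char_vecs: List[Vector], start: int, end: int,
                     prompt_vec: Vector) -> List[Vector]:
    """Hypothetical fusion: add the head-entity vector and the prompt vector
    to every character vector before tail-entity tagging."""
    v_head = head_entity_vector(char_vecs, start, end)
    return [add(h, v_head, prompt_vec) for h in char_vecs]


# Toy 2-dimensional character vectors; the head entity spans characters 0..1.
char_vecs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
prompt_vec = [0.5, -0.5]  # stand-in for a vector predicted from a prompt template
fused = fuse_with_prompt(char_vecs, 0, 1, prompt_vec)
print(fused)  # → [[2.0, 0.0], [1.0, 1.0], [3.0, 2.0]]
```

Addition keeps the fused vectors the same size as the encoder output, so the tail pointer network needs no extra projection layer. Concatenation or a gating layer would be equally plausible choices.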
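Precision, recall and F1 for triplet extraction are commonly computed at the triplet level. A prediction counts as correct only when its head entity, relation and tail entity all match a gold triplet exactly. The abstract does not state TBPA's exact evaluation protocol, so the following is a sketch of that common criterion using made-up toy triplets.

```python
from typing import Iterable, Tuple

Triplet = Tuple[str, str, str]  # (head entity, relation, tail entity)


def triplet_prf(predicted: Iterable[Triplet], gold: Iterable[Triplet]):
    """Exact-match precision, recall and F1 over sets of triplets."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # triplets whose head, relation and tail all match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {
    ("shearer", "has_component", "cutting drum"),
    ("shearer", "has_component", "haulage unit"),
    ("belt conveyor", "has_component", "drive drum"),
    ("belt conveyor", "installed_in", "roadway"),
}
predicted = [
    ("shearer", "has_component", "cutting drum"),
    ("shearer", "has_component", "haulage unit"),
    ("belt conveyor", "has_component", "roadway"),  # wrong relation for this tail
]
p, r, f1 = triplet_prf(predicted, gold)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")  # → P=0.6667 R=0.5000 F1=0.5714
```

A percentage-point improvement such as those reported against CasRel is the plain difference between two such scores multiplied by 100. It is not a relative gain.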
Authors: YOU Xindong; WEN Yingzi; SHE Xinpeng; LYU Xueqiang (Beijing Key Laboratory of Network Culture and Digital Communication (Beijing Information Science and Technology University), Beijing 100101, China)
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2024, No. 7, pp. 2026-2033 (8 pages)
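Funding: National Language Commission of China project (ZDI145-10); Beijing Natural Science Foundation (4212020); Science and Technology Project of China Huaneng Group Headquarters (HNKJ21-HF43).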
Keywords: mine electromechanical equipment; triplet extraction; prompt learning; iterative adversarial training; self-constructed corpora