Abstract
Distant supervision provides relation extraction with large, automatically labeled datasets and transfers well across domains, laying a foundation for automatic relation extraction. However, the datasets are built under a strong restrictive assumption and therefore contain many wrongly labeled instances, and this noise greatly degrades final performance. To alleviate the wrong-label problem, this paper proposes a dual attention model. The first attention layer introduces entity vectors pre-trained with TransH into the sentence encoder and performs attention selection jointly with the sentence features, assigning higher weights to features that reflect relational information and thereby improving the quality of the sentence encoding. The second layer computes attention again at the sentence level to select effective instances and further reduce the weight of noisy data. Multiple comparative experiments on a widely used dataset show that the proposed model makes full use of all available information and clearly outperforms the baseline models.
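The two attention layers described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration and not the authors' implementation: dot-product scoring, the use of the tail-minus-head entity vector as a relation cue, and the names `attend`, `encode_bag`, and `relation_query` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(vectors, query):
    """Score each vector against the query, then return the softmax
    weights and the weighted sum of the vectors."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return weights, pooled


def encode_bag(bag, head_vec, tail_vec, relation_query):
    """Apply two attention layers to a bag of sentences.

    bag: list of sentences, each a list of word-level feature vectors.
    head_vec, tail_vec: pre-trained entity vectors (e.g. from TransH).
    relation_query: hypothetical query vector for the candidate relation.
    """
    # Assumption: the difference between the entity vectors acts as a
    # relation cue for the first (feature-level) attention layer.
    entity_query = [t - h for h, t in zip(head_vec, tail_vec)]
    sentence_vecs = [attend(words, entity_query)[1] for words in bag]
    # Second (sentence-level) layer: down-weight noisy instances in the bag.
    return attend(sentence_vecs, relation_query)


# Toy bag: the first sentence has a feature aligned with the entity cue.
bag = [
    [[2.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 2.0]],
]
weights, bag_vec = encode_bag(bag, [0.0, 0.0], [1.0, 0.0], [1.0, 0.0])
```

In this toy bag the first sentence receives the larger sentence-level weight, which is the intended effect of the second layer: instances that carry little relational signal contribute less to the bag representation.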
Authors
季一木
汤淑宁
刘尚东
张旺
洪程
邱晨阳
刘强
肖婉
JI Yimu; TANG Shuning; LIU Shangdong; ZHANG Wang; HONG Cheng; QIU Chenyang; LIU Qiang; XIAO Wan (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; Institute of High Performance Computing and Bigdata, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; Nanjing Center of HPC China, Nanjing 210023, China; Jiangsu Research Engineering of HPC and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; College of Educational Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
Source
《南京邮电大学学报(自然科学版)》
Peking University Core Journal (北大核心)
2022, Issue 6, pp. 70-78 (9 pages)
Journal of Nanjing University of Posts and Telecommunications: Natural Science Edition
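Funding
National Key R&D Program of China (2018AAA0103302)
Natural Science Foundation of Jiangsu Province and Major Natural Science Research Projects of Jiangsu Higher Education Institutions (BK20170900, 19KJB520046, 20KJA520001)
Jiangsu Province Innovation and Entrepreneurship Talent Program
Jiangsu Postdoctoral Fund (2019K024)
Six Talent Peaks Project of Jiangsu Province (JY02)
Jiangsu Province Postdoctoral Research and Practice Innovation Program (KYCX19_0921, KYCX19_0906)
Zhejiang Lab Open Project (2021KF0AB05)
Dingshan Talent Training Program of Nanjing University of Posts and Telecommunications and Talent Introduction Start-up Fund of Nanjing University of Posts and Telecommunications (NY219132)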