Research on Relation Extraction Method for Government Documents (cited by: 3)
Abstract Government documents are voluminous and cover a wide range of subjects. Mining valuable information from them, for example by applying entity relation extraction to obtain personnel information, can reduce the workload of government staff. Distant supervision for relation extraction reduces the cost of manual annotation and improves extraction efficiency, which in turn helps ensure the quality and effectiveness of the important information obtained. This paper proposes a distantly supervised entity relation extraction method that combines the ALBERT pre-trained language model with a capsule network to extract person-position relations from official documents. ALBERT extracts deep semantic information from the text through character embeddings and position embeddings, and the capsule network improves relation classification by passing features from lower layers to higher layers. Experimental results show that the proposed model achieves higher accuracy, recall, and F1 than the baseline methods, effectively improving relation extraction performance and addressing the scarcity of labeled datasets in the official-document domain. The instances obtained by this method can be used to expand existing knowledge bases for the official-document domain and can help government staff quickly retrieve personnel information when drafting documents, avoiding errors in information transfer.
Authors CUI Cong-min; SHI Yun-mei; YUAN Bo; LI Yun-han; LI Yuan-hua; ZHOU Chu-wei (Beijing Key Laboratory of Internet Culture Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China; Beijing Information Science and Technology University, Beijing 100101, China)
Source Computer Technology and Development, 2021, No. 12, pp. 26-32 (7 pages)
Funding National Key Research and Development Program of China (2018YFB1004100).
Keywords entity relationship extraction; distant supervision; ALBERT; pre-training language model; capsule network
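
The abstract relies on distant supervision to avoid manual annotation. As a rough illustration of that general idea only (not the authors' pipeline), the sketch below labels any sentence that mentions both entities of a knowledge-base triple with that triple's relation. The `Triple` class, the `distant_label` function, the relation name `holds_position`, and the toy sentences are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A hypothetical knowledge-base fact: (person, relation, position)."""
    person: str
    relation: str
    position: str


def distant_label(sentences, kb):
    """Assign a triple's relation to every sentence mentioning both its entities.

    This is the basic distant-supervision assumption; it produces noisy labels
    because co-occurrence does not guarantee the relation is expressed.
    """
    labeled = []
    for sentence in sentences:
        for triple in kb:
            if triple.person in sentence and triple.position in sentence:
                labeled.append(
                    (sentence, triple.person, triple.position, triple.relation)
                )
    return labeled


# Toy data, invented for this sketch.
kb = [Triple("Zhang San", "holds_position", "director")]
sentences = [
    "Zhang San was appointed director of the office.",
    "The office released its annual report.",
    "Zhang San met the director of another bureau.",  # co-occurrence only: a noisy label
]

labeled = distant_label(sentences, kb)
for row in labeled:
    print(row)
```

The third toy sentence is labeled even though it does not state that Zhang San is a director, which is the kind of labeling noise a downstream classifier (in the paper, ALBERT combined with a capsule network) has to cope with.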