Knowledge Extraction Method of Scientific and Technological Text Based on Word Mixing and GRU (cited by: 3)
Abstract Knowledge extraction aims to extract relation triples (head entity, relation, tail entity) from unstructured text. Existing methods fall into two groups: pipeline methods and joint extraction methods. Pipeline methods perform named entity recognition and relation extraction in separate modules; this is flexible, but training is slow. Joint extraction uses an end-to-end neural network model that performs entity recognition and relation extraction at the same time, which preserves the association between entities and relations and converts their joint extraction into a sequence labeling problem. Building on this, the paper makes the following main contributions: (1) It proposes MBGAB, a knowledge extraction method for scientific and technological text based on character-word mixing and the Gated Recurrent Unit (GRU), which uses an attention mechanism to extract relations from Chinese scientific and technological resource text. (2) It adopts a character-word mixed vector mapping, which avoids word-boundary segmentation errors as far as possible while still incorporating semantic information. (3) It adopts an end-to-end joint extraction model that combines a bidirectional GRU network with a self-attention mechanism to capture long-distance semantic information in a sentence, and introduces a bias weight to improve extraction performance.
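The abstract states that joint extraction can be cast as sequence labeling but does not give the tag set. The sketch below shows one possible character-level scheme, in which each tag carries a position (B/I/E/S), a relation label, and a role (1 for head entity, 2 for tail entity). The tag format, the function names `span_tags`, `encode` and `decode`, and the example sentence are all assumptions for illustration, not the scheme used in the paper.

```python
from typing import List, Tuple

# Hypothetical tag format: "<position>-<relation>-<role>".
# position: B/I/E/S; role: 1 = head entity, 2 = tail entity; "O" = other.
# Relation labels must not contain "-" because tags are split on it.
Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def span_tags(length: int, relation: str, role: int) -> List[str]:
    """Build the tags for one entity that spans `length` characters."""
    if length == 1:
        return [f"S-{relation}-{role}"]
    middle = [f"I-{relation}-{role}"] * (length - 2)
    return [f"B-{relation}-{role}", *middle, f"E-{relation}-{role}"]


def encode(chars: List[str], triple: Triple) -> List[str]:
    """Tag a character sequence so that one triple can be read back from the tags."""
    head, relation, tail = triple
    text = "".join(chars)
    tags = ["O"] * len(chars)
    for entity, role in ((head, 1), (tail, 2)):
        start = text.find(entity)  # first occurrence only, for simplicity
        if start < 0:
            raise ValueError(f"entity {entity!r} not found in text")
        tags[start:start + len(entity)] = span_tags(len(entity), relation, role)
    return tags


def decode(chars: List[str], tags: List[str]) -> List[Triple]:
    """Collect tagged spans, then pair each head with the first tail of the same relation."""
    spans = []  # (relation, role, entity text)
    current: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "O":
            current = []
            continue
        position, relation, role = tag.split("-")
        if position in ("B", "S"):
            current = []
        current.append(ch)
        if position in ("E", "S"):
            spans.append((relation, int(role), "".join(current)))
            current = []
    heads = [(rel, ent) for rel, role, ent in spans if role == 1]
    tails = [(rel, ent) for rel, role, ent in spans if role == 2]
    triples = []
    for rel, head in heads:
        for tail_rel, tail in tails:
            if tail_rel == rel:
                triples.append((head, rel, tail))
                break
    return triples


# Illustrative sentence and relation label (both made up for this sketch).
chars = list("Cho proposed GRU")
tags = encode(chars, ("GRU", "PROPOSER", "Cho"))
triples = decode(chars, tags)
```

With a scheme like this, a single tagger (for example a bidirectional GRU with self-attention, as the abstract describes) predicts one tag per character, and entities and relations are recovered together in the decoding step rather than by two separate modules.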
Authors OUYANG Suyu (欧阳苏宇); SHAO Yingxia (邵蓥侠); DU Junping (杜军平); LI Ang (李昂) (Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, College of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100082, China)
Source Guangxi Sciences (《广西科学》), indexed in CAS and the Peking University Core Journals list, 2022, No. 4, pp. 634-641 (8 pages)
Funding Supported by the National Key Research and Development Program of China (2018YFB1402600) and the National Natural Science Foundation of China (61772083, 61877006, 61802028, 62002027).
Keywords knowledge extraction; vector mapping; GRU; triple relation; joint extraction method
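The "vector mapping" keyword refers to the character-word mixed input described in the abstract. The record does not say how the two kinds of vectors are combined, so the following is only a minimal sketch under one assumption: each character's vector is concatenated with the vector of the word that contains it. The function name `mix_char_word`, the zero-vector fallback, and the toy lookup tables are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mix_char_word(
    words: List[str],
    char_vecs: Dict[str, Vector],
    word_vecs: Dict[str, Vector],
    char_dim: int,
    word_dim: int,
) -> List[Vector]:
    """Return one vector per character: its own vector followed by its word's vector."""
    mixed: List[Vector] = []
    for word in words:
        # Assumption: unknown words and characters fall back to a zero vector.
        w_vec = word_vecs.get(word, [0.0] * word_dim)
        for ch in word:
            c_vec = char_vecs.get(ch, [0.0] * char_dim)
            mixed.append(c_vec + w_vec)  # list concatenation, not addition
    return mixed


# Toy data: "知识" (knowledge) and "抽取" (extraction) from a hypothetical segmenter.
words = ["知识", "抽取"]
char_vecs = {"知": [1.0, 0.0], "识": [0.0, 1.0], "抽": [1.0, 1.0]}  # "取" is missing on purpose
word_vecs = {"知识": [0.5, 0.5, 0.5], "抽取": [0.1, 0.2, 0.3]}
sequence = mix_char_word(words, char_vecs, word_vecs, char_dim=2, word_dim=3)
```

The output stays at character granularity, so tags are still predicted per character and a segmentation mistake cannot shift an entity boundary; the word vector only contributes extra semantic context to each position.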