Abstract
Patent knowledge graphs play an important role in applications such as precise patent retrieval, in-depth patent analysis, and patent knowledge training. This paper proposes a practical method for constructing a patent knowledge graph based on a seed knowledge graph, text mining, and relation completion. To ensure quality, a seed patent knowledge graph is first built manually; concept and relation extraction based on patent text patterns is then used to expand the seed graph; finally, the expanded patent knowledge graph is evaluated quantitatively. For patents in the field of traditional Chinese medicine, seed knowledge is extracted manually and lexical and syntactic patterns are summarized manually; machine learning is then used to learn new lexical and syntactic patterns, with which the seed patent knowledge graph is expanded and completed. Experimental results show that the seed knowledge graph for traditional Chinese medicine patents contains 19453 nodes and 194775 relations; after expansion these reach 558461 and 7275958, increases of 27.7-fold and 36.3-fold, respectively.
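The growth figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes "increase" means (after - before) / before; the helper name `fold_increase` is hypothetical and not taken from the paper.

```python
def fold_increase(before: int, after: int) -> float:
    """Return the increase relative to the starting value: (after - before) / before."""
    return (after - before) / before


# Counts as reported in the abstract (seed graph vs. expanded graph).
seed_nodes, expanded_nodes = 19453, 558461
seed_relations, expanded_relations = 194775, 7275958

node_fold = fold_increase(seed_nodes, expanded_nodes)
relation_fold = fold_increase(seed_relations, expanded_relations)

print(f"nodes: {node_fold:.2f}-fold increase")
print(f"relations: {relation_fold:.2f}-fold increase")
```

Under this definition the node figure comes out at about 27.7 and the relation figure at about 36.36, which matches the reported 36.3 if the value is truncated rather than rounded.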
Authors
邓亮
曹存根
DENG Liang; CAO Cun-gen (School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China; Patent Office, China National Intellectual Property Administration, Beijing 100083, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2022, No. 11, pp. 185-196 (12 pages)
Computer Science
Keywords
Patent text
Patent knowledge graph
Lexical and syntactic analysis
Representation learning