Abstract
Knowledge graphs can link massive, complex, and heterogeneous information into an organic whole. This paper proposes an automated, continuous knowledge graph construction technique and designs a system architecture for it. The architecture consists of four modules: data collection, named entity recognition, syntactic and semantic analysis, and entity relation extraction. Of these, named entity recognition and entity relation extraction are the key technologies. First, named entity recognition identifies the main entity objects. During word segmentation and part-of-speech tagging, it also automatically expands the word segmentation dictionary and the part-of-speech lexicon. Second, the paper studies non-taxonomic (non-classification) relation extraction within entity relation extraction, using two techniques. A technique based on association rules and semantic relations automatically discovers relations between concepts and labels each relation semantically. A technique based on rules and machine learning builds a rule base and a corpus of sentence patterns, which are repeatedly trained and validated for continuous improvement. Future work should further strengthen the ability to construct the knowledge graph automatically and continuously while the system is running, so that the knowledge graph system can form on its own without relying on a knowledge base.
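The association-rule step of non-taxonomic relation extraction can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper. Each sentence is represented as a `(concepts, verb)` record, where `concepts` stands in for the output of the named entity recognition module and `verb` for the main predicate produced by syntactic and semantic analysis. Concept pairs that co-occur in sentences are scored with the standard association-rule measures of support and confidence. Each pair that passes the thresholds is then labeled with its most frequent co-occurring predicate, a simple stand-in for semantic labeling. The function name `extract_non_taxonomic_relations`, the record format, the thresholds, and the fallback label `related_to` are all hypothetical and are not the authors' implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def extract_non_taxonomic_relations(sentences, min_support=0.2, min_confidence=0.5):
    """Mine labeled concept-pair relations from sentence-level co-occurrence.

    sentences: list of (concepts, verb) records, one per sentence (hypothetical
    format). `concepts` is an iterable of concept labels (assumed NER output);
    `verb` is the sentence's main predicate (assumed syntactic/semantic
    analysis output) or None.
    Returns sorted tuples (a, label, b, support, conf_a_to_b, conf_b_to_a).
    Relation direction is not decided here; both rule confidences are kept.
    """
    n = len(sentences)
    concept_counts = Counter()          # sentences mentioning each concept
    pair_counts = Counter()             # sentences mentioning both concepts
    pair_verbs = defaultdict(Counter)   # predicates seen with each concept pair

    for concepts, verb in sentences:
        unique = sorted(set(concepts))  # sorted so each pair has one canonical key
        concept_counts.update(unique)
        for pair in combinations(unique, 2):
            pair_counts[pair] += 1
            if verb:
                pair_verbs[pair][verb] += 1

    relations = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair co-occurs too rarely to be a candidate relation
        # Confidence of the association rules a -> b and b -> a.
        conf_ab = count / concept_counts[a]
        conf_ba = count / concept_counts[b]
        if max(conf_ab, conf_ba) < min_confidence:
            continue
        # Semantic label: the most frequent co-occurring predicate, if any.
        verbs = pair_verbs[(a, b)]
        label = verbs.most_common(1)[0][0] if verbs else "related_to"
        relations.append((a, label, b, round(support, 3),
                          round(conf_ab, 3), round(conf_ba, 3)))
    return sorted(relations)


if __name__ == "__main__":
    # Toy corpus: concept sets and predicates are invented for illustration.
    corpus = [
        ({"radar", "signal"}, "transmits"),
        ({"radar", "signal"}, "transmits"),
        ({"radar", "antenna"}, "uses"),
        ({"signal", "noise"}, None),
        ({"radar"}, None),
    ]
    for relation in extract_non_taxonomic_relations(corpus, min_support=0.3):
        print(relation)  # ('radar', 'transmits', 'signal', 0.4, 0.5, 0.667)
```

With `min_support=0.3`, only the radar/signal pair co-occurs often enough to be kept, and it is labeled with the predicate it appears with most. In a continuously running system, the thresholds would presumably be tuned as the corpus grows. A rule- and learning-based extractor like the one the abstract describes would then refine or reject these candidate relations.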
Authors
WEI Tao; WANG Jin-hua (Beijing CCID Trans Tech Co., Ltd., Beijing 100048, China; The 32nd Research Institute of China Electronic Technology Corporation, Shanghai 201808, China)
Source
Industrial Technology Innovation (《工业技术创新》)
2020, No. 2, pp. 23-28 (6 pages)
Keywords
Knowledge Graph
Named Entity Recognition
Entity Relation Extraction
Non-taxonomic Relation Extraction
Semantics