Knowledge graphs (KGs) have been widely accepted as powerful tools for modeling the complex relationships between concepts and developing knowledge-based services. In recent years, researchers in the field of power systems have explored KGs to develop intelligent dispatching systems for increasingly large power grids. With multiple power grid dispatching knowledge graphs (PDKGs) constructed by different agencies, the knowledge fusion of different PDKGs is useful for providing more accurate decision support. To achieve this, entity alignment, which aims at connecting different KGs by identifying equivalent entities, is a critical step. Existing entity alignment methods cannot integrate useful structural, attribute, and relational information while calculating entities' similarities and are prone to making many-to-one alignments, and thus can hardly achieve the best performance. To address these issues, this paper proposes a collective entity alignment model that integrates three kinds of available information and makes collective counterpart assignments. The model employs a novel knowledge graph attention network (KGAT) to learn the embeddings of entities and relations explicitly, and calculates entities' similarities by adaptively incorporating the structural, attribute, and relational similarities. Then, we formulate the counterpart assignment task as an integer programming (IP) problem to obtain one-to-one alignments. We not only conduct experiments on a pair of PDKGs but also evaluate our model on three commonly used cross-lingual KGs. Experimental comparisons indicate that our model outperforms other methods and provides an effective tool for the knowledge fusion of PDKGs.
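The one-to-one counterpart assignment step can be illustrated with a small sketch. This is not the authors' IP formulation; it uses SciPy's Hungarian-algorithm solver, which yields the same kind of one-to-one matching over a dense similarity matrix, and the similarity values below are made up for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy similarity matrix: rows are entities of PDKG-1, columns of PDKG-2.
# Values stand in for fused structural/attribute/relational similarities.
similarity = np.array([
    [0.90, 0.10, 0.30],
    [0.20, 0.80, 0.40],
    [0.15, 0.75, 0.85],
])

# Maximize total similarity under a strict one-to-one constraint.
rows, cols = linear_sum_assignment(similarity, maximize=True)
for i, j in zip(rows, cols):
    print(f"entity_1[{i}] <-> entity_2[{j}]  (sim={similarity[i, j]:.2f})")
```

Because every row and column is used at most once, greedy many-to-one matches such as pairing both row 1 and row 2 with column 1 are ruled out by construction.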
As a core resource of scientific knowledge, academic documents have been frequently used by scholars, especially newcomers to a given field. In the era of big data, scientific documents such as academic articles, patents, technical reports, and webpages are booming. The rapid daily growth of scientific documents indicates that a large amount of knowledge is proposed, improved, and used (Zhang et al., 2021).
The acquisition of valuable design knowledge from massive fragmentary data is challenging for designers in conceptual product design. This study proposes a novel method for acquiring design knowledge by combining deep learning with a knowledge graph. Specifically, the design knowledge acquisition method utilises a knowledge extraction model to extract design-related entities and relations from fragmentary data, and further constructs the knowledge graph to support design knowledge acquisition for conceptual product design. Moreover, the knowledge extraction model introduces ALBERT to alleviate memory limitations and communication overhead in the entity extraction module, and uses multi-granularity information to overcome segmentation errors and polysemy ambiguity in the relation extraction module. Experimental comparison verified the effectiveness and accuracy of the proposed knowledge extraction model. The case study demonstrated the feasibility of knowledge graph construction with real fragmentary porcelain data and showed the capability to provide designers with interconnected and visualised design knowledge.
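As a rough illustration of the entity extraction module, the sketch below wires an ALBERT encoder into a token-classification head using the Hugging Face transformers library. The albert-base-v2 checkpoint, the BIO tag set, and the example sentence are placeholders rather than the authors' configuration, and the untrained head would still need fine-tuning on labeled design data.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder checkpoint and BIO tag scheme; the paper targets fragmentary design texts.
model_name = "albert-base-v2"
labels = ["O", "B-ENTITY", "I-ENTITY"]

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(labels))

text = "The celadon glaze of Longquan porcelain inspires the new product form."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits              # shape: (1, seq_len, num_labels)

pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, tag_id in zip(tokens, pred_ids):
    print(tok, labels[tag_id])                   # per-token tag predictions
```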
Knowledge graph (KG) serves as a specialized semantic network that encapsulates intricate relationships among real-world entities within a structured framework. This framework facilitates a transformation in information retrieval, transitioning it from mere string matching to far more sophisticated entity matching. In this transformative process, the advancement of artificial intelligence and intelligent information services is invigorated. Meanwhile, machine learning methods play an important role in the construction of KGs, and these techniques have already achieved initial success. This article embarks on a comprehensive journey through the latest strides in the field of KG construction via machine learning. With a profound amalgamation of cutting-edge research in machine learning, this article undertakes a systematic exploration of KG construction methods in three distinct phases: entity learning, ontology learning, and knowledge reasoning. In particular, a meticulous dissection of machine learning-driven algorithms is conducted, spotlighting their contributions to critical facets such as entity extraction, relation extraction, entity linking, and link prediction. Moreover, this article also provides an analysis of the unresolved challenges and emerging trajectories that beckon within the expansive application of machine learning-fueled, large-scale KG construction.
Due to the structural dependencies among concurrent events in the knowledge graph and the substantial amount of sequential correlation information carried by temporally adjacent events, we propose an Independent Recurrent Temporal Graph Convolution Networks (IndRT-GCNets) framework to efficiently and accurately capture event attribute information. The framework models the knowledge graph sequences to learn the evolutionary representations of entities and relations within each period. First, by utilizing the temporal graph convolution module in the evolutionary representation unit, the framework captures the structural dependency relationships within the knowledge graph in each period. Meanwhile, to achieve better event representation and establish effective correlations, an independent recurrent neural network is employed to implement auto-regressive modeling. Furthermore, static attributes of entities in the entity-relation events are constrained and merged using a static graph constraint to obtain optimal entity representations. Finally, the evolution of entity and relation representations is utilized to predict events in the subsequent step. On multiple real-world datasets, such as Freebase13 (FB13), Freebase15k (FB15K), WordNet11 (WN11), WordNet18 (WN18), FB15K-237, WN18RR, YAGO3-10, and NELL-995, the results of multiple evaluation indicators show that our proposed IndRT-GCNets framework outperforms most existing models on knowledge reasoning tasks, which validates its effectiveness and robustness.
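The independent recurrent unit referred to above follows the IndRNN idea, in which each hidden unit keeps only a scalar recurrent weight: h_t = act(W x_t + u ⊙ h_{t-1} + b). The NumPy sketch below shows that update in isolation; it is a generic illustration, not the IndRT-GCNets implementation, and all dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, steps = 8, 16, 5

W = rng.normal(scale=0.1, size=(hidden_dim, input_dim))  # input-to-hidden weights
u = rng.uniform(0.0, 1.0, size=hidden_dim)               # per-unit recurrent weights
b = np.zeros(hidden_dim)

x = rng.normal(size=(steps, input_dim))                  # a toy input sequence
h = np.zeros(hidden_dim)
for t in range(steps):
    # Independent recurrence: each unit interacts across time only with itself.
    h = np.maximum(0.0, W @ x[t] + u * h + b)            # ReLU activation
    print(f"step {t}: mean activation = {h.mean():.4f}")
```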
Nowadays, the internal structure of spacecraft has become increasingly complex. As its "lifeline", cables require extensive manpower and resources for manual testing, and it is challenging to quickly and accurately locate quality problems and find solutions. To address this problem, a knowledge graph based method is employed to extract multi-source heterogeneous cable knowledge entities. The method utilizes the bidirectional encoder representations from transformers (BERT) network to embed word vectors into the input text, then extracts the contextual features of the input sequence through the bidirectional long short-term memory (BiLSTM) network, and finally inputs them into the conditional random field (CRF) network to predict entity categories. Simultaneously, using the entities extracted by this model as the data layer, a knowledge graph has been constructed. Compared to other traditional extraction methods, the entity extraction method used in this study demonstrates significant improvements in metrics such as precision, recall, and F1 score. Ultimately, employing cable test data from a particular aerospace precision machining company, the study has constructed a knowledge graph for this field to achieve visualized queries and the traceability and localization of quality problems.
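A minimal sketch of the BERT-BiLSTM-CRF tagging pipeline described above follows, assuming the Hugging Face transformers library and the pytorch-crf package; the bert-base-uncased checkpoint, the tag count, and the example sentence are placeholders rather than the authors' setup.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer
from torchcrf import CRF  # provided by the pytorch-crf package


class BertBiLSTMCRF(nn.Module):
    """Sketch of a BERT -> BiLSTM -> CRF tagger for cable-defect entities."""

    def __init__(self, num_tags, bert_name="bert-base-uncased", lstm_hidden=128):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * lstm_hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        ctx = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        feats, _ = self.lstm(ctx)                 # contextual features per token
        emissions = self.fc(feats)                # per-token tag scores
        mask = attention_mask.bool()
        if tags is not None:                      # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)   # inference: best tag paths


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertBiLSTMCRF(num_tags=5)
batch = tokenizer(["Insulation damage found on cable connector J2 during testing"],
                  return_tensors="pt")
print(model(batch["input_ids"], batch["attention_mask"]))  # predicted tag ids
```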
Entity recognition and extraction are the foundations of knowledge graph construction. Entity data in the field of software engineering come from different platforms and communities and have different formats. This paper divides multi-source software knowledge entities into unstructured data, semi-structured data, and code data. For these different types of data, Bi-directional Long Short-Term Memory (Bi-LSTM) with Conditional Random Field (CRF), template matching, and abstract syntax trees are used and integrated into a multi-source software knowledge entity extraction integration model (MEIM) to extract software entities. The model can be updated continuously based on users' feedback to improve its accuracy. To deal with the shortage of entity annotation datasets, keyword extraction methods based on Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and K-Means are applied to annotation tasks. The proposed MEIM model is applied to the Spring Boot framework, which demonstrates good adaptability. The extracted entities are used to construct a knowledge graph, which is applied to association retrieval and association visualization.
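For the annotation bootstrapping step, the sketch below shows one common way to rank keyword candidates by TF-IDF weight with scikit-learn; the documents and parameters are illustrative and not taken from the MEIM pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Spring Boot starters simplify dependency management for web services.",
    "The @RestController annotation maps HTTP requests to handler methods.",
    "Template matching extracts fields from semi-structured release notes.",
]

vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

# Rank the terms of the first document by TF-IDF weight as keyword candidates.
row = tfidf[0].toarray().ravel()
top = row.argsort()[::-1][:5]
for idx in top:
    print(f"{terms[idx]}: {row[idx]:.3f}")
```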
For agricultural intelligent knowledge services based on knowledge graph technology, it is important to integrate multi-source heterogeneous crop and pest data and to fully mine the knowledge hidden in the text. However, only limited labeled data are available for training in the agricultural knowledge graph domain. Furthermore, labeling is costly because the data lack openness and standardization. This paper proposes a novel model using knowledge distillation for weakly supervised entity recognition in ontology construction. Knowledge distillation between the target and source data domains is performed, where Bi-LSTM and CRF models are constructed for entity recognition. The experimental results show that we only need to label less than one-tenth of the data for model training. Furthermore, the agricultural domain ontology is constructed with the BiLSTM-CRF named entity recognition model and a relationship extraction model. Moreover, a total of 13,983 entities and 26,498 relationships are stored in the Neo4j graph database.
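The distillation objective implied above (matching a student's softened predictions to a teacher's while still fitting the hard labels) can be sketched generically in PyTorch as follows. The temperature, mixing weight, and tensor shapes are illustrative; the paper's cross-domain setup is not reproduced here.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL divergence (teacher -> student) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


# Toy batch: 4 tokens, 7 entity tags.
student = torch.randn(4, 7, requires_grad=True)
teacher = torch.randn(4, 7)
labels = torch.randint(0, 7, (4,))
print(distillation_loss(student, teacher, labels))
```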
Objectives: Medical knowledge extraction (MKE) plays a key role in natural language processing (NLP) research on electronic medical records (EMR), which are important digital carriers for recording the medical activities of patients. Named entity recognition (NER) and medical relation extraction (MRE) are two basic tasks of MKE. This study aims to improve the recognition accuracy of these two tasks by exploring deep learning methods. Methods: This study discussed and built two application scenes of the bidirectional long short-term memory combined conditional random field (BiLSTM-CRF) model for the NER and MRE tasks. In the data preprocessing of both tasks, a GloVe word embedding model was used to vectorize words. In the NER task, a sequence labeling strategy was used to classify each word tag by the joint probability distribution through the CRF layer. In the MRE task, the medical entity relation category was predicted by transforming the classification problem of a single entity into a sequence classification problem and linking the feature combinations between entities, also through the CRF layer. Results: Through validation on the I2B2 2010 public dataset, the BiLSTM-CRF models built in this study achieved much better results than the baseline methods in the two tasks, where the F1-measure was up to 0.88 in the NER task and 0.78 in the MRE task. Moreover, the model converged faster and avoided problems such as overfitting. Conclusion: This study proved the good performance of deep learning on medical knowledge extraction. It also verified the feasibility of the BiLSTM-CRF model in different application scenarios, laying the foundation for subsequent work in the EMR field.
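The GloVe preprocessing step usually amounts to loading a pretrained GloVe text file into an embedding matrix indexed by the task vocabulary, as in the sketch below; the file path and the tiny vocabulary are placeholders.

```python
import numpy as np


def load_glove(path, vocab, dim=100):
    """Build an embedding matrix for `vocab` from a GloVe .txt file (word v1 v2 ... per line)."""
    matrix = np.random.normal(scale=0.1, size=(len(vocab), dim))  # random init for OOV words
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if word in vocab and len(vec) == dim:
                matrix[vocab[word]] = np.asarray(vec, dtype=np.float32)
    return matrix


vocab = {"patient": 0, "aspirin": 1, "dose": 2}
# embeddings = load_glove("glove.6B.100d.txt", vocab)   # path is a placeholder
```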
The goal of research on topics such as sentiment analysis and cognition is to analyze the opinions, emotions, evaluations, and attitudes that people hold about entities and their attributes, as expressed in text. Word-level affective cognition has become an important topic in sentiment analysis. By extracting (attribute, opinion word) binary relationships through word segmentation and dependency parsing, and labeling them with an existing emotion dictionary combined with webpage information and manual annotation, this paper constructs a binary-relationship knowledge base. Using a knowledge embedding method, each element of the (attribute, opinion, opinion word) triple is embedded as a word vector into the knowledge graph by TransG, and an algorithm is defined to distinguish the opinion between the attribute word vector and the opinion word vector. Compared with traditional methods, this engine has the advantages of high processing speed and low occupancy, overcoming the time cost and high computational complexity of the former methods.
Knowledge graph (KG) fact prediction aims to complete a KG by determining the truthfulness of predicted triples. Reinforcement learning (RL)-based approaches have been widely used for fact prediction. However, the existing approaches largely suffer from unreliable calculations of rule confidences owing to a limited number of obtained reasoning paths, thereby resulting in unreliable decisions on predicted triples. Hence, we propose a new RL-based approach named EvoPath in this study. EvoPath features a new reward mechanism based on entity heterogeneity, facilitating an agent to obtain effective reasoning paths during random walks. EvoPath also incorporates a new postwalking mechanism to leverage easily overlooked but valuable reasoning paths during RL. Both mechanisms provide sufficient reasoning paths to facilitate the reliable calculation of rule confidences, enabling EvoPath to make precise judgments about the truthfulness of predicted triples. Experiments demonstrate that EvoPath can achieve more accurate fact predictions than existing approaches.
Aiming at the relation linking task for question answering over knowledge bases, especially the multi-relation linking task for complex questions, a relation linking approach based on the multi-attention recurrent neural network (RNN) model is proposed, which works for both simple and complex questions. First, the vector representations of questions are learned by the bidirectional long short-term memory (Bi-LSTM) model at the word and character levels, and named entities in questions are labeled by the conditional random field (CRF) model. Candidate entities are generated based on a dictionary, the disambiguation of candidate entities is realized based on predefined rules, and named entities mentioned in questions are linked to entities in the knowledge base. Next, questions are classified into simple or complex questions by a machine learning method. Starting from the identified entities, for simple questions, one-hop relations are collected in the knowledge base as candidate relations; for complex questions, two-hop relations are collected as candidates. Finally, the multi-attention Bi-LSTM model is used to encode questions and candidate relations, compare their similarity, and return the candidate relation with the highest similarity as the result of relation linking. It is worth noting that the Bi-LSTM model with one attention is adopted for simple questions, and the Bi-LSTM model with two attentions is adopted for complex questions. The experimental results show that, based on the effective entity linking method, the Bi-LSTM model with the attention mechanism improves the relation linking effectiveness for both simple and complex questions, outperforming existing relation linking methods based on graph algorithms or linguistic understanding.
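The final ranking step reduces to comparing the encoded question with each encoded candidate relation and keeping the most similar one. The sketch below shows that comparison with plain cosine similarity over random vectors; in the actual approach, the vectors would come from the multi-attention Bi-LSTM encoders, which are not reproduced here, and the relation names are invented.

```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


rng = np.random.default_rng(1)
question_vec = rng.normal(size=64)                      # encoded question (illustrative)
candidates = {                                          # encoded candidate relations
    "place_of_birth": rng.normal(size=64),
    "date_of_birth": rng.normal(size=64),
    "spouse/place_of_birth": rng.normal(size=64),       # a two-hop candidate
}

best = max(candidates, key=lambda r: cosine(question_vec, candidates[r]))
print("linked relation:", best)
```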
Equipment defect detection is essential to the security and stability of power grid networking operations. Besides the status of the power grid itself, environmental information is also necessary for equipment defect detection. At the same time, different types of intelligent sensors can monitor environmental information such as temperature, humidity, and dust. Therefore, we apply Internet of Things (IoT) technology to monitor the related environment and the pervasive interconnections of diverse physical objects. However, the data related to device defects in the existing Internet of Things are complex and lack uniform associations; hence, building a knowledge graph is proposed to solve these problems. An intelligent equipment defect domain ontology is the semantic basis for constructing a defect knowledge graph, which can be used to organize, share, and analyze equipment defect-related knowledge. At present, there are a lot of relevant data in the field of intelligent equipment defects. These equipment defect data often focus on a single aspect of the defect field, and it is difficult to integrate databases with various types of equipment defect information. This paper combines the characteristics of existing data sources to build a general intelligent equipment defect domain ontology. Based on this ontology, this paper proposes the BERT-BiLSTM-Att-CRF model to recognize entities. This method solves the problems of diverse entity names and insufficient feature information extraction in the equipment defect field. The final experiment proves that this model is superior to other models in precision, recall, and F1 score. This research can break the barrier of multi-source heterogeneous knowledge, build an efficient storage engine for multimodal data, and empower the safety of industrial applications, data, and platforms in multi-cloud environments for the Internet of Things.
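The abstract does not spell out the attention component of BERT-BiLSTM-Att-CRF; the sketch below shows one common choice, an additive attention layer that re-weights BiLSTM outputs before they reach the CRF. The layer form and dimensions are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Score each timestep, then re-weight the sequence by its attention weights."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, seq):                       # seq: (batch, time, hidden)
        weights = torch.softmax(self.score(torch.tanh(self.proj(seq))), dim=1)
        return seq * weights                      # attended features, same shape


bilstm_out = torch.randn(2, 20, 256)              # toy BiLSTM output
att = AdditiveAttention(256)
print(att(bilstm_out).shape)                      # torch.Size([2, 20, 256])
```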
Nowadays, ensuring the quality of network services has become increasingly vital. Experts are turning to knowledge graph technology, with a significant emphasis on entity extraction in the identification of device configurations. This research paper presents a novel entity extraction method that leverages a combination of active learning and attention mechanisms. Initially, an improved active learning approach is employed to select the most valuable unlabeled samples, which are subsequently submitted for expert labeling. This approach successfully addresses the problems of isolated points and sample redundancy within the network configuration sample set. Then the labeled samples are utilized to train the model for network configuration entity extraction. Furthermore, the multi-head self-attention of the transformer model is enhanced by introducing the Adaptive Weighting method based on the Laplace mixture distribution. This enhancement enables the transformer model to dynamically adapt its focus to words in various positions, displaying exceptional adaptability to abnormal data and further elevating the accuracy of the proposed model. Through comparisons with Random Sampling (RANDOM), Maximum Normalized Log-Probability (MNLP), Least Confidence (LC), Token Entropy (TE), and Entropy Query by Bagging (EQB), the proposed method, Entropy Query by Bagging and Maximum Influence Active Learning (EQBMIAL), achieves comparable performance with only 40% of the samples on both datasets, while the other algorithms require 50% of the samples. Furthermore, the entity extraction algorithm with the Adaptive Weighted Multi-head Attention mechanism (AW-MHA) is compared with BILSTM-CRF, Mutil_Attention-Bilstm-Crf, Deep_Neural_Model_NER, and BERT_Transformer, achieving precision rates of 75.98% and 98.32% on the two datasets, respectively. Statistical tests demonstrate the statistical significance and effectiveness of the proposed algorithms.
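Uncertainty-driven sample selection of the kind described above can be illustrated with token-entropy scoring: the unlabeled sentences whose predicted tag distributions have the highest average entropy are the ones sent for expert labeling. This is a generic sketch, not the EQBMIAL strategy itself, and the pool of predictions is simulated.

```python
import numpy as np


def sentence_entropy(tag_probs):
    """Mean per-token entropy of a sentence; tag_probs has shape (tokens, tags)."""
    eps = 1e-12
    return float(-(tag_probs * np.log(tag_probs + eps)).sum(axis=1).mean())


rng = np.random.default_rng(7)
pool = []
for i in range(100):                              # simulated unlabeled pool
    n_tokens = rng.integers(5, 30)
    logits = rng.normal(size=(n_tokens, 9))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    pool.append((f"sentence_{i}", sentence_entropy(probs)))

# Send the most uncertain sentences to the expert for labeling.
to_label = sorted(pool, key=lambda x: x[1], reverse=True)[:10]
print([name for name, _ in to_label])
```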
Named Entity Recognition (NER) is a fundamental task in the field of biomedical text mining, aiming to extract specific types of entities, such as genes, proteins, and diseases, from complex biomedical texts and categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model initially utilizes the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network to capture bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to comprehensively recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE, and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
With the help of pre-trained language models, the accuracy of the entity linking task has made great strides in recent years. However, most models with excellent performance require fine-tuning on a large amount of training data using large pre-trained language models, which imposes a hardware threshold on accomplishing this task. Some researchers have achieved competitive results with less training data through ingenious methods, such as utilizing information provided by the named entity recognition model. This paper presents a novel semantic-enhancement-based entity linking approach, named semantically enhanced hardware-friendly entity linking (SHEL), which is designed to be hardware friendly and efficient while maintaining good performance. Specifically, SHEL's semantic enhancement approach consists of three aspects: (1) semantic compression of entity descriptions using a text summarization model; (2) maximizing the capture of mention contexts using asymmetric heuristics; (3) calculating a fixed-size mention representation through pooling operations. This series of semantic enhancement methods effectively improves the model's ability to capture semantic information while taking the hardware constraints into account, and improves the model's convergence speed by more than 50% compared with the strong baseline model proposed in this paper. In terms of performance, SHEL is comparable to the previous method, with superior performance on six well-established datasets, even though SHEL is trained using a smaller pre-trained language model as the encoder.
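Two of the semantic-enhancement ideas, compressing entity descriptions with a text summarization model and pooling token vectors into a fixed-size mention representation, can be sketched as follows. The summarization checkpoint, length limits, and mean pooling are assumptions made for illustration, not SHEL's exact configuration.

```python
import torch
from transformers import pipeline

# (1) Semantic compression: shorten a long entity description before encoding it.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
description = ("Paris is the capital and largest city of France, a major European "
               "center of finance, diplomacy, commerce, culture, fashion, and science, "
               "located on the Seine river in the north of the country.")
compressed = summarizer(description, max_length=30, min_length=5, do_sample=False)
print(compressed[0]["summary_text"])

# (3) Fixed-size mention representation via mean pooling over token vectors.
token_vectors = torch.randn(1, 12, 768)           # toy encoder output for a mention span
mention_repr = token_vectors.mean(dim=1)          # shape: (1, 768), independent of span length
print(mention_repr.shape)
```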
Entity alignment (EA) is an important technique aiming to find the same real-world entity across two knowledge graphs (KGs) from different sources. Current methods typically learn entity embeddings for EA from the structure of the KGs. Most EA models are designed for rich-resource languages, requiring sufficient resources such as a parallel corpus and pre-trained language models. However, low-resource language KGs have received less attention, and current models demonstrate poor performance on such low-resource KGs. Recently, researchers have fused relation information and attributes into entity representations to enhance entity alignment performance, but the relation semantics are often ignored. To address these issues, we propose a novel Semantic-aware Graph Neural Network (SGNN) for entity alignment. First, we generate pseudo sentences according to the relation triples and produce representations using pre-trained models. Second, our approach explores semantic information from the connected relations through a graph neural network. Our model captures expanded feature information from the KGs. Experimental results on three low-resource languages demonstrate that our proposed SGNN approach outperforms state-of-the-art alignment methods on three proposed datasets and three public datasets.
With the application of artificial intelligence technology in the power industry, the knowledge graph is expected to play a key role in power grid dispatch processes, intelligent maintenance, and customer service response provision. Knowledge graphs are usually constructed based on entity recognition. Specifically, based on the mining of entity attributes and relationships, domain knowledge graphs can be constructed through knowledge fusion. In this work, the entities and characteristics of power entity recognition are analyzed, the mechanism of entity recognition is clarified, and entity recognition techniques are analyzed in the context of the power domain. Power entity recognition based on the conditional random field (CRF) and bidirectional long short-term memory (BLSTM) models is investigated, and the two methods are comparatively analyzed. The results indicate that the CRF model, with an accuracy of 83%, can identify power entities better than the BLSTM. The CRF approach can thus be applied to entity extraction for knowledge graph construction in the power field.
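A feature-based CRF of the kind compared above is typically trained on hand-crafted token features; the sketch below shows such a setup with the sklearn-crfsuite package. The toy sentences, feature set, and tag names are illustrative only.

```python
import sklearn_crfsuite


def word2features(sent, i):
    """Simple lexical and contextual features for the i-th token of a sentence."""
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_digit": w.isdigit(),
        "is_title": w.istitle(),
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }


sents = [["Replace", "breaker", "QF1", "at", "substation", "West"],
         ["Transformer", "T2", "tripped", "on", "overload"]]
tags = [["O", "B-DEV", "I-DEV", "O", "B-LOC", "I-LOC"],
        ["B-DEV", "I-DEV", "O", "O", "O"]]

X = [[word2features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, tags)
print(crf.predict(X))          # predicted tag sequences for the training sentences
```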
A key aspect of knowledge fusion is entity matching. The objective of this study was to investigate how to identify heterogeneous expressions of the same real-world entity. In recent years, some representative works have used deep learning methods for entity matching, and these methods have achieved good results. However, the common limitation of these methods is that they assume that the different attribute columns of the same entity are independent, and inputting the model in the form of paired entity records causes repeated calculations. In fact, there are often potential relations between the different attribute columns of different entities. These relations can help improve the effect of entity matching and allow feature extraction to be performed on a single entity record to avoid repeated calculations. To use attribute relations to assist entity matching, this paper proposes the Relation-aware Entity Matching method, which embeds attribute relations into the original entity description to form sentences, so that entity matching is transformed into a sentence-level similarity determination task, and uses Sentence-BERT to complete the sentence similarity calculation. We conducted experiments on structured, dirty, and textual data, and compared the method with baselines from recent years. Experimental results show that the use of relational embedding is helpful for entity matching on structured and dirty data. Our method achieves good results on most datasets for entity matching and reduces repeated calculations.
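The serialization-plus-similarity step can be sketched as follows: attribute columns are verbalized into one sentence per record, and the two sentences are compared with a pretrained Sentence-BERT encoder from the sentence-transformers package. The template wording and the all-MiniLM-L6-v2 checkpoint are assumptions, not the paper's exact serialization or relation-aware embedding.

```python
from sentence_transformers import SentenceTransformer, util


def serialize(record):
    """Verbalize attribute columns into one sentence, keeping attribute names as cues."""
    return " ; ".join(f"the {attr} is {value}" for attr, value in record.items())


left = {"name": "iPhone 13 Pro 128GB", "brand": "Apple", "price": "999"}
right = {"name": "Apple iPhone13 Pro (128 GB)", "brand": "Apple", "price": "999.00"}

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode([serialize(left), serialize(right)], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()
print(f"match score = {score:.3f}")               # threshold to decide match / non-match
```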