Event extraction stands as a significant endeavor within the realm of information extraction, aspiring to automatically extract structured event information from vast volumes of unstructured text. Extracting event elements from multi-modal data remains a challenging task due to the presence of a large number of images and overlapping event elements in the data. Although researchers have proposed various methods to accomplish this task, most existing event extraction models cannot address these challenges because they are only applicable to text scenarios. To solve the above issues, this paper proposes a multi-modal event extraction method based on knowledge fusion. Specifically, for event-type recognition, we use a meticulous pipeline approach that integrates multiple pre-trained models. This approach enables a more comprehensive capture of the multidimensional event semantic features present in military texts, thereby enhancing the interconnectedness of information between trigger words and events. For event element extraction, we propose a method for constructing a priori templates that combine event types with corresponding trigger words. This approach facilitates the acquisition of fine-grained input samples containing event trigger words, thus enabling the model to understand the semantic relationships between elements in greater depth. Furthermore, a fusion method for spatial mapping of textual event elements and image elements is proposed to reduce the category number overload and effectively achieve multi-modal knowledge fusion. The experimental results based on the CCKS 2022 dataset show that our method has achieved competitive results, with a comprehensive evaluation value F1-score of 53.4% for the model. These results validate the effectiveness of our method in extracting event elements from multi-modal data.
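As a rough illustration of the a priori templates described in this abstract — pairing an event type with its trigger words so the extractor receives trigger-aware input samples — the following sketch uses a hypothetical schema and function names (the paper's actual templates and military event types are not given here):

```python
def build_prior_templates(event_types):
    """Build a priori prompt templates pairing each event type
    with its known trigger words (illustrative sketch)."""
    templates = {}
    for etype, triggers in event_types.items():
        templates[etype] = (
            f"Event type: {etype}; trigger words: {', '.join(triggers)}."
        )
    return templates

def make_input_sample(text, etype, templates):
    """Prepend the event-type template to a raw sentence so the
    extractor sees trigger-word context alongside the text."""
    return templates[etype] + " Text: " + text

# Hypothetical event schema (invented, not from the paper)
schema = {"Attack": ["strike", "bombard"], "Deploy": ["deploy", "station"]}
templates = build_prior_templates(schema)
sample = make_input_sample("Forces bombarded the position.", "Attack", templates)
```

The fine-grained sample then carries both the event type and its trigger vocabulary into the element extractor.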
Emotion cause extraction (ECE), a task that aims at extracting potential trigger events of certain emotions, has attracted extensive attention recently. However, current work neglects the implicit emotion expressed without any explicit emotional keywords, which appears more frequently in application scenarios. The lack of explicit emotion information makes it extremely hard to extract emotion causes only with the local context. Moreover, an entire event usually spans multiple clauses, while existing work merely extracts cause events at the clause level and cannot effectively capture complete cause event information. To address these issues, events are first redefined at the tuple level and a span-based tuple-level algorithm is proposed to extract events from different clauses. Based on it, a corpus for implicit emotion cause extraction is constructed. The authors propose a knowledge-enriched joint-learning model of implicit emotion recognition and implicit emotion cause extraction tasks (KJ-IECE), which leverages commonsense knowledge from ConceptNet and NRC_VAD to better capture connections between emotions and corresponding cause events. Experiments on both implicit and explicit emotion cause extraction datasets demonstrate the effectiveness of the proposed model.
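The tuple-level redefinition of events described above can be illustrated as role spans that need not lie in the same clause. In this sketch the span offsets stand in for hypothetical model predictions; it is not the paper's span extraction algorithm:

```python
def extract_event_tuple(tokens, spans):
    """Assemble a tuple-level event from predicted role spans, which may
    come from different clauses of the document. `spans` maps a role to
    a (start, end) token range (end exclusive)."""
    return {role: " ".join(tokens[s:e]) for role, (s, e) in spans.items()}

# Clause 1: "He failed the exam ."  Clause 2: "He felt deeply sad ."
tokens = "He failed the exam . He felt deeply sad .".split()
# Hypothetical model output for the cause event (token offsets)
spans = {"subject": (0, 1), "predicate": (1, 2), "object": (2, 4)}
event = extract_event_tuple(tokens, spans)
```

The same function works unchanged when one role's span sits in a later clause than another's, which is the case clause-level extraction cannot represent.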
To address the difficulty of training high-quality models in some specific domains due to the lack of fine-grained annotation resources, we propose in this paper a knowledge-integrated cross-domain data generation method for unsupervised domain adaptation tasks. Specifically, we extract domain features, lexical and syntactic knowledge from source-domain and target-domain data, and use a masking model with an extended masking strategy and a re-masking strategy to obtain domain-specific data that remove domain-specific features. Finally, we improve the sequence generation model BART and use it to generate high-quality target domain data for the task of aspect and opinion co-extraction from the target domain. Experiments were performed on three conventional English datasets from different domains, and our method generates more accurate and diverse target domain data with the best results compared to previous methods.
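One way to picture the masking step described above is a frequency heuristic: tokens markedly more common in the source domain than in the target domain are treated as domain-specific and masked for regeneration. The threshold values and function names below are illustrative assumptions, not the paper's actual strategy (which also uses lexical and syntactic knowledge):

```python
from collections import Counter

def domain_specific_tokens(source_tokens, target_tokens, ratio=2.0, min_count=2):
    """Flag tokens much more frequent in the source domain than in the
    target domain as domain-specific (simplified heuristic)."""
    src, tgt = Counter(source_tokens), Counter(target_tokens)
    flagged = set()
    for tok, c in src.items():
        if c >= min_count and c / (tgt[tok] + 1) >= ratio:
            flagged.add(tok)
    return flagged

def mask_sentence(tokens, flagged, mask="[MASK]"):
    """Replace domain-specific tokens so a generator can re-fill them
    with target-domain words."""
    return [mask if t in flagged else t for t in tokens]

src = "the battery life of this laptop is great the battery lasts".split()
tgt = "the pasta at this restaurant is great the service".split()
flagged = domain_specific_tokens(src, tgt)
masked = mask_sentence("the battery is great".split(), flagged)
```

A generation model such as BART would then fill each `[MASK]` with target-domain vocabulary.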
The acquisition of valuable design knowledge from massive fragmentary data is challenging for designers in conceptual product design. This study proposes a novel method for acquiring design knowledge by combining deep learning with a knowledge graph. Specifically, the design knowledge acquisition method utilises the knowledge extraction model to extract design-related entities and relations from fragmentary data, and further constructs the knowledge graph to support design knowledge acquisition for conceptual product design. Moreover, the knowledge extraction model introduces ALBERT to solve memory limitation and communication overhead in the entity extraction module, and uses multi-granularity information to overcome segmentation errors and polysemy ambiguity in the relation extraction module. Experimental comparison verified the effectiveness and accuracy of the proposed knowledge extraction model. The case study demonstrated the feasibility of the knowledge graph construction with real fragmentary porcelain data and showed the capability to provide designers with interconnected and visualised design knowledge.
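The knowledge-graph side of the method can be sketched as a minimal triple store over extracted (head, relation, tail) facts; the porcelain entities below are invented for illustration and are not from the paper's dataset:

```python
from collections import defaultdict

class DesignKG:
    """Minimal triple-based knowledge graph: stores (head, relation, tail)
    facts and answers neighbourhood queries for design exploration."""
    def __init__(self):
        self.out = defaultdict(list)

    def add(self, head, relation, tail):
        self.out[head].append((relation, tail))

    def neighbours(self, entity):
        """All (relation, tail) pairs directly linked to an entity."""
        return sorted(self.out[entity])

kg = DesignKG()
# Hypothetical porcelain-design facts (illustrative only)
kg.add("blue-and-white porcelain", "has_motif", "lotus")
kg.add("blue-and-white porcelain", "made_in", "Jingdezhen")
kg.add("lotus", "symbolises", "purity")
```

A designer query then reduces to a neighbourhood lookup on an entity of interest, which is what makes the extracted knowledge interconnected and browsable.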
Knowledge graph (KG) serves as a specialized semantic network that encapsulates intricate relationships among real-world entities within a structured framework. This framework facilitates a transformation in information retrieval, transitioning it from mere string matching to far more sophisticated entity matching. In this transformative process, the advancement of artificial intelligence and intelligent information services is invigorated. Meanwhile, machine learning methods play an important role in the construction of KGs, and these techniques have already achieved initial success. This article embarks on a comprehensive journey through the latest strides in the field of KG via machine learning. With a profound amalgamation of cutting-edge research in machine learning, this article undertakes a systematic exploration of KG construction methods in three distinct phases: entity learning, ontology learning, and knowledge reasoning. In particular, a meticulous dissection of machine learning-driven algorithms is conducted, spotlighting their contributions to critical facets such as entity extraction, relation extraction, entity linking, and link prediction. Moreover, this article also provides an analysis of the unresolved challenges and emerging trajectories that beckon within the expansive application of machine learning-fueled, large-scale KG construction.
Nowadays, ensuring the quality of network services has become increasingly vital. Experts are turning to knowledge graph technology, with a significant emphasis on entity extraction in the identification of device configurations. This research paper presents a novel entity extraction method that leverages a combination of active learning and attention mechanisms. Initially, an improved active learning approach is employed to select the most valuable unlabeled samples, which are subsequently submitted for expert labeling. This approach successfully addresses the problems of isolated points and sample redundancy within the network configuration sample set. Then the labeled samples are utilized to train the model for network configuration entity extraction. Furthermore, the multi-head self-attention of the transformer model is enhanced by introducing the Adaptive Weighting method based on the Laplace mixture distribution. This enhancement enables the transformer model to dynamically adapt its focus to words in various positions, displaying exceptional adaptability to abnormal data and further elevating the accuracy of the proposed model. Through comparisons with Random Sampling (RANDOM), Maximum Normalized Log-Probability (MNLP), Least Confidence (LC), Token Entropy (TE), and Entropy Query by Bagging (EQB), the proposed method, Entropy Query by Bagging and Maximum Influence Active Learning (EQBMIAL), achieves comparable performance with only 40% of the samples on both datasets, while other algorithms require 50% of the samples. Furthermore, the entity extraction algorithm with the Adaptive Weighted Multi-head Attention mechanism (AW-MHA) is compared with BILSTM-CRF, Mutil_Attention-Bilstm-Crf, Deep_Neural_Model_NER and BERT_Transformer, achieving precision rates of 75.98% and 98.32% on the two datasets, respectively. Statistical tests demonstrate the statistical significance and effectiveness of the proposed algorithms in this paper.
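The uncertainty-based sample selection underlying baselines such as Token Entropy can be sketched as follows: rank unlabeled samples by the mean entropy of their per-token label distributions and send the most uncertain ones for expert labeling. This is a simplified sketch of the general idea, not the EQBMIAL algorithm itself:

```python
import math

def token_entropy(token_probs):
    """Mean Shannon entropy of per-token label distributions; higher
    means the model is less certain about the sample."""
    def h(dist):
        return -sum(p * math.log(p) for p in dist if p > 0)
    return sum(h(d) for d in token_probs) / len(token_probs)

def select_for_labeling(pool, k):
    """Pick the k most uncertain unlabeled samples for expert labeling.
    `pool` maps sample id -> list of per-token probability distributions."""
    ranked = sorted(pool, key=lambda s: token_entropy(pool[s]), reverse=True)
    return ranked[:k]

# Toy model outputs (binary tag distributions per token, illustrative)
pool = {
    "s1": [[0.9, 0.1], [0.8, 0.2]],        # fairly confident
    "s2": [[0.5, 0.5], [0.6, 0.4]],        # uncertain -> worth labeling
    "s3": [[0.99, 0.01], [0.95, 0.05]],    # very confident
}
chosen = select_for_labeling(pool, 1)
```

Bagging-based variants (EQB) average such scores over an ensemble instead of a single model's distribution.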
Objectives Medical knowledge extraction (MKE) plays a key role in natural language processing (NLP) research in electronic medical records (EMR), which are the important digital carriers for recording medical activities of patients. Named entity recognition (NER) and medical relation extraction (MRE) are two basic tasks of MKE. This study aims to improve the recognition accuracy of these two tasks by exploring deep learning methods. Methods This study discussed and built two application scenes of the bidirectional long short-term memory combined conditional random field (BiLSTM-CRF) model for NER and MRE tasks. In the data preprocessing of both tasks, a GloVe word embedding model was used to vectorize words. In the NER task, a sequence labeling strategy was used to classify each word tag by the joint probability distribution through the CRF layer. In the MRE task, the medical entity relation category was predicted by transforming the classification problem of a single entity into a sequence classification problem and linking the feature combinations between entities, also through the CRF layer. Results Through validation on the I2B2 2010 public dataset, the BiLSTM-CRF models built in this study achieved much better results than the baseline methods in the two tasks, where the F1-measure was up to 0.88 in the NER task and 0.78 in the MRE task. Moreover, the model converged faster and avoided problems such as overfitting. Conclusion This study proved the good performance of deep learning on medical knowledge extraction. It also verified the feasibility of the BiLSTM-CRF model in different application scenarios, laying the foundation for subsequent work in the EMR field.
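The CRF layer mentioned above chooses the tag sequence with the highest joint score. A minimal Viterbi decoder over toy emission and transition scores (hand-set for illustration, not from a trained BiLSTM) shows the idea:

```python
def viterbi(emissions, transitions, tags):
    """Decode the best tag sequence under a linear-chain CRF.
    emissions: list of {tag: score} per token;
    transitions: {(prev_tag, cur_tag): score}."""
    # Initialise with the first token's emission scores.
    best = {t: (emissions[0][t], [t]) for t in tags}
    for em in emissions[1:]:
        nxt = {}
        for cur in tags:
            score, path = max(
                (best[prev][0] + transitions.get((prev, cur), -1e9) + em[cur],
                 best[prev][1])
                for prev in tags
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    return max(best.values())[1]

tags = ["O", "B-DIS", "I-DIS"]
# Toy scores: forbid the invalid transition O -> I-DIS (BIO constraint).
trans = {(a, b): 0.0 for a in tags for b in tags}
trans[("O", "I-DIS")] = -1e9
emis = [{"O": 2.0, "B-DIS": 0.1, "I-DIS": 0.1},
        {"O": 0.1, "B-DIS": 1.5, "I-DIS": 1.6},
        {"O": 0.1, "B-DIS": 0.2, "I-DIS": 1.8}]
path = viterbi(emis, trans, tags)
```

Note how the transition penalty overrides the second token's slight emission preference for `I-DIS`, forcing a well-formed `B-DIS` before `I-DIS` — exactly the constraint a CRF layer adds on top of per-token scores.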
Entity recognition and extraction are the foundations of knowledge graph construction. Entity data in the field of software engineering come from different platforms and communities, and have different formats. This paper divides multi-source software knowledge entities into unstructured data, semi-structured data and code data. For these different types of data, Bi-directional Long Short-Term Memory (Bi-LSTM) with Conditional Random Field (CRF), template matching, and abstract syntax trees are used and integrated into a multi-source software knowledge entity extraction integration model (MEIM) to extract software entities. The model can be updated continuously based on users' feedback to improve its accuracy. To deal with the shortage of entity annotation datasets, keyword extraction methods based on Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and K-Means are applied to annotation tasks. The proposed MEIM model is applied to the Spring Boot framework, which demonstrates good adaptability. The extracted entities are used to construct a knowledge graph, which is applied to association retrieval and association visualization.
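The TF-IDF-based keyword proposal used for annotation can be sketched in a few lines: score each term of a document by its term frequency weighted by inverse document frequency, and keep the top-ranked terms (toy corpus, illustrative only):

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, top_k=2):
    """Rank terms of one document by TF-IDF against the corpus,
    as a cheap way to propose annotation keywords."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))            # document frequency per term
    tf = Counter(docs[doc_index])
    scores = {
        term: (count / len(docs[doc_index])) * math.log(n / df[term])
        for term, count in tf.items()
    }
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

corpus = [
    "spring boot starter configures the application context".split(),
    "the application reads the yaml configuration".split(),
    "unit tests cover the application service layer".split(),
]
keys = tfidf_keywords(corpus, 0)
```

Terms appearing in every document ("the", "application") score zero and are never proposed, which is the property that makes TF-IDF useful for bootstrapping annotations.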
The Hidden Web provides a great amount of domain-specific data for constructing knowledge services. Most previous knowledge extraction research ignores the valuable data hidden in Web databases, and related works do not address how to make extracted information available for knowledge systems. This paper describes a novel approach to building a domain-specific knowledge service with the data retrieved from the Hidden Web. An ontology serves to model the domain knowledge. Query forms of different Web sites are translated into a machine-understandable format, mapped to defined knowledge concepts, so that they can be accessed automatically. Knowledge data are also extracted from Web pages and organized in ontology format. The experiment proves that the algorithm achieves high accuracy and that the system greatly facilitates constructing knowledge services.
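The query-form translation step can be pictured as mapping each site's form fields onto shared ontology concepts, so that heterogeneous forms become automatically fillable. The field names and concept labels below are invented for illustration:

```python
# Illustrative synonym map from site-specific form fields to ontology concepts
FIELD_TO_CONCEPT = {
    "author": "Creator", "writer": "Creator",
    "title": "Title", "book_name": "Title",
    "year": "PublicationYear", "pub_date": "PublicationYear",
}

def translate_form(form_fields):
    """Translate one site's query-form fields into ontology concepts so
    the form can be queried uniformly (unmapped fields are skipped)."""
    return {FIELD_TO_CONCEPT[f]: v for f, v in form_fields.items()
            if f in FIELD_TO_CONCEPT}

site_a = {"writer": "Knuth", "book_name": "TAOCP", "format": "hardcover"}
query = translate_form(site_a)
```

Two sites with different field names then yield the same concept-level query, which is what lets a crawler submit them automatically.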
With the escalating complexity in production scenarios, vast amounts of production information are retained within enterprises in the industrial domain. Probing questions of how to meticulously excavate value from complex document information and establish coherent information links arise. In this work, we present a framework for knowledge graph construction in the industrial domain, predicated on knowledge-enhanced document-level entity and relation extraction. This approach alleviates the shortage of annotated data in the industrial domain and models the interplay of industrial documents. To augment the accuracy of named entity recognition, domain-specific knowledge is incorporated into the initialization of the word embedding matrix within the bidirectional long short-term memory conditional random field (BiLSTM-CRF) framework. For relation extraction, this paper introduces the knowledge-enhanced graph inference (KEGI) network, a pioneering method designed for long paragraphs in the industrial domain. This method discerns intricate interactions among entities by constructing a document graph and innovatively integrates knowledge representation into both node construction and path inference through TransR. On the application stratum, BiLSTM-CRF and KEGI are utilized to craft a knowledge graph from a knowledge representation model and Chinese fault reports for a steel production line, specifically SPOnto and SPFRDoc. The F1 value for entity and relation extraction has been enhanced by 2% to 6%. The quality of the extracted knowledge graph complies with the requirements of real-world production environment applications. The results demonstrate that KEGI can profoundly delve into production reports, extracting a wealth of knowledge and patterns, thereby providing a comprehensive solution for production management.
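The TransR representation used for path inference scores a triple by projecting head and tail entities into a relation-specific space with a matrix M_r and measuring the distance between the shifted head and the tail: score = ||h·M_r + r − t·M_r||, where lower is more plausible. A toy sketch with hand-set 2-D embeddings:

```python
import math

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

def transr_score(h, r, t, m_r):
    """TransR plausibility: distance between the projected head shifted
    by the relation vector and the projected tail (lower = more plausible)."""
    hp, tp = mat_vec(m_r, h), mat_vec(m_r, t)
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(hp, r, tp)))

# Toy 2-D embeddings (identity projection for clarity; illustrative only)
m_r = [[1.0, 0.0], [0.0, 1.0]]
h, r = [1.0, 0.0], [0.0, 1.0]
t_good, t_bad = [1.0, 1.0], [3.0, 0.0]
good = transr_score(h, r, t_good, m_r)
bad = transr_score(h, r, t_bad, m_r)
```

In link prediction, such scores rank candidate tails for a given head and relation; in KEGI they inform node construction and path inference over the document graph.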
Psychological counseling Q&A systems have enjoyed a remarkable and increasing popularity in recent years. The knowledge base is an important component of such systems, but it is difficult and time-consuming to construct the knowledge base manually. Fortunately, a large number of Q&A pairs have emerged on many psychological counseling websites, which can provide a good source for enriching the knowledge base. This paper presents a method of knowledge extraction from psychological consulting Q&A pairs of online psychological counseling websites, covering keywords, semantic extension and word sequence. P-XML, a knowledge template based on XML, is also designed to store the knowledge. The extracted knowledge has been successfully used in our non-obstructive psychological counseling system, called P.A.L., and the experimental results also demonstrate the feasibility and effectiveness of our approach.
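Storing an extracted Q&A knowledge item as an XML template can be sketched with the standard library; the element names below are illustrative, since the actual P-XML schema is not reproduced here:

```python
import xml.etree.ElementTree as ET

def to_pxml(question, answer, keywords, extensions):
    """Serialise one extracted Q&A knowledge item as an XML template
    (element names are illustrative, not the actual P-XML schema)."""
    item = ET.Element("knowledge")
    ET.SubElement(item, "question").text = question
    ET.SubElement(item, "answer").text = answer
    kw = ET.SubElement(item, "keywords")
    for w in keywords:
        ET.SubElement(kw, "kw").text = w
    ET.SubElement(item, "extension").text = ";".join(extensions)
    return ET.tostring(item, encoding="unicode")

xml_str = to_pxml(
    "How to cope with exam anxiety?",
    "Try breathing exercises and a regular study plan.",
    ["exam", "anxiety"],
    ["test stress", "nervousness"],
)
```

Keeping keywords and semantic extensions alongside the answer is what lets the Q&A system match a new user question against stored knowledge.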
Objective To establish the knowledge graph of “disease-syndrome-symptom-method-formula” in Treatise on Febrile Diseases (Shang Han Lun, 《伤寒论》) for reducing the fuzziness and uncertainty of data, and for laying a foundation for later knowledge reasoning and its application. Methods Under the guidance of experts in the classical formulas of traditional Chinese medicine (TCM), the method of “top-down as the main, bottom-up as the auxiliary” was adopted to carry out knowledge extraction, knowledge fusion, and knowledge storage from the five aspects of disease, syndrome, symptom, method, and formula for the original text of Treatise on Febrile Diseases, and thus the knowledge graph of Treatise on Febrile Diseases was constructed. On this basis, the knowledge structure query and the knowledge relevance query were realized in a visual manner. Results The knowledge graph of “disease-syndrome-symptom-method-formula” in the Treatise on Febrile Diseases was constructed, containing 6469 entities and 10911 relational triples, on which entities and their relationships can be queried and the query results can be visualized. Conclusion The knowledge graph of Treatise on Febrile Diseases systematically digitizes its knowledge system and improves the completeness and accuracy of the knowledge representation. The strengthened connection between “disease-syndrome-symptom-method-formula” is conducive to sharing and reusing knowledge in a clear and efficient way.
This paper proposes a method to construct a conceptual semantic knowledge base of the software engineering domain based on Wikipedia. First, it takes the concepts of SWEBOK V3 as the standard to extract the interpretation of each concept from Wikipedia, and extracts the keywords as the concept's semantics. Second, the conceptual semantic knowledge base is formed from the hierarchical relationships between concepts and the relationships between concepts and the other text-interpretation concepts in Wikipedia. Finally, the semantic similarity between concepts is calculated by a random walk algorithm for the construction of the conceptual semantic knowledge base. The semantic similarity of the knowledge base constructed by this method can reach more than 84%, which verifies the effectiveness of the proposed method.
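A random-walk similarity of the kind mentioned above can be sketched deterministically: run a random walk with restart from each concept (computed by power iteration) and compare the resulting visit distributions. The toy concept graph and parameters below are illustrative, not the paper's algorithm:

```python
def walk_distribution(graph, start, restart=0.15, steps=50):
    """Random walk with restart from `start`, computed deterministically
    by power iteration; returns the stationary visit distribution."""
    dist = {n: 1.0 if n == start else 0.0 for n in graph}
    for _ in range(steps):
        nxt = {n: (restart if n == start else 0.0) for n in graph}
        for n, mass in dist.items():
            for nb in graph[n]:
                nxt[nb] += (1 - restart) * mass / len(graph[n])
        dist = nxt
    return dist

def similarity(graph, a, b):
    """Concept similarity as the overlap of the two walk distributions."""
    da, db = walk_distribution(graph, a), walk_distribution(graph, b)
    return sum(min(da[n], db[n]) for n in graph)

# Toy concept graph (undirected adjacency, illustrative concepts)
g = {
    "testing": ["unit test", "debugging"],
    "unit test": ["testing"],
    "debugging": ["testing", "profiling"],
    "profiling": ["debugging"],
    "ontology": ["semantics"],
    "semantics": ["ontology"],
}
close = similarity(g, "testing", "debugging")
far = similarity(g, "testing", "ontology")
```

Concepts in the same connected neighbourhood share walk mass and score high; concepts in disjoint regions of the graph share none.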
A knowledge graph is a structured graph in which data obtained from multiple sources are standardized to acquire and integrate human knowledge. Research is being actively conducted to cover a wide variety of knowledge, as it can be applied to applications that help humans. However, existing research constructs knowledge graphs without the time information that knowledge implies. Knowledge stored without time information becomes outdated over time, and the possibility that knowledge later becomes false or changes meaningfully is excluded. As a result, such graphs cannot reflect information that changes dynamically, and they cannot accept newly emerged information. To solve this problem, this paper proposes Time-Aware PolarisX, an automatically extended knowledge graph including time information. Time-Aware PolarisX combines a BERT-based relation extractor with an ensemble NER model, including a time tag, as an entity extractor, to extract knowledge consisting of subject, relation, and object from unstructured text. Two application experiments show that the proposed system overcomes the limitations of existing systems that do not consider time information when applied to an application such as a chatbot. We also verify that the accuracy of the extraction model is improved through a comparative experiment with the existing model.
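The time-aware aspect can be sketched as triples carrying a timestamp, with queries returning only the most recent fact so stale knowledge is never served (illustrative facts, not PolarisX output):

```python
def add_fact(store, subj, rel, obj, timestamp):
    """Record a time-stamped triple; later facts supersede earlier ones."""
    store.append((timestamp, subj, rel, obj))

def current_value(store, subj, rel):
    """Return the most recent object for (subj, rel); ISO date strings
    sort chronologically, so max() picks the newest fact."""
    matches = [(ts, o) for ts, s, r, o in store if s == subj and r == rel]
    return max(matches)[1] if matches else None

kb = []
# Hypothetical facts: the CEO changed over time
add_fact(kb, "ACME Corp", "ceo", "Alice", "2019-03-01")
add_fact(kb, "ACME Corp", "ceo", "Bob", "2022-07-15")
ceo = current_value(kb, "ACME Corp", "ceo")
```

Without the timestamp column, both facts would coexist and a chatbot could answer with the outdated one — the exact failure mode the abstract describes.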
This paper summarizes research results dealing with washer and nut taxonomy and knowledge base design, making use of fuzzy methodology. In particular, the theory of fuzzy membership functions, similarity matrices, and the operation of fuzzy inference play important roles. A realistic set of 25 washers and nuts is employed to conduct extensive experiments and simulations. The investigation includes a complete demonstration of engineering design. The results obtained from this feasibility study are very encouraging indeed because they represent the lower bound with respect to performance, namely correct recognition rate, of what fuzzy methodology can do. This lower bound shows a high recognition rate even with noisy input patterns, robustness in terms of noise tolerance, and simplicity in hardware implementation. Possible future works are suggested in the conclusion.
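A minimal sketch of the fuzzy classification idea: each part class is described by triangular membership functions over measured features, and a part is assigned to the class with the highest min-membership. The feature ranges below are invented for illustration and are not the paper's 25-part data:

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 at a, peaks at 1 at b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def classify(measurements, prototypes):
    """Min-membership fuzzy inference: a part's degree of match to a class
    is its weakest feature membership; pick the best-matching class."""
    def degree(proto):
        return min(triangular(measurements[f], *proto[f]) for f in proto)
    return max(prototypes, key=lambda cls: degree(prototypes[cls]))

# Hypothetical feature ranges in mm (a, b, c) per class
prototypes = {
    "M6 washer":  {"outer_d": (11, 12, 13), "thickness": (1.0, 1.6, 2.2)},
    "M10 washer": {"outer_d": (19, 20, 21), "thickness": (1.5, 2.0, 2.5)},
}
label = classify({"outer_d": 11.8, "thickness": 1.7}, prototypes)
```

Because membership degrades gradually rather than cutting off at a threshold, a noisy measurement still lands in the right class — the noise tolerance the study reports.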
Data production and exchange on the Web grows at a frenetic speed. Such uncontrolled and exponential growth pushes for new research in the area of information extraction, as valuable information can be obtained by processing data gathered from several heterogeneous sources. While some extracted facts can be correct at the origin, it is not possible to verify that correlations among them are always true (e.g., they can relate to different points of time). We need systems smart enough to separate signal from noise and hence extract real value from this abundance of content accessible on the Web. In order to extract information from heterogeneous sources, we are involved in the entire process of identifying specific facts/events of interest. We propose a gluing architecture driving the whole knowledge acquisition process, from data acquisition from external heterogeneous resources to their exploitation for RDF triplification to support reasoning tasks. Once the extraction process is completed, a dedicated reasoner can infer new knowledge as a result of the reasoning process defined by the end user by means of specific inference rules over both extracted information and the background knowledge. The end user is supported in this context with an intelligent interface allowing them to visualize either specific data/concepts, or all information inferred by applying deductive reasoning over a collection of data.
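The user-defined inference rules over triplified data can be sketched as naive forward chaining; here a single transitive rule is closed over a toy fact set (the facts and relation name are illustrative):

```python
def infer(facts, rule):
    """Close a set of triples under one transitive rule by naive forward
    chaining: (a, rule, b) and (b, rule, c) entail (a, rule, c)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = {
            (a, rule, c)
            for (a, r1, b) in facts if r1 == rule
            for (b2, r2, c) in facts if r2 == rule and b2 == b
        }
        if not new <= facts:
            facts |= new
            changed = True
    return facts

triples = {
    ("Turin", "locatedIn", "Italy"),
    ("Italy", "locatedIn", "Europe"),
}
closed = infer(triples, "locatedIn")
```

The reasoner repeats until a fixpoint, so chains of any length are derived; a production system would express the same rule in SPARQL or a dedicated rule language over the RDF store.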
Knowledge graph technology plays an increasingly important role in various fields of industry and academia. This paper first introduces the general framework of knowledge graph construction, which includes three stages: information extraction, knowledge fusion and knowledge processing. In order to improve the efficiency of quality and safety supervision of transportation engineering construction, this paper constructs a knowledge graph by acquiring multi-source heterogeneous data from the supervision of transportation engineering quality and safety. It employs a bottom-up construction strategy and natural language processing methods to solve the problems of knowledge extraction for transportation engineering construction. We use an entity relation extraction method to extract entity triples from the multi-source heterogeneous data, then employ knowledge inference to complete the edges in the constructed knowledge graph, and finally perform quality evaluation to add the valid triples to the knowledge graph for updating. Subgraph matching technology is also exploited to retrieve the constructed knowledge graph for efficiently acquiring useful knowledge about the quality and safety of transportation engineering projects. The results show that the constructed knowledge graph provides a practical and valuable tool for the quality and safety supervision of transportation engineering construction.
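The subgraph matching used for retrieval can be sketched, at its simplest, as matching a triple pattern containing variables against the stored triples; the supervision facts below are invented for illustration:

```python
def match_pattern(triples, pattern):
    """Match one triple pattern against the graph. Items starting with
    '?' are variables, everything else must match literally.
    Returns one variable-binding dict per matching triple."""
    results = []
    for triple in triples:
        binding = {}
        for p, v in zip(pattern, triple):
            if p.startswith("?"):
                if binding.get(p, v) != v:   # repeated variable must agree
                    break
                binding[p] = v
            elif p != v:
                break
        else:
            results.append(binding)
    return results

# Hypothetical supervision facts, not the paper's actual graph
kg = [
    ("Bridge-3", "hasHazard", "crack"),
    ("Bridge-3", "inspectedBy", "team-A"),
    ("Tunnel-7", "hasHazard", "seepage"),
]
hazards = match_pattern(kg, ("?site", "hasHazard", "?h"))
```

Full subgraph matching joins several such patterns on shared variables, which is how a query like "sites with a hazard inspected by team-A" would be answered.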
Open Relation Extraction (ORE) is a task of extracting semantic relations from a text document. Current ORE systems have significantly improved their efficiency in obtaining Chinese relations, when compared with conventional systems which heavily depend on feature engineering or syntactic parsing. However, these ORE systems do not use robust neural networks such as pre-trained language models to take advantage of large-scale unstructured data effectively. In response to this issue, a new system entitled Chinese Open Relation Extraction with Knowledge Enhancement (CORE-KE) is presented in this paper. The CORE-KE system employs a pre-trained language model (with the support of a Bidirectional Long Short-Term Memory (BiLSTM) layer and a Masked Conditional Random Field (Masked CRF) layer) on unstructured data in order to improve Chinese open relation extraction. Entity descriptions in Wikidata and additional knowledge (in terms of triple facts) extracted from Chinese ORE datasets are used to fine-tune the pre-trained language model. In addition, syntactic features are further adopted in the training stage of the CORE-KE system for knowledge enhancement. Experimental results of the CORE-KE system on two large-scale datasets of open Chinese entities and relations demonstrate that the CORE-KE system is superior to other ORE systems. The F1-scores of the CORE-KE system on the two datasets show a relative improvement of 20.1% and 1.3%, when compared with benchmark ORE systems, respectively. The source code is available at https://github.com/cjwen15/CORE-KE.
Funding: supported by the National Natural Science Foundation of China (Grant No. 81973695) and the Discipline with Strong Characteristics of Liaocheng University - Intelligent Science and Technology (Grant No. 319462208).
Funding: National Natural Science Foundation of China, Grant/Award Numbers: 61671064, 61732005; National Key Research & Development Program, Grant/Award Number: 2018YFC0831700.
基金 (Funding): This research is supported by the Chinese Special Projects of the National Key Research and Development Plan (2019YFB1405702).
文摘 (Abstract): The acquisition of valuable design knowledge from massive fragmentary data is challenging for designers in conceptual product design. This study proposes a novel method for acquiring design knowledge by combining deep learning with a knowledge graph. Specifically, the design knowledge acquisition method utilises a knowledge extraction model to extract design-related entities and relations from fragmentary data, and further constructs the knowledge graph to support design knowledge acquisition for conceptual product design. Moreover, the knowledge extraction model introduces ALBERT to address memory limitations and communication overhead in the entity extraction module, and uses multi-granularity information to overcome segmentation errors and polysemy ambiguity in the relation extraction module. Experimental comparison verified the effectiveness and accuracy of the proposed knowledge extraction model. A case study demonstrated the feasibility of knowledge graph construction with real fragmentary porcelain data and showed the capability to provide designers with interconnected and visualised design knowledge.
基金supported in part by the Beijing Natural Science Foundation under Grants L211020 and M21032in part by the National Natural Science Foundation of China under Grants U1836106 and 62271045in part by the Scientific and Technological Innovation Foundation of Foshan under Grants BK21BF001 and BK20BF010。
文摘 (Abstract): A knowledge graph (KG) serves as a specialized semantic network that encapsulates intricate relationships among real-world entities within a structured framework. This framework transforms information retrieval, transitioning it from mere string matching to far more sophisticated entity matching, and thereby invigorates the advancement of artificial intelligence and intelligent information services. Meanwhile, machine learning methods play an important role in the construction of KGs and have already achieved initial success. This article embarks on a comprehensive journey through the latest strides in the field of KGs via machine learning. With a profound amalgamation of cutting-edge research in machine learning, this article undertakes a systematic exploration of KG construction methods in three distinct phases: entity learning, ontology learning, and knowledge reasoning. In particular, a meticulous dissection of machine learning-driven algorithms is conducted, spotlighting their contributions to critical facets such as entity extraction, relation extraction, entity linking, and link prediction. Moreover, this article also analyses the unresolved challenges and emerging trajectories within the expansive application of machine learning-fueled, large-scale KG construction.
基金supported by the National Key R&D Program of China(2019YFB2103202).
文摘 (Abstract): Nowadays, ensuring the quality of network services has become increasingly vital. Experts are turning to knowledge graph technology, with a significant emphasis on entity extraction for the identification of device configurations. This paper presents a novel entity extraction method that leverages a combination of active learning and attention mechanisms. Initially, an improved active learning approach is employed to select the most valuable unlabeled samples, which are subsequently submitted for expert labeling. This approach successfully addresses the problems of isolated points and sample redundancy within the network configuration sample set. The labeled samples are then used to train the model for network configuration entity extraction. Furthermore, the multi-head self-attention of the Transformer model is enhanced by introducing an adaptive weighting method based on the Laplace mixture distribution. This enhancement enables the Transformer model to dynamically adapt its focus to words in various positions, displaying exceptional adaptability to abnormal data and further elevating the accuracy of the proposed model. Compared with Random Sampling (RANDOM), Maximum Normalized Log-Probability (MNLP), Least Confidence (LC), Token Entropy (TE), and Entropy Query by Bagging (EQB), the proposed method, Entropy Query by Bagging and Maximum Influence Active Learning (EQBMIAL), achieves comparable performance with only 40% of the samples on both datasets, while the other algorithms require 50% of the samples. Furthermore, the entity extraction algorithm with the Adaptive Weighted Multi-head Attention mechanism (AW-MHA) is compared with BILSTM-CRF, Mutil_Attention-Bilstm-Crf, Deep_Neural_Model_NER and BERT_Transformer, achieving precision rates of 75.98% and 98.32% on the two datasets, respectively. Statistical tests demonstrate the significance and effectiveness of the proposed algorithms.
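The Laplace-mixture adaptive weighting described above is specific to that paper and is not reproduced here; as a minimal illustrative sketch of the underlying mechanism it modifies, the following pure-Python code implements plain scaled dot-product attention for a single query (vector dimensions and values are arbitrary examples):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query: list[float]; keys and values: lists of equal-length float lists.
    Returns (attention weights, weighted sum of value vectors).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return weights, out
```

The adaptive-weighting idea in the abstract would replace the uniform softmax normalization with position-dependent weights; this sketch shows only the standard baseline.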
基金Supported by the Zhejiang Provincial Natural Science Foundation(No.LQ16H180004)~~
文摘 (Abstract): Objectives: Medical knowledge extraction (MKE) plays a key role in natural language processing (NLP) research on electronic medical records (EMR), which are the important digital carriers for recording the medical activities of patients. Named entity recognition (NER) and medical relation extraction (MRE) are two basic tasks of MKE. This study aims to improve the recognition accuracy of these two tasks by exploring deep learning methods. Methods: This study built two application scenarios of the bidirectional long short-term memory combined conditional random field (BiLSTM-CRF) model for the NER and MRE tasks. In the data preprocessing of both tasks, a GloVe word embedding model was used to vectorize words. In the NER task, a sequence labeling strategy was used to classify each word's tag by the joint probability distribution through the CRF layer. In the MRE task, the medical entity relation category was predicted by transforming the classification problem of a single entity into a sequence classification problem and linking the feature combinations between entities, also through the CRF layer. Results: Validated on the I2B2 2010 public dataset, the BiLSTM-CRF models built in this study achieved much better results than the baseline methods on both tasks, with F1-measures of up to 0.88 on the NER task and 0.78 on the MRE task. Moreover, the model converged faster and avoided problems such as overfitting. Conclusion: This study demonstrated the good performance of deep learning on medical knowledge extraction and verified the feasibility of the BiLSTM-CRF model in different application scenarios, laying the foundation for subsequent work in the EMR field.
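The CRF layer in a BiLSTM-CRF tagger decodes the highest-scoring tag sequence from per-token emission scores plus learned tag-transition scores. As a rough illustrative sketch (the tag set, scores, and data below are invented, not from the paper), Viterbi decoding can be written in pure Python:

```python
def viterbi(emissions, transitions, tags):
    """Viterbi decoding: find the highest-scoring tag sequence.

    emissions: list of {tag: score} dicts, one per token (e.g., from a BiLSTM).
    transitions: {(prev_tag, tag): score} as learned by a CRF layer.
    tags: list of all tag names.
    """
    # Initialize with the first token's emission scores.
    score = {t: emissions[0][t] for t in tags}
    back = []
    for em in emissions[1:]:
        new_score, pointers = {}, {}
        for t in tags:
            # Pick the best previous tag for each current tag.
            prev = max(tags, key=lambda p: score[p] + transitions.get((p, t), 0.0))
            new_score[t] = score[prev] + transitions.get((prev, t), 0.0) + em[t]
            pointers[t] = prev
        score = new_score
        back.append(pointers)
    # Backtrack from the best final tag.
    best = max(tags, key=lambda t: score[t])
    path = [best]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

In a real BiLSTM-CRF the emission and transition scores come from trained network parameters; here they are hand-set to keep the example self-contained.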
基金Zhifang Liao:Ministry of Science and Technology:Key Research and Development Project(2018YFB003800),Hunan Provincial Key Laboratory of Finance&Economics Big Data Scienceand Technology(Hunan University of Finance and Economics)2017TP1025,HNNSF 2018JJ2535Shengzong Liu:NSF61802120.
文摘 (Abstract): Entity recognition and extraction are the foundations of knowledge graph construction. Entity data in the field of software engineering come from different platforms and communities and have different formats. This paper divides multi-source software knowledge entities into unstructured data, semi-structured data and code data. For these different types of data, Bi-directional Long Short-Term Memory (Bi-LSTM) with a Conditional Random Field (CRF), template matching, and abstract syntax trees are used and integrated into a multi-source software knowledge entity extraction integration model (MEIM) to extract software entities. The model can be updated continuously based on user feedback to improve its accuracy. To deal with the shortage of entity annotation datasets, keyword extraction methods based on Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and K-Means are applied to annotation tasks. The proposed MEIM model is applied to the Spring Boot framework, where it demonstrates good adaptability. The extracted entities are used to construct a knowledge graph, which is applied to association retrieval and association visualization.
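Of the three annotation-bootstrapping methods named above, TF-IDF is the simplest to sketch. The following minimal pure-Python version ranks candidate keywords for one document against a small corpus (the toy corpus below is an invented example, not the paper's data):

```python
import math

def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank candidate annotation keywords for one document by TF-IDF.

    docs: list of token lists (the corpus); doc_index: which document to rank.
    Returns the top_k terms of that document by descending TF-IDF score.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    doc = docs[doc_index]
    scores = {}
    for term in set(doc):
        tf = doc.count(term) / len(doc)          # term frequency in this doc
        idf = math.log(n_docs / df[term])        # inverse document frequency
        scores[term] = tf * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Terms that are frequent in one document but rare across the corpus score highest, which is why TF-IDF is a reasonable cheap proxy for entity-annotation candidates.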
基金This project is supported by Major International Cooperation Program of NSFC Grant 60221120145 Chinese Folk Music Digital Library.
文摘 (Abstract): The Hidden Web provides a great amount of domain-specific data for constructing knowledge services. Most previous knowledge extraction research ignores the valuable data hidden in Web databases, and related works do not address how to make the extracted information available to a knowledge system. This paper describes a novel approach to building a domain-specific knowledge service with data retrieved from the Hidden Web. An ontology serves to model the domain knowledge. Query forms of different Web sites are translated into a machine-understandable format of defined knowledge concepts, so that they can be accessed automatically. Knowledge data are also extracted from Web pages and organized as ontology-format knowledge. Experiments prove that the algorithm achieves high accuracy and that the system greatly facilitates the construction of knowledge services.
基金supported by the National Science and Technology Innovation 2030 New Generation Artificial Intelligence Major Project(Grant No.2018AAA0101800)the National Natural Science Foundation of China(Grant No.72271188).
文摘 (Abstract): With the escalating complexity of production scenarios, vast amounts of production information are retained within enterprises in the industrial domain, raising the probing questions of how to meticulously excavate value from complex document information and establish coherent information links. In this work, we present a framework for knowledge graph construction in the industrial domain, predicated on knowledge-enhanced document-level entity and relation extraction. This approach alleviates the shortage of annotated data in the industrial domain and models the interplay of industrial documents. To augment the accuracy of named entity recognition, domain-specific knowledge is incorporated into the initialization of the word embedding matrix within the bidirectional long short-term memory conditional random field (BiLSTM-CRF) framework. For relation extraction, this paper introduces the knowledge-enhanced graph inference (KEGI) network, a pioneering method designed for long paragraphs in the industrial domain. This method discerns intricate interactions among entities by constructing a document graph and innovatively integrates knowledge representation into both node construction and path inference through TransR. At the application level, BiLSTM-CRF and KEGI are used to craft a knowledge graph from a knowledge representation model and Chinese fault reports for a steel production line, specifically SPOnto and SPFRDoc. The F1 value for entity and relation extraction is enhanced by 2% to 6%. The quality of the extracted knowledge graph complies with the requirements of real-world production environment applications. The results demonstrate that KEGI can delve deeply into production reports, extracting a wealth of knowledge and patterns and thereby providing a comprehensive solution for production management.
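TransR, used above for knowledge representation, scores a triple by projecting head and tail entities into a relation-specific space with a matrix M_r and measuring how close h_r + r is to t_r. As a small sketch under the standard TransR formulation (the vectors below are toy values, not trained embeddings):

```python
def mat_vec(m, v):
    # Multiply matrix m (list of rows) by vector v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transr_score(head, relation, tail, proj):
    """TransR plausibility score for a triple; lower means more plausible.

    head/tail live in entity space; proj is the relation-specific matrix M_r
    mapping them into relation space, where M_r*h + r should be close to M_r*t.
    """
    h_r = mat_vec(proj, head)
    t_r = mat_vec(proj, tail)
    # L2 distance || M_r h + r - M_r t ||.
    return sum((h + r - t) ** 2 for h, r, t in zip(h_r, relation, t_r)) ** 0.5
```

In training, the projection matrices and embeddings are learned so that observed triples score low and corrupted ones score high.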
文摘 (Abstract): Psychological counseling Q&A systems have enjoyed remarkable and increasing popularity in recent years. The knowledge base is an important component of such systems, but it is difficult and time-consuming to construct manually. Fortunately, a large number of Q&A pairs have emerged on many psychological counseling websites, which can provide a good source for enriching the knowledge base. This paper presents a method of knowledge extraction from the psychological consulting Q&A pairs of online psychological counseling websites, covering keywords, semantic extension and word sequence. P-XML, a knowledge template based on XML, is also designed to store the knowledge. The extracted knowledge has been successfully used in our non-obstructive psychological counseling system, called P.A.L., and the experimental results demonstrate the feasibility and effectiveness of our approach.
基金The Open Fund of Hunan University of Traditional Chinese Medicine for the First-Class Discipline of Traditional Chinese Medicine(2018ZYX66)the Science Research Project of Hunan Provincial Department of Education(20C1391)the Natural Science Foundation of Hunan Province(2020JJ4461)。
文摘 (Abstract): Objective: To establish the knowledge graph of "disease-syndrome-symptom-method-formula" in the Treatise on Febrile Diseases (Shang Han Lun, 《伤寒论》) to reduce the fuzziness and uncertainty of the data, and to lay a foundation for later knowledge reasoning and its application. Methods: Under the guidance of experts in the classical formulas of traditional Chinese medicine (TCM), a "top-down as the main, bottom-up as the auxiliary" method was adopted to carry out knowledge extraction, knowledge fusion, and knowledge storage from the five aspects of disease, syndrome, symptom, method, and formula for the original text of the Treatise on Febrile Diseases, thus constructing its knowledge graph. On this basis, knowledge structure queries and knowledge relevance queries were realized in a visual manner. Results: The knowledge graph of "disease-syndrome-symptom-method-formula" in the Treatise on Febrile Diseases was constructed, containing 6469 entities and 10911 relational triples, on which entities and their relationships can be queried and the query results visualized. Conclusion: The knowledge graph of the Treatise on Febrile Diseases systematically digitizes its knowledge system and improves the completeness and accuracy of the knowledge representation; the connections between "disease-syndrome-symptom-method-formula" are conducive to the sharing and reuse of knowledge in a clear and efficient way.
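The relevance queries described above run over subject-relation-object triples. A minimal sketch of such triple-pattern matching, with `None` as a wildcard, is shown below; the sample triples and relation names are invented placeholders, not the paper's actual schema:

```python
def query(triples, subject=None, relation=None, obj=None):
    """Pattern-match over (subject, relation, object) triples.

    Any argument left as None acts as a wildcard, so query(triples,
    subject="X") returns every triple whose subject is "X".
    """
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]
```

Real knowledge-graph stores index triples for speed, but the matching semantics are the same.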
文摘 (Abstract): This paper proposes a method to construct a conceptual semantic knowledge base of the software engineering domain based on Wikipedia. First, it takes the concepts of SWEBOK V3 as the standard, extracts their interpretations from Wikipedia, and extracts the keywords as concept semantics. Second, the conceptual semantic knowledge base is formed from the hierarchical relationships between concepts and the other text-interpretation concepts in Wikipedia. Finally, the semantic similarity between concepts is calculated by a random walk algorithm for the construction of the conceptual semantic knowledge base. The semantic similarity in the knowledge base constructed by this method reaches more than 84%, verifying the effectiveness of the proposed method.
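The paper's exact random-walk formulation is not given here; a common variant is random walk with restart, whose stationary visit probabilities can serve as similarity scores between a start concept and the rest of the graph. The deterministic power-iteration sketch below uses an invented toy concept graph:

```python
def rwr_scores(graph, start, restart=0.15, steps=50):
    """Random walk with restart on a concept graph (deterministic iteration).

    graph: {node: [neighbor, ...]} adjacency lists. Each step, (1 - restart)
    of the probability mass flows along edges and `restart` returns to the
    start node. Returns visit probabilities usable as similarity scores.
    """
    nodes = list(graph)
    prob = {n: 1.0 if n == start else 0.0 for n in nodes}
    for _ in range(steps):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            nbrs = graph[n]
            for m in nbrs:
                nxt[m] += (1 - restart) * prob[n] / len(nbrs)
        nxt[start] += restart
        prob = nxt
    return prob
```

Nodes strongly connected to the start concept accumulate more probability mass, which is the intuition behind using walks for concept similarity.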
基金supported by Basic Science Research Program through the NRF(National Research Foundation of Korea)the MSIT(Ministry of Science and ICT),Korea,under the National Program for Excellence in SW supervised by the IITP(Institute for Information&communications Technology Promotion)the Gachon University research fund of 2019(Nos.NRF2019R1A2C1008412,2015-0-00932,GCU-2019-0773)。
文摘 (Abstract): A knowledge graph is a structured graph in which data obtained from multiple sources are standardized to acquire and integrate human knowledge. Research is being actively conducted to cover a wide variety of knowledge, as it can be applied in applications that help humans. However, existing research constructs knowledge graphs without the time information that knowledge implies. Knowledge stored without time information becomes outdated over time, and the possibility that knowledge becomes false or changes meaningfully in the future is excluded. As a result, such graphs cannot reflect information that changes dynamically and cannot accept newly emerging information. To solve this problem, this paper proposes Time-Aware PolarisX, an automatically extended knowledge graph that includes time information. Time-Aware PolarisX combines a BERT model as a relation extractor with an ensemble NER model, including a time tag, as an entity extractor to extract knowledge consisting of subject, relation, and object from unstructured text. Two application experiments show that the proposed system overcomes the limitations of existing systems that do not consider time information when applied to an application such as a chatbot. We also verify that the accuracy of the extraction model is improved through a comparative experiment with the existing model.
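One simple way to realize time-aware storage, sketched below under our own assumptions (the validity-interval representation and the sample facts are illustrative, not Time-Aware PolarisX's actual format), is to attach a validity interval to each triple and filter at query time:

```python
from datetime import date

def current_facts(facts, today):
    """Filter time-stamped triples to those still valid at `today`.

    facts: list of (subject, relation, object, valid_from, valid_until)
    tuples; a valid_until of None means the fact is still current.
    Returns plain (subject, relation, object) triples valid at `today`.
    """
    return [(s, r, o) for s, r, o, start, end in facts
            if start <= today and (end is None or today <= end)]
```

Stale facts are thus never deleted, only superseded, so the graph can answer both "what is true now" and "what was true then".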
文摘 (Abstract): This paper summarizes research results dealing with washer and nut taxonomy and knowledge base design using fuzzy methodology. In particular, the theory of fuzzy membership functions, similarity matrices, and the operation of fuzzy inference play important roles. A realistic set of 25 washers and nuts is employed to conduct extensive experiments and simulations. The investigation includes a complete demonstration of the engineering design. The results obtained from this feasibility study are very encouraging because they represent a lower bound on the performance, namely the correct-recognition rate, of what fuzzy methodology can achieve. This lower bound shows a high recognition rate even with noisy input patterns, robustness in terms of noise tolerance, and simplicity in hardware implementation. Possible future work is suggested in the conclusion.
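As a generic illustration of the two building blocks named above (the specific membership shapes and similarity measure used in the paper are not given, so both choices below are assumptions), a triangular membership function and a simple similarity measure over fuzzy feature vectors look like this:

```python
def triangular_membership(x, a, b, c):
    """Triangular fuzzy membership: 0 at a and c, rising to a peak of 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_similarity(pattern_a, pattern_b):
    """Similarity of two fuzzy feature vectors: 1 minus mean absolute difference.

    Both inputs are lists of membership degrees in [0, 1]; identical patterns
    score 1.0 and maximally different patterns score 0.0.
    """
    diffs = [abs(p - q) for p, q in zip(pattern_a, pattern_b)]
    return 1.0 - sum(diffs) / len(diffs)
```

Classifying a noisy part then amounts to computing its fuzzy feature vector and picking the stored prototype with the highest similarity.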
文摘 (Abstract): Data production and exchange on the Web grow at a frenetic speed. Such uncontrolled and exponential growth pushes for new research in the area of information extraction, as it is of great interest and can be obtained by processing data gathered from several heterogeneous sources. While some extracted facts can be correct at the origin, it is not possible to verify that correlations among them are always true (e.g., they can relate to different points in time). We need systems smart enough to separate signal from noise and hence extract real value from this abundance of content accessible on the Web. In order to extract information from heterogeneous sources, we are involved in the entire process of identifying specific facts/events of interest. We propose a gluing architecture driving the whole knowledge acquisition process, from data acquisition from external heterogeneous resources to their exploitation for RDF triplification to support reasoning tasks. Once the extraction process is completed, a dedicated reasoner can infer new knowledge as a result of the reasoning process defined by the end user by means of specific inference rules over both the extracted information and the background knowledge. The end user is supported in this context with an intelligent interface allowing them to visualize either specific data/concepts or all information inferred by applying deductive reasoning over a collection of data.
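RDF triplification, mentioned above, means serializing extracted facts as RDF triples. A minimal sketch targeting the N-Triples syntax is shown below; the `http://example.org/` namespace and the sample fact are placeholders (real pipelines would mint proper IRIs and escape literals fully):

```python
def to_ntriples(facts, base="http://example.org/"):
    """Serialize (subject, predicate, object-literal) facts as N-Triples lines.

    base is a hypothetical namespace prefix; spaces in resource names are
    replaced with underscores to form simple IRIs.
    """
    lines = []
    for s, p, o in facts:
        subj = "<%s%s>" % (base, s.replace(" ", "_"))
        pred = "<%s%s>" % (base, p.replace(" ", "_"))
        obj = '"%s"' % o          # object rendered as a plain string literal
        lines.append("%s %s %s ." % (subj, pred, obj))
    return "\n".join(lines)
```

Once facts are in RDF form, off-the-shelf reasoners and SPARQL engines can consume them directly, which is the point of the triplification step.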
基金This work was supported by Scientific Research Project of Department of Transportation of Hunan Province under Grant No.201814.
文摘 (Abstract): Knowledge graph technology plays an increasingly important role in various fields of industry and academia. This paper first introduces the general framework of knowledge graph construction, which includes three stages: information extraction, knowledge fusion and knowledge processing. In order to improve the efficiency of quality and safety supervision of transportation engineering construction, this paper constructs a knowledge graph by acquiring multi-source heterogeneous data from the supervision of transportation engineering quality and safety. It employs a bottom-up construction strategy and natural language processing methods to solve the problems of knowledge extraction for transportation engineering construction. We use an entity relation extraction method to extract entity triples from the multi-source heterogeneous data, then employ knowledge inference to complete the edges in the constructed knowledge graph, and finally perform quality evaluation to add the valid triples to the knowledge graph for updating. Subgraph matching technology is also exploited to retrieve the constructed knowledge graph for efficiently acquiring useful knowledge about the quality and safety of transportation engineering projects. The results show that the constructed knowledge graph provides a practical and valuable tool for the quality and safety supervision of transportation engineering construction.
基金the high-level university construction special project of Guangdong province,China 2019(No.5041700175)the new engineering research and practice project of the Ministry of Education,China(NO.E-RGZN20201036)。
文摘 (Abstract): Open Relation Extraction (ORE) is the task of extracting semantic relations from a text document. Current ORE systems have significantly improved their efficiency in obtaining Chinese relations compared with conventional systems, which heavily depend on feature engineering or syntactic parsing. However, these ORE systems do not use robust neural networks such as pre-trained language models to take advantage of large-scale unstructured data effectively. In response to this issue, a new system entitled Chinese Open Relation Extraction with Knowledge Enhancement (CORE-KE) is presented in this paper. The CORE-KE system employs a pre-trained language model (with the support of a Bidirectional Long Short-Term Memory (BiLSTM) layer and a Masked Conditional Random Field (Masked CRF) layer) on unstructured data in order to improve Chinese open relation extraction. Entity descriptions in Wikidata and additional knowledge (in terms of triple facts) extracted from Chinese ORE datasets are used to fine-tune the pre-trained language model. In addition, syntactic features are further adopted in the training stage of the CORE-KE system for knowledge enhancement. Experimental results of the CORE-KE system on two large-scale datasets of open Chinese entities and relations demonstrate that the CORE-KE system is superior to other ORE systems. The F1-scores of the CORE-KE system on the two datasets show relative improvements of 20.1% and 1.3%, respectively, compared with benchmark ORE systems. The source code is available at https://github.com/cjwen15/CORE-KE.