Funding: the Beijing Municipal Science and Technology Program (Z231100001323004).
Abstract: With the help of pre-trained language models, the accuracy of entity linking has made great strides in recent years. However, most high-performing models require fine-tuning large pre-trained language models on large amounts of training data, which imposes a hardware barrier on the task. Some researchers have achieved competitive results with less training data through ingenious methods, such as utilizing information provided by a named entity recognition model. This paper presents a novel semantic-enhancement-based entity linking approach, named semantically enhanced hardware-friendly entity linking (SHEL), which is designed to be hardware-friendly and efficient while maintaining good performance. Specifically, SHEL's semantic enhancement consists of three parts: (1) semantic compression of entity descriptions using a text summarization model; (2) maximizing the capture of mention contexts using asymmetric heuristics; and (3) computing a fixed-size mention representation through pooling operations. Together, these methods improve the model's ability to capture semantic information while respecting hardware constraints, and they speed up convergence by more than 50% compared with the strong baseline model proposed in this paper. In terms of performance, SHEL is comparable to previous methods, with superior performance on six well-established datasets, even though it uses a smaller pre-trained language model as the encoder.
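The abstract does not specify SHEL's asymmetric context heuristic or its pooling operator. The sketch below is only an illustration under two assumptions. First, a token budget is split unevenly between left and right context, controlled by a hypothetical `left_share` parameter, and any unused budget on one side is given to the other. Second, mean pooling over the mention's token vectors yields a fixed-size representation. SHEL's actual choices may differ.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def asymmetric_window(num_tokens: int, start: int, end: int,
                      budget: int, left_share: float = 0.25) -> Tuple[int, int]:
    """Return a [lo, hi) token window around the mention span [start, end).

    Hypothetical heuristic: only `left_share` of the spare budget goes to the
    left context, the rest to the right; unused budget is handed back.
    """
    remaining = max(budget - (end - start), 0)
    left = min(int(remaining * left_share), start)
    right = min(remaining - left, num_tokens - end)
    # If the right side hit the end of the text, give the leftover to the left.
    left = min(start, left + (remaining - left - right))
    return start - left, end + right


def mean_pool(token_vectors: Sequence[Vector]) -> Vector:
    """Average a variable number of token vectors into one fixed-size vector."""
    if not token_vectors:
        raise ValueError("mention must contain at least one token")
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]
```

Whatever the mention length, `mean_pool` returns a vector of the encoder's hidden size. This fixed size is what allows mentions to be compared in batches without padding them to a common length.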
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 61932004 and 62072205.
Abstract: Entity linking maps a string in a text to the corresponding entity in a knowledge base through two stages: candidate entity generation and candidate entity ranking. It is important for several NLP (natural language processing) tasks, such as question answering. Unlike English entity linking, Chinese entity linking requires more care because Chinese text lacks word spacing and capitalization and its characters and words are ambiguous, which is more evident in certain scenarios. In Chinese domains such as industry, the generated candidate entities are usually long strings and are heavily nested. In addition, the meanings of the words that make up industrial entities are sometimes ambiguous. Their semantic space is a subspace of the general word embedding space, so each entity word needs to be assigned its exact meaning. We therefore propose two schemes for better Chinese entity linking. First, we implement an n-gram-based candidate entity generation method to increase recall and reduce nesting noise. Then, we enhance candidate entity ranking by introducing sense embeddings. To resolve the conflict between ambiguous word vectors and the single sense that words take in the industrial domain, we design a sense embedding model based on graph clustering, which induces word senses without supervision and learns sense representations jointly with context. We test the embedding quality of our approach on classical datasets and demonstrate its disambiguation ability in general scenarios. Experiments confirm that our method better learns the underlying regularities of candidate entities in the industrial domain and achieves better entity linking performance.
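One way to read "n-gram-based candidate generation that reduces nesting noise" is sketched below. It assumes a plain set of KB entity names (`kb_names`, hypothetical) and a rule that discards any match strictly contained in a longer match. This is an illustrative assumption, not the paper's published algorithm.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, matched string), end exclusive


def ngram_candidates(text: str, kb_names: Set[str], max_n: int = 8) -> List[Span]:
    """Find every character n-gram of `text` (n <= max_n) that is a KB entity name."""
    spans = []
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            if gram in kb_names:
                spans.append((i, i + n, gram))
    return spans


def drop_nested(spans: List[Span]) -> List[Span]:
    """Keep a span only if no strictly longer matched span contains it."""
    kept = []
    for s in spans:
        nested = any(o[0] <= s[0] and s[1] <= o[1] and (o[1] - o[0]) > (s[1] - s[0])
                     for o in spans)
        if not nested:
            kept.append(s)
    return sorted(kept)
```

With `kb_names = {"数控", "机床", "数控机床"}` and the text `"这台数控机床很好"`, all three names match. Only `(2, 6, "数控机床")` survives nesting removal. Overlapping but non-nested matches would both be kept, which leaves the choice between them to the ranking stage.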
Funding: supported by the National Natural Science Foundation of China (61035004).
Abstract: A biography is a direct and comprehensive way to learn about well-known people; for ordinary people, however, little knowledge is available to identify them. In recent years, information extraction (IE) technologies have been used to automatically generate biographies for anyone with online information. One of the key challenges is entity linking (EL), which links biography sentences to the corresponding entities. General-purpose EL systems currently make errors caused by entity name variation and ambiguity. Compared with general text, biography sentences carry distinctive yet rarely studied relational knowledge (RK) and temporal knowledge (TK), which can effectively distinguish entities. This article proposes a new statistical framework, the knowledge enhanced EL (KeEL) system, for automated biography construction. It uses commonsense knowledge such as RK and TK to enhance entity linking. KeEL was evaluated on Wikipedia data, and the results show that, compared with the state-of-the-art method, it significantly improves the precision and recall of entity linking.
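To illustrate how temporal and relational knowledge can separate same-name candidates, here is a toy scoring sketch. It assumes each candidate has known birth and death years and a set of related entity names; the `Candidate` fields and the example people are hypothetical. Candidates whose lifetime excludes the year mentioned in the sentence are filtered out, and the rest are ranked by overlap with entities in the sentence. KeEL itself is a statistical framework, and its exact features are not given in the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Candidate:
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    related: Set[str] = field(default_factory=set)  # names of related entities (RK)


def temporal_ok(c: Candidate, year: Optional[int]) -> bool:
    """TK check: a dated sentence must fall inside the candidate's lifetime."""
    if year is None or c.birth_year is None:
        return True  # no temporal evidence either way
    end = c.death_year if c.death_year is not None else 10**4  # still living
    return c.birth_year <= year <= end


def rank(candidates: List[Candidate], context_entities: Set[str],
         year: Optional[int]) -> List[str]:
    """Drop temporally impossible candidates, then sort by relational overlap."""
    viable = [c for c in candidates if temporal_ok(c, year)]
    viable.sort(key=lambda c: len(c.related & context_entities), reverse=True)
    return [c.name for c in viable]
```

In the test data, two hypothetical people named "Zhang Wei" are separated by the year mentioned in the sentence when one is present. Otherwise they are separated by which related entities co-occur with the mention.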
Abstract: Existing visual scene understanding methods mainly identify coarse-grained concepts about visual objects and their relationships, largely neglecting fine-grained scene understanding. In fact, many data-driven applications on the Web (e.g., news reading and e-shopping) require accurately recognizing much finer-grained concepts as entities and properly linking them to a knowledge graph (KG), which can take their performance to the next level. In light of this, we identify a new research task: visual entity linking for fine-grained scene understanding. To accomplish the task, we first extract features of candidate entities from different modalities, i.e., visual features, textual features, and KG features. Then, we design a learning-to-rank method based on a deep modal-attention neural network, which aggregates all features and maps visual objects to entities in the KG. Extensive experiments on the newly constructed dataset show that the proposed method is effective: it improves accuracy from 66.46% to 83.16% compared with the baselines.
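The abstract describes fusing visual, textual, and KG features with modal attention and training with learning to rank. Here is a minimal sketch under stated assumptions. Each modality already yields a scalar match score and an attention logit; both are hypothetical inputs here, whereas in the paper they would come from the deep network. The logits are softmax-normalized into modality weights. A pairwise hinge loss, one common learning-to-rank objective, pushes the gold entity above a negative candidate.

```python
import math
from typing import Dict, List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def modal_attention_score(modality_scores: Dict[str, float],
                          modality_logits: Dict[str, float]) -> float:
    """Fuse per-modality match scores with softmax attention over modalities."""
    names = sorted(modality_scores)
    weights = softmax([modality_logits[n] for n in names])
    return sum(w * modality_scores[n] for w, n in zip(weights, names))


def pairwise_hinge(pos: float, neg: float, margin: float = 1.0) -> float:
    """Learning-to-rank loss: the gold entity should beat a negative by `margin`."""
    return max(0.0, margin - (pos - neg))
```

With equal logits the fused score is simply the average of the modality scores. A large logit on one modality lets that modality dominate, which is the behavior an attention layer can learn per object.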
Funding: This work was supported by the key project of the National Natural Science Foundation of China (Grant No. 61836007), the general project of the National Natural Science Foundation of China (Grant No. 61876118), and the project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Abstract: As one of the most important components of knowledge graph construction, entity linking has drawn increasing attention over the last decade. In this paper, we propose two improvements for better entity linking. On the one hand, we propose a simple but effective coarse-to-fine unsupervised knowledge base (KB) extraction approach that improves KB quality, allowing entity linking to be conducted more efficiently. On the other hand, we propose a highway network framework that bridges keywords and sequential information captured with a self-attention mechanism, to better represent both local and global information. Detailed experiments on six public entity linking datasets verify the effectiveness of both approaches.
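A highway gate like the one in the abstract can be illustrated with the standard highway form y = t·H + (1 − t)·x, where a sigmoid gate t decides how much of each input passes through. In the hypothetical sketch below, the keyword representation plays the role of the carried input x and the self-attention output plays H. The per-dimension gate parameters `gate_w` and `gate_b` are assumptions; a real model would learn a full linear gate end to end.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def highway_fuse(keyword_vec: Vector, seq_vec: Vector,
                 gate_w: Vector, gate_b: float) -> Vector:
    """Element-wise highway-style gate: y_i = t_i * seq_i + (1 - t_i) * keyword_i.

    t_i = sigmoid(gate_w[i] * keyword_vec[i] + gate_b). This is a simplified,
    per-dimension gate used only for illustration.
    """
    out = []
    for k, s, w in zip(keyword_vec, seq_vec, gate_w):
        t = sigmoid(w * k + gate_b)
        out.append(t * s + (1.0 - t) * k)
    return out
```

With a zero gate the output is the average of the two inputs. A strongly positive bias passes the self-attention (global) signal through almost unchanged, and a strongly negative bias keeps the keyword (local) signal.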
Funding: supported in part by the National Key R&D Program of China (No. 2020AAA0106600) and the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).
Abstract: Entity linking (EL) aims to automatically link mentions in unstructured documents to the corresponding entities in a knowledge base (KB), a task recently dominated by global models. Although many global EL methods attempt to model the topical coherence among all linked entities, most of them fail to exploit the correlations among the diverse kinds of knowledge helpful for linking, such as the semantics of mentions and their candidates, the neighborhood information of candidate entities in the KB, and the fine-grained type information of entities. As we show in the paper, interactions among these types of information are very useful for better characterizing the topic features of entities and more accurately estimating the topical coherence among all referred entities within the same document. In this paper, we present a novel HEterogeneous Graph-based Entity Linker (HEGEL) for global entity linking, which builds an informative heterogeneous graph for every document to collect various linking clues. HEGEL then uses a novel heterogeneous graph neural network (HGNN) to integrate the different types of information and model the interactions among them. Experiments on standard benchmark datasets demonstrate that HEGEL captures global coherence well and outperforms prior state-of-the-art EL methods.
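Here is a rough picture of message passing on a per-document heterogeneous graph. The sketch runs one layer of type-aware mean aggregation. Each node averages incoming messages separately per edge type, scales each average by a per-type weight, and adds the result to its own features. The edge types (`cand_of`, `type_of`) and the fixed weights are hypothetical. HEGEL's HGNN is learned, and its exact update rule is not stated in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Edge = Tuple[str, str, str]  # (source node, edge type, target node)


def hetero_mean_layer(features: Dict[str, Vector], edges: List[Edge],
                      edge_weight: Dict[str, float]) -> Dict[str, Vector]:
    """One message-passing step over a typed graph.

    Messages are grouped by edge type, averaged within each type, scaled by that
    type's weight, and added to the receiving node's own features (a self-loop).
    """
    inbox: Dict[str, Dict[str, List[Vector]]] = defaultdict(lambda: defaultdict(list))
    for src, etype, dst in edges:
        inbox[dst][etype].append(features[src])
    out = {}
    for node, vec in features.items():
        new = list(vec)  # keep the node's own information
        for etype, msgs in inbox[node].items():
            for i in range(len(new)):
                mean_i = sum(m[i] for m in msgs) / len(msgs)
                new[i] += edge_weight[etype] * mean_i
        out[node] = new
    return out
```

Keeping a separate average per edge type stops a node with many neighbors of one type, such as many candidate entities, from drowning out the signal carried by a rarer type, such as a single fine-grained type node.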
Funding: Project supported by the Key-Area Research and Development Program of Guangdong Province, China (No. 2019B010153002), the Program of Marine Economy Development (Six Marine Industries) Special Foundation of the Department of Natural Resources of Guangdong Province, China (No. GDNRC[2020]056), the National Natural Science Foundation of China (No. 62002071), the Top Youth Talent Project of the Zhujiang Talent Program, China (No. 2019QN01X516), and the Guangdong Provincial Key Laboratory of Cyber-Physical System, China (No. 2020B1212060069).
Abstract: Entity linking (EL) is a fundamental task in natural language processing. Existing neural EL systems focus on building the global model but ignore latent semantic information in the local model and the acquisition of effective entity type information. In this paper, we propose two adaptive features: the first enables the local and global models to capture latent information, and the second describes effective information for entity type embeddings. These adaptive features work together naturally to handle uncertain entity type information in EL. Experimental results show that our EL system achieves the best performance on the AIDA-B and MSNBC datasets and the best average performance on out-of-domain datasets. These results indicate that the proposed adaptive features, each based on its own diverse contexts, capture information that is conducive to EL.
Funding: supported in part by the Beijing Natural Science Foundation under Grants L211020 and M21032, in part by the National Natural Science Foundation of China under Grants U1836106 and 62271045, and in part by the Scientific and Technological Innovation Foundation of Foshan under Grants BK21BF001 and BK20BF010.
Abstract: A knowledge graph (KG) is a specialized semantic network that captures intricate relationships among real-world entities within a structured framework. This framework moves information retrieval from mere string matching to far more sophisticated entity matching, and in doing so it drives the advancement of artificial intelligence and intelligent information services. Machine learning methods play an important role in KG construction and have already achieved initial success. This article surveys the latest advances in KG construction via machine learning. Drawing on cutting-edge machine learning research, it systematically explores KG construction methods in three phases: entity learning, ontology learning, and knowledge reasoning. In particular, it dissects machine learning-driven algorithms in detail, highlighting their contributions to critical facets such as entity extraction, relation extraction, entity linking, and link prediction. The article also analyzes the unresolved challenges and emerging directions in applying machine learning to large-scale KG construction.
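Link prediction, one of the facets the survey covers, is often illustrated with TransE, which scores a triple (h, r, t) by how close h + r lies to t in embedding space. The sketch below uses TransE purely as a familiar example, not as a method attributed to this survey. The toy 2-D embeddings are made up.

```python
import math
from typing import Dict, List

Vector = List[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """TransE plausibility: negative L2 distance between h + r and t (higher is better)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def predict_tail(h: Vector, r: Vector, entities: Dict[str, Vector]) -> str:
    """Link prediction: return the entity whose embedding best completes (h, r, ?)."""
    return max(entities, key=lambda name: transe_score(h, r, entities[name]))
```

For example, with `h = [1.0, 0.0]` for "Beijing" and `r = [0.0, 1.0]` for "capital_of", the point h + r is `[1.0, 1.0]`. An entity embedded exactly there is ranked first.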
Funding: The National Natural Science Foundation of China (No. 61502095).
Abstract: For the relation linking task in question answering over knowledge bases, and especially multi-relation linking for complex questions, we propose a relation linking approach based on a multi-attention recurrent neural network (RNN) model that works for both simple and complex questions. First, vector representations of questions are learned by a bidirectional long short-term memory (Bi-LSTM) model at the word and character levels, and named entities in questions are labeled by a conditional random field (CRF) model. Candidate entities are generated from a dictionary and disambiguated with predefined rules, and the named entities mentioned in questions are linked to entities in the knowledge base. Next, a machine learning method classifies each question as simple or complex. Starting from the identified entities, one-hop relations in the knowledge base are collected as candidate relations for simple questions, and two-hop relations are collected for complex questions. Finally, the multi-attention Bi-LSTM model encodes the questions and candidate relations, compares their similarity, and returns the most similar candidate relation as the relation linking result. Notably, a Bi-LSTM model with one attention mechanism is used for simple questions, and one with two attention mechanisms is used for complex questions. Experimental results show that, building on an effective entity linking method, the attention-based Bi-LSTM model improves relation linking for both simple and complex questions and outperforms existing relation linking methods based on graph algorithms or linguistic understanding.
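The candidate-relation step and the final similarity-based selection can be sketched as follows. The sketch makes three assumptions. The KB is a toy adjacency dictionary with hypothetical entities, and the simple/complex flag is given. Question and relation vectors are precomputed; in the paper they come from the multi-attention Bi-LSTM. Cosine similarity stands in for whatever similarity function the model actually uses.

```python
import math
from typing import Dict, List, Tuple

# Toy KB adjacency: entity -> list of (relation, object entity). Hypothetical data.
KB = Dict[str, List[Tuple[str, str]]]
RelPath = Tuple[str, ...]


def candidate_relations(kb: KB, entity: str, complex_question: bool) -> List[RelPath]:
    """One-hop relations for simple questions, two-hop relation paths for complex ones."""
    if not complex_question:
        return sorted({(rel,) for rel, _ in kb.get(entity, [])})
    two_hop = set()
    for rel1, mid in kb.get(entity, []):
        for rel2, _ in kb.get(mid, []):
            two_hop.add((rel1, rel2))
    return sorted(two_hop)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_relation(question_vec: List[float],
                  relation_vecs: Dict[RelPath, List[float]]) -> RelPath:
    """Return the candidate relation whose encoding is most similar to the question's."""
    return max(relation_vecs, key=lambda r: cosine(question_vec, relation_vecs[r]))
```

For a complex question about "Film_A", the only two-hop path in the toy KB is `("director", "birthplace")`. A question such as "Where was the director of Film_A born?" would therefore be matched against that path.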
Funding: National Natural Science Foundation of China (NSFC) key project (No. 61533018, No. U1736204 and No. 61661146007), Ministry of Education and China Mobile Research Fund (No. 20181770250), and THUNUS NExT Co-Lab.
Abstract: Knowledge bases (KBs) are often highly incomplete, which creates a need for KB completion. Although XLORE is an English-Chinese bilingual knowledge graph, it contains only 423,974 cross-lingual links between English and Chinese instances. We present XLORE2, an extension of XLORE that is built automatically from Wikipedia, Baidu Baike, and Hudong Baike. We add more facts through cross-lingual knowledge linking, cross-lingual property matching, and fine-grained type inference. We also design an entity linking system to demonstrate the effectiveness and broad coverage of XLORE2.
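The abstract does not say how XLORE2 finds new cross-lingual links. A common idea, shown here only as a hypothetical sketch, works in two steps. First, map an English instance's out-links through the already-known English-Chinese links. Then choose the Chinese instance whose out-links overlap most with the mapped set (Jaccard similarity), subject to a threshold. All names and the threshold value are illustrative.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_zh_link(en_links: Set[str], known_en2zh: Dict[str, str],
                  zh_outlinks: Dict[str, Set[str]],
                  threshold: float = 0.5) -> Optional[str]:
    """Guess the Chinese counterpart of an English instance from its neighbours.

    English out-links are translated via known cross-lingual links; the Chinese
    instance with the highest Jaccard overlap wins if it reaches `threshold`.
    """
    mapped = {known_en2zh[e] for e in en_links if e in known_en2zh}
    best: Optional[str] = None
    best_sim = -1.0
    for zh, links in zh_outlinks.items():
        sim = jaccard(mapped, links)
        if sim >= threshold and sim > best_sim:
            best, best_sim = zh, sim
    return best
```

Each newly accepted link can itself be added to `known_en2zh`, so in principle repeated passes can keep growing the set of cross-lingual links.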