The joint extraction of entities and their relations from certain texts plays a significant role in most natural language processes.For entity and relation extraction in a specific domain,we propose a hybrid neural fr...The joint extraction of entities and their relations from certain texts plays a significant role in most natural language processes.For entity and relation extraction in a specific domain,we propose a hybrid neural framework consisting of two parts:a span-based model and a graph-based model.The span-based model can tackle overlapping problems compared with BILOU methods,whereas the graph-based model treats relation prediction as graph classification.Our main contribution is to incorporate external lexical and syntactic knowledge of a specific domain,such as domain dictionaries and dependency structures from texts,into end-to-end neural models.We conducted extensive experiments on a Chinese military entity and relation extraction corpus.The results show that the proposed framework outperforms the baselines with better performance in terms of entity and relation prediction.The proposed method provides insight into problems with the joint extraction of entities and their relations.展开更多
An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task.However,these models sample a large number of negative entities and negative relations during t...An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task.However,these models sample a large number of negative entities and negative relations during the model training,which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance.In order to address the above issues,we propose a two-phase paradigm for the span-based joint entity and relation extraction,which involves classifying the entities and relations in the first phase,and predicting the types of these entities and relations in the second phase.The two-phase paradigm enables our model to significantly reduce the data distribution gap,including the gap between negative entities and other entities,aswell as the gap between negative relations and other relations.In addition,we make the first attempt at combining entity type and entity distance as global features,which has proven effective,especially for the relation extraction.Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-ofthe-art span-based models for the joint extraction task,establishing a new standard benchmark.Qualitative and quantitative analyses further validate the effectiveness the proposed paradigm and the global features.展开更多
Currently, large amounts of information exist in Web sites and various digital media. Most of them are in natural lan-guage. They are easy to be browsed, but difficult to be understood by computer. Chunk parsing and e...Currently, large amounts of information exist in Web sites and various digital media. Most of them are in natural lan-guage. They are easy to be browsed, but difficult to be understood by computer. Chunk parsing and entity relation extracting is important work to understanding information semantic in natural language processing. Chunk analysis is a shallow parsing method, and entity relation extraction is used in establishing relationship between entities. Because full syntax parsing is complexity in Chinese text understanding, many researchers is more interesting in chunk analysis and relation extraction. Conditional random fields (CRFs) model is the valid probabilistic model to segment and label sequence data. This paper models chunk and entity relation problems in Chinese text. By transforming them into label solution we can use CRFs to realize the chunk analysis and entities relation extraction.展开更多
A key aspect of Knowledge fusion is Entity Matching.The objective of this study was to investigate how to identify heterogeneous expressions of the same real-world entity.In recent years,some representative works have...A key aspect of Knowledge fusion is Entity Matching.The objective of this study was to investigate how to identify heterogeneous expressions of the same real-world entity.In recent years,some representative works have used deep learning methods for entity matching,and these methods have achieved good results.However,the common limitation of these methods is that they assume that different attribute columns of the same entity are independent,and inputting the model in the form of paired entity records will cause repeated calculations.In fact,there are often potential relations between different attribute columns of different entities.These relations can help us improve the effect of entity matching,and can perform feature extraction on a single entity record to avoid repeated calculations.To use attribute relations to assist entity matching,this paper proposes the Relation-aware Entity Matching method,which embeds attribute relations into the original entity description to form sentences,so that entity matching is transformed into a sentence-level similarity determination task,based on Sentence-BERT completes sentence similarity calculation.We have conducted experiments on structured,dirty,and textual data,and compared them with baselines in recent years.Experimental results show that the use of relational embedding is helpful for entity matching on structured and dirty data.Our method has good results on most data sets for entity matching and reduces repeated calculations.展开更多
Aiming at the relation linking task for question answering over knowledge base,especially the multi relation linking task for complex questions,a relation linking approach based on the multi-attention recurrent neural...Aiming at the relation linking task for question answering over knowledge base,especially the multi relation linking task for complex questions,a relation linking approach based on the multi-attention recurrent neural network(RNN)model is proposed,which works for both simple and complex questions.First,the vector representations of questions are learned by the bidirectional long short-term memory(Bi-LSTM)model at the word and character levels,and named entities in questions are labeled by the conditional random field(CRF)model.Candidate entities are generated based on a dictionary,the disambiguation of candidate entities is realized based on predefined rules,and named entities mentioned in questions are linked to entities in knowledge base.Next,questions are classified into simple or complex questions by the machine learning method.Starting from the identified entities,for simple questions,one-hop relations are collected in the knowledge base as candidate relations;for complex questions,two-hop relations are collected as candidates.Finally,the multi-attention Bi-LSTM model is used to encode questions and candidate relations,compare their similarity,and return the candidate relation with the highest similarity as the result of relation linking.It is worth noting that the Bi-LSTM model with one attentions is adopted for simple questions,and the Bi-LSTM model with two attentions is adopted for complex questions.The experimental results show that,based on the effective entity linking method,the Bi-LSTM model with the attention mechanism improves the relation linking effectiveness of both simple and complex questions,which outperforms the existing relation linking methods based on graph algorithm or linguistics understanding.展开更多
A qualia role-based entity-dependency graph(EDG)is proposed to represent and extract quantity relations for solving algebra story problems stated in Chinese.Traditional neural solvers use end-to-end models to translat...A qualia role-based entity-dependency graph(EDG)is proposed to represent and extract quantity relations for solving algebra story problems stated in Chinese.Traditional neural solvers use end-to-end models to translate problem texts into math expressions,which lack quantity relation acquisition in sophisticated scenarios.To address the problem,the proposed method leverages EDG to represent quantity relations hidden in qualia roles of math objects.Algorithms were designed for EDG generation and quantity relation extraction for solving algebra story problems.Experimental result shows that the proposedmethod achieved an average accuracy of 82.2%on quantity relation extraction compared to 74.5%of baseline method.Another prompt learning result shows a 5%increase obtained in problem solving by injecting the extracted quantity relations into the baseline neural solvers.展开更多
Chinese named entity recognition(CNER)has received widespread attention as an important task of Chinese information extraction.Most previous research has focused on individually studying flat CNER,overlapped CNER,or d...Chinese named entity recognition(CNER)has received widespread attention as an important task of Chinese information extraction.Most previous research has focused on individually studying flat CNER,overlapped CNER,or discontinuous CNER.However,a unified CNER is often needed in real-world scenarios.Recent studies have shown that grid tagging-based methods based on character-pair relationship classification hold great potential for achieving unified NER.Nevertheless,how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge.In this study,we enhance the character-pair grid representation by incorporating both local and global information.Significantly,we introduce a new approach by considering the character-pair grid representation matrix as a specialized image,converting the classification of character-pair relationships into a pixel-level semantic segmentation task.We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image,allowing for a more comprehensive understanding of associative features between character pairs.This approach leads to improved accuracy in predicting their relationships,ultimately enhancing entity recognition performance.We conducted experiments on two public CNER datasets in the biomedical domain,namely CMeEE-V2 and Diakg.The results demonstrate the effectiveness of our approach,which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points compared to the current state-of-the-art(SOTA)models,respectively.展开更多
为获得结构化的小麦品种表型和遗传描述,针对非结构化小麦种质数据中存在的实体边界模糊以及关系重叠问题,提出一种基于深度字词融合的小麦种质信息实体关系联合抽取模型WGIE-DCWF(wheat germplasm information extraction model based ...为获得结构化的小麦品种表型和遗传描述,针对非结构化小麦种质数据中存在的实体边界模糊以及关系重叠问题,提出一种基于深度字词融合的小麦种质信息实体关系联合抽取模型WGIE-DCWF(wheat germplasm information extraction model based on deep character and word fusion)。模型编码层通过深度字词融合和上下文语义特征融合,提高密集实体特征识别能力;模型三元组抽取层建立层叠指针网络,提高重叠关系的提取能力。在小麦种质数据集和公开数据集上的一系列对比实验结果表明,WGIE-DCWF模型能够有效提高小麦种质数据实体关系联合抽取效果,同时拥有较好的泛化性,可以为小麦种质信息知识库构建提供技术支撑。展开更多
为缓解旅游领域知识分散、信息碎片化的问题,提出一种基于ChatGLM(chat generative language model)和提示微调的实体关系抽取模型ChatGLM-ppt(ChatGLM with prompt and p-tuning)。该模型借助ChatGLM以对话形式完成实体关系抽取任务,...为缓解旅游领域知识分散、信息碎片化的问题,提出一种基于ChatGLM(chat generative language model)和提示微调的实体关系抽取模型ChatGLM-ppt(ChatGLM with prompt and p-tuning)。该模型借助ChatGLM以对话形式完成实体关系抽取任务,并通过P-Tuning v2微调和添加提示模板的方法应对实体关系抽取中错误传播、实体冗余和关系重叠等问题。实验建立在自建的旅游领域数据集上,结果表明:在旅游领域实体关系抽取问题上ChatGLM-ppt模型F 1为92.19%,在处理重叠关系问题中F 1均大于90%,优于目前主流的实体关系抽取模型,证明该模型可有效提高实体关系抽取的准确率。进一步运用Neo4j图数据库构建旅游知识图谱,整合分散的旅游信息资源,对促进旅游业的数字化转型和智能化发展具有一定的参考意义。展开更多
篇章关系抽取旨在识别篇章中实体对之间的关系.相较于传统的句子级别关系抽取,篇章级别关系抽取任务更加贴近实际应用,但是它对实体对的跨句子推理和上下文信息感知等问题提出了新的挑战.本文提出融合实体和上下文信息(Fuse entity and ...篇章关系抽取旨在识别篇章中实体对之间的关系.相较于传统的句子级别关系抽取,篇章级别关系抽取任务更加贴近实际应用,但是它对实体对的跨句子推理和上下文信息感知等问题提出了新的挑战.本文提出融合实体和上下文信息(Fuse entity and context information,FECI)的篇章关系抽取方法,它包含两个模块,分别是实体信息抽取模块和上下文信息抽取模块.实体信息抽取模块从两个实体中自动地抽取出能够表示实体对关系的特征.上下文信息抽取模块根据实体对的提及位置信息,从篇章中抽取不同的上下文关系特征.本文在三个篇章级别的关系抽取数据集上进行实验,效果得到显著提升.展开更多
古汉语文本承载着丰富的历史和文化信息,对这类文本进行实体关系抽取研究并构建相关知识图谱对于文化传承具有重要作用.针对古汉语文本中存在大量生僻汉字、语义模糊和复义等问题,提出了一种基于BERT古文预训练模型的实体关系联合抽取模...古汉语文本承载着丰富的历史和文化信息,对这类文本进行实体关系抽取研究并构建相关知识图谱对于文化传承具有重要作用.针对古汉语文本中存在大量生僻汉字、语义模糊和复义等问题,提出了一种基于BERT古文预训练模型的实体关系联合抽取模型(entity relation joint extraction model based on BERT-ancient-Chinese pretrained model,JEBAC).首先,通过融合BiLSTM神经网络和注意力机制的BERT古文预训练模型(BERT-ancientChinese pre-trained model integrated BiLSTM neural network and attention mechanism,BACBA),识别出句中所有的subject实体和object实体,为关系和object实体联合抽取提供依据.接下来,将subject实体的归一化编码向量与整个句子的嵌入向量相加,以更好地理解句中subject实体的语义特征;最后,结合带有subject实体特征的句子向量和object实体的提示信息,通过BACBA实现句中关系和object实体的联合抽取,从而得到句中所有的三元组信息(subject实体,关系,object实体).在中文实体关系抽取DuIE2.0数据集和CCKS 2021的文言文实体关系抽取CCLUE小样本数据集上,与现有的方法进行了性能比较.实验结果表明,该方法在抽取性能上更加有效,F1值分别可达79.2%和55.5%.展开更多
基金supported by the Jiangsu Province“333”project BRA2020418the NSFC under Grant Number 71901215+2 种基金the National University of Defense Technology Research Project ZK20-46the Outstanding Young Talents Program of National University of Defense Technologythe National University of Defense Technology Youth Innovation Project。
文摘The joint extraction of entities and their relations from certain texts plays a significant role in most natural language processes.For entity and relation extraction in a specific domain,we propose a hybrid neural framework consisting of two parts:a span-based model and a graph-based model.The span-based model can tackle overlapping problems compared with BILOU methods,whereas the graph-based model treats relation prediction as graph classification.Our main contribution is to incorporate external lexical and syntactic knowledge of a specific domain,such as domain dictionaries and dependency structures from texts,into end-to-end neural models.We conducted extensive experiments on a Chinese military entity and relation extraction corpus.The results show that the proposed framework outperforms the baselines with better performance in terms of entity and relation prediction.The proposed method provides insight into problems with the joint extraction of entities and their relations.
基金supported by the National Key Research and Development Program[2020YFB1006302].
文摘An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task.However,these models sample a large number of negative entities and negative relations during the model training,which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance.In order to address the above issues,we propose a two-phase paradigm for the span-based joint entity and relation extraction,which involves classifying the entities and relations in the first phase,and predicting the types of these entities and relations in the second phase.The two-phase paradigm enables our model to significantly reduce the data distribution gap,including the gap between negative entities and other entities,aswell as the gap between negative relations and other relations.In addition,we make the first attempt at combining entity type and entity distance as global features,which has proven effective,especially for the relation extraction.Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-ofthe-art span-based models for the joint extraction task,establishing a new standard benchmark.Qualitative and quantitative analyses further validate the effectiveness the proposed paradigm and the global features.
文摘Currently, large amounts of information exist in Web sites and various digital media. Most of them are in natural lan-guage. They are easy to be browsed, but difficult to be understood by computer. Chunk parsing and entity relation extracting is important work to understanding information semantic in natural language processing. Chunk analysis is a shallow parsing method, and entity relation extraction is used in establishing relationship between entities. Because full syntax parsing is complexity in Chinese text understanding, many researchers is more interesting in chunk analysis and relation extraction. Conditional random fields (CRFs) model is the valid probabilistic model to segment and label sequence data. This paper models chunk and entity relation problems in Chinese text. By transforming them into label solution we can use CRFs to realize the chunk analysis and entities relation extraction.
基金This work is funded by Guangdong Basic and Applied Basic Research Foundation(No.2021A1515012307,2020A1515010450)Guangzhou Basic and Applied Basic Research Foundation(No.202102021207,202102020867)+4 种基金the National Natural Science Foundation of China(No.62072130,61702220,61702223)Guangdong Province Key Area R&D Program of China(No.2019B010136003,2019B010137004)Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme(2019)Guangdong Higher Education Innovation Group(No.2020KCXTD007)Guangzhou Higher Education Innovation Group(No.202032854)。
文摘A key aspect of Knowledge fusion is Entity Matching.The objective of this study was to investigate how to identify heterogeneous expressions of the same real-world entity.In recent years,some representative works have used deep learning methods for entity matching,and these methods have achieved good results.However,the common limitation of these methods is that they assume that different attribute columns of the same entity are independent,and inputting the model in the form of paired entity records will cause repeated calculations.In fact,there are often potential relations between different attribute columns of different entities.These relations can help us improve the effect of entity matching,and can perform feature extraction on a single entity record to avoid repeated calculations.To use attribute relations to assist entity matching,this paper proposes the Relation-aware Entity Matching method,which embeds attribute relations into the original entity description to form sentences,so that entity matching is transformed into a sentence-level similarity determination task,based on Sentence-BERT completes sentence similarity calculation.We have conducted experiments on structured,dirty,and textual data,and compared them with baselines in recent years.Experimental results show that the use of relational embedding is helpful for entity matching on structured and dirty data.Our method has good results on most data sets for entity matching and reduces repeated calculations.
基金The National Natural Science Foundation of China(No.61502095).
文摘Aiming at the relation linking task for question answering over knowledge base,especially the multi relation linking task for complex questions,a relation linking approach based on the multi-attention recurrent neural network(RNN)model is proposed,which works for both simple and complex questions.First,the vector representations of questions are learned by the bidirectional long short-term memory(Bi-LSTM)model at the word and character levels,and named entities in questions are labeled by the conditional random field(CRF)model.Candidate entities are generated based on a dictionary,the disambiguation of candidate entities is realized based on predefined rules,and named entities mentioned in questions are linked to entities in knowledge base.Next,questions are classified into simple or complex questions by the machine learning method.Starting from the identified entities,for simple questions,one-hop relations are collected in the knowledge base as candidate relations;for complex questions,two-hop relations are collected as candidates.Finally,the multi-attention Bi-LSTM model is used to encode questions and candidate relations,compare their similarity,and return the candidate relation with the highest similarity as the result of relation linking.It is worth noting that the Bi-LSTM model with one attentions is adopted for simple questions,and the Bi-LSTM model with two attentions is adopted for complex questions.The experimental results show that,based on the effective entity linking method,the Bi-LSTM model with the attention mechanism improves the relation linking effectiveness of both simple and complex questions,which outperforms the existing relation linking methods based on graph algorithm or linguistics understanding.
基金supported by the National Natural Science Foundation of China (Nos.62177024,62007014)the Humanities and Social Sciences Youth Fund of the Ministry of Education (No.20YJC880024)+1 种基金China Post Doctoral Science Foundation (No.2019M652678)the Fundamental Research Funds for the Central Universities (No.CCNU20ZT019).
文摘A qualia role-based entity-dependency graph(EDG)is proposed to represent and extract quantity relations for solving algebra story problems stated in Chinese.Traditional neural solvers use end-to-end models to translate problem texts into math expressions,which lack quantity relation acquisition in sophisticated scenarios.To address the problem,the proposed method leverages EDG to represent quantity relations hidden in qualia roles of math objects.Algorithms were designed for EDG generation and quantity relation extraction for solving algebra story problems.Experimental result shows that the proposedmethod achieved an average accuracy of 82.2%on quantity relation extraction compared to 74.5%of baseline method.Another prompt learning result shows a 5%increase obtained in problem solving by injecting the extracted quantity relations into the baseline neural solvers.
基金supported by Yunnan Provincial Major Science and Technology Special Plan Projects(Grant Nos.202202AD080003,202202AE090008,202202AD080004,202302AD080003)National Natural Science Foundation of China(Grant Nos.U21B2027,62266027,62266028,62266025)Yunnan Province Young and Middle-Aged Academic and Technical Leaders Reserve Talent Program(Grant No.202305AC160063).
文摘Chinese named entity recognition(CNER)has received widespread attention as an important task of Chinese information extraction.Most previous research has focused on individually studying flat CNER,overlapped CNER,or discontinuous CNER.However,a unified CNER is often needed in real-world scenarios.Recent studies have shown that grid tagging-based methods based on character-pair relationship classification hold great potential for achieving unified NER.Nevertheless,how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge.In this study,we enhance the character-pair grid representation by incorporating both local and global information.Significantly,we introduce a new approach by considering the character-pair grid representation matrix as a specialized image,converting the classification of character-pair relationships into a pixel-level semantic segmentation task.We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image,allowing for a more comprehensive understanding of associative features between character pairs.This approach leads to improved accuracy in predicting their relationships,ultimately enhancing entity recognition performance.We conducted experiments on two public CNER datasets in the biomedical domain,namely CMeEE-V2 and Diakg.The results demonstrate the effectiveness of our approach,which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points compared to the current state-of-the-art(SOTA)models,respectively.
文摘为获得结构化的小麦品种表型和遗传描述,针对非结构化小麦种质数据中存在的实体边界模糊以及关系重叠问题,提出一种基于深度字词融合的小麦种质信息实体关系联合抽取模型WGIE-DCWF(wheat germplasm information extraction model based on deep character and word fusion)。模型编码层通过深度字词融合和上下文语义特征融合,提高密集实体特征识别能力;模型三元组抽取层建立层叠指针网络,提高重叠关系的提取能力。在小麦种质数据集和公开数据集上的一系列对比实验结果表明,WGIE-DCWF模型能够有效提高小麦种质数据实体关系联合抽取效果,同时拥有较好的泛化性,可以为小麦种质信息知识库构建提供技术支撑。
文摘为缓解旅游领域知识分散、信息碎片化的问题,提出一种基于ChatGLM(chat generative language model)和提示微调的实体关系抽取模型ChatGLM-ppt(ChatGLM with prompt and p-tuning)。该模型借助ChatGLM以对话形式完成实体关系抽取任务,并通过P-Tuning v2微调和添加提示模板的方法应对实体关系抽取中错误传播、实体冗余和关系重叠等问题。实验建立在自建的旅游领域数据集上,结果表明:在旅游领域实体关系抽取问题上ChatGLM-ppt模型F 1为92.19%,在处理重叠关系问题中F 1均大于90%,优于目前主流的实体关系抽取模型,证明该模型可有效提高实体关系抽取的准确率。进一步运用Neo4j图数据库构建旅游知识图谱,整合分散的旅游信息资源,对促进旅游业的数字化转型和智能化发展具有一定的参考意义。
文摘篇章关系抽取旨在识别篇章中实体对之间的关系.相较于传统的句子级别关系抽取,篇章级别关系抽取任务更加贴近实际应用,但是它对实体对的跨句子推理和上下文信息感知等问题提出了新的挑战.本文提出融合实体和上下文信息(Fuse entity and context information,FECI)的篇章关系抽取方法,它包含两个模块,分别是实体信息抽取模块和上下文信息抽取模块.实体信息抽取模块从两个实体中自动地抽取出能够表示实体对关系的特征.上下文信息抽取模块根据实体对的提及位置信息,从篇章中抽取不同的上下文关系特征.本文在三个篇章级别的关系抽取数据集上进行实验,效果得到显著提升.
文摘古汉语文本承载着丰富的历史和文化信息,对这类文本进行实体关系抽取研究并构建相关知识图谱对于文化传承具有重要作用.针对古汉语文本中存在大量生僻汉字、语义模糊和复义等问题,提出了一种基于BERT古文预训练模型的实体关系联合抽取模型(entity relation joint extraction model based on BERT-ancient-Chinese pretrained model,JEBAC).首先,通过融合BiLSTM神经网络和注意力机制的BERT古文预训练模型(BERT-ancientChinese pre-trained model integrated BiLSTM neural network and attention mechanism,BACBA),识别出句中所有的subject实体和object实体,为关系和object实体联合抽取提供依据.接下来,将subject实体的归一化编码向量与整个句子的嵌入向量相加,以更好地理解句中subject实体的语义特征;最后,结合带有subject实体特征的句子向量和object实体的提示信息,通过BACBA实现句中关系和object实体的联合抽取,从而得到句中所有的三元组信息(subject实体,关系,object实体).在中文实体关系抽取DuIE2.0数据集和CCKS 2021的文言文实体关系抽取CCLUE小样本数据集上,与现有的方法进行了性能比较.实验结果表明,该方法在抽取性能上更加有效,F1值分别可达79.2%和55.5%.