Chinese named entity recognition (CNER) has received widespread attention as an important task in Chinese information extraction. Most previous research has studied flat, overlapped, or discontinuous CNER individually, yet real-world scenarios often need a unified CNER. Recent studies have shown that grid tagging methods based on character-pair relationship classification hold great potential for unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Notably, we treat the character-pair grid representation matrix as a specialized image, converting character-pair relationship classification into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing a more comprehensive understanding of associative features between character pairs. This improves the accuracy of the predicted relationships and, ultimately, entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 and 1.64 percentage points, respectively, over the current state-of-the-art (SOTA) models.
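To make the "grid as image" idea concrete, the following is a minimal sketch of how a character-pair label grid could be built, so that every cell plays the role of a pixel with a class label. The label names (`NONE`, `NEXT`, `TAIL_HEAD`), the function `build_grid`, and the labeling scheme itself are assumptions made for illustration; the abstract does not specify the scheme the authors use, and the U-shaped segmentation network is not sketched here.

```python
from typing import List

# Hypothetical label ids for this sketch; the abstract does not give a label set.
NONE, NEXT, TAIL_HEAD = 0, 1, 2


def build_grid(n_chars: int, entities: List[List[int]]) -> List[List[int]]:
    """Return an n_chars x n_chars grid of character-pair labels.

    Each entity is a list of character indices in reading order; gaps in the
    indices represent a discontinuous mention. Cell (i, j) is set to NEXT when
    character j directly follows character i inside an entity, and cell
    (tail, head) is set to TAIL_HEAD to mark where the entity ends and starts.
    """
    grid = [[NONE] * n_chars for _ in range(n_chars)]
    for idx in entities:
        # Link consecutive characters of the same entity.
        for a, b in zip(idx, idx[1:]):
            grid[a][b] = NEXT
        # Mark the boundary pair (last character, first character).
        grid[idx[-1]][idx[0]] = TAIL_HEAD
    return grid


# A 5-character sentence with a flat entity [0, 1], an overlapping entity
# [0, 1, 2], and a discontinuous entity [1, 3].
grid = build_grid(5, [[0, 1], [0, 1, 2], [1, 3]])
for row in grid:
    print(row)
```

Under this assumed scheme, flat, overlapped, and discontinuous mentions all reduce to labeling cells of one matrix, which is what allows the prediction step to be framed as per-pixel classification over the grid.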
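Most span-based models generate a large number of non-entity spans when splitting text into span sequences, which causes data imbalance and high computational complexity. To address these problems, a joint extraction model for entity relationships based on span and boundary detection (SBDM) is proposed. SBDM first uses the bidirectional encoder representations from Transformer (BERT) model to convert text into word vectors and fuses syntactic dependency information obtained through graph convolution to form the feature representation of the text. It then uses local information and sentence context information to detect and mark entity boundaries, reducing the number of non-entity spans. Next, entity recognition is performed on the span sequence formed from the entity boundary marks. Finally, local context information is fused into each span entity pair, and a sigmoid function is used for relation classification. Experiments show that SBDM reaches relation classification F1 scores of 52.86% and 74.47% on the SciERC (multi-task identification of entities, relations, and coreference for scientific knowledge graph construction) and CoNLL04 (the 2004 conference on natural language learning) datasets, respectively. Applied to relation classification, SBDM can promote research on span classification methods for relation extraction.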
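The effect of boundary detection on the number of candidate spans can be illustrated with a small sketch. It compares exhaustive span enumeration with spans formed only from detected start and end positions. The function names, the `max_len` cap, and the example boundary positions are hypothetical; in SBDM the boundaries would come from the model's boundary detector, which is not reproduced here.

```python
from typing import List, Tuple


def all_spans(n_tokens: int, max_len: int) -> List[Tuple[int, int]]:
    """Enumerate every (start, end) span, end inclusive, of at most max_len tokens."""
    return [
        (s, e)
        for s in range(n_tokens)
        for e in range(s, min(s + max_len, n_tokens))
    ]


def boundary_spans(
    starts: List[int], ends: List[int], max_len: int
) -> List[Tuple[int, int]]:
    """Keep only spans that begin at a detected start and end at a detected end."""
    return [
        (s, e)
        for s in sorted(starts)
        for e in sorted(ends)
        if s <= e < s + max_len
    ]


# Hypothetical 10-token sentence with two detected entities.
exhaustive = all_spans(n_tokens=10, max_len=4)
filtered = boundary_spans(starts=[1, 5], ends=[2, 6], max_len=4)
print(len(exhaustive), len(filtered))
```

In this toy case the exhaustive enumeration yields 34 candidates while the boundary-based one yields 2, which shows why marking boundaries first can ease both the class imbalance and the computational cost mentioned above.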
Named Entity Recognition aims to identify rigid designators in text, such as proper names, biological species, and temporal expressions, and to classify them into predefined categories. Interest in this field has grown since the early 1990s. Named Entity Recognition plays a vital role in various areas of natural language processing, such as Machine Translation, Information Extraction, and Question Answering Systems. This paper presents Named Entity Recognition for Nepali text based on the Support Vector Machine (SVM), a machine learning approach to classification. A set of features is extracted from the training data set. The accuracy and efficiency of the SVM classifier are analyzed for three different sizes of training data set. The recognition systems are tested with ten datasets of Nepali text. The strength of this work is its efficient feature extraction and comprehensive recognition techniques. The SVM-based Named Entity Recognition is limited to a certain set of features and uses a small dictionary, which affects its performance. The learning performance of the recognition system is also observed: the system learns well from a small set of training data, and its rate of learning increases as the training size grows.
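As an illustration of the kind of per-token features an SVM-based recognizer might consume, here is a minimal sketch. The abstract does not list the actual features, so the feature names, the `token_features` function, the gazetteer contents, and the romanized example tokens are all assumptions.

```python
from typing import Dict, List, Set


def token_features(
    tokens: List[str], i: int, gazetteer: Set[str]
) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i."""
    tok = tokens[i]
    return {
        "token": tok,
        "prefix2": tok[:2],                 # first two characters
        "suffix2": tok[-2:],                # last two characters
        "is_digit": tok.isdigit(),          # numeric tokens (dates, counts)
        "in_gazetteer": tok in gazetteer,   # small dictionary lookup
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
    }


# Romanized placeholder tokens; a real system would work on Devanagari text.
sentence = ["Ram", "Kathmandu", "gayo"]
feats = token_features(sentence, 1, gazetteer={"Kathmandu"})
print(feats)
```

Feature dictionaries of this shape could then be turned into numeric vectors (for example with scikit-learn's `DictVectorizer`) and passed to an SVM classifier; that pairing is one possible setup, not necessarily the one used in the paper.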
Funding: Supported by Yunnan Provincial Major Science and Technology Special Plan Projects (Grant Nos. 202202AD080003, 202202AE090008, 202202AD080004, 202302AD080003), National Natural Science Foundation of China (Grant Nos. U21B2027, 62266027, 62266028, 62266025), and Yunnan Province Young and Middle-Aged Academic and Technical Leaders Reserve Talent Program (Grant No. 202305AC160063).