Funding: Supported by Yunnan Provincial Major Science and Technology Special Plan Projects (Grant Nos. 202202AD080003, 202202AE090008, 202202AD080004, 202302AD080003); the National Natural Science Foundation of China (Grant Nos. U21B2027, 62266027, 62266028, 62266025); and the Yunnan Province Young and Middle-Aged Academic and Technical Leaders Reserve Talent Program (Grant No. 202305AC160063).
Abstract: Chinese named entity recognition (CNER) has received widespread attention as an important task in Chinese information extraction. Most previous research has studied flat CNER, overlapped CNER, or discontinuous CNER individually. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid-tagging methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Notably, we treat the character-pair grid representation matrix as a specialized image, converting the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing for a more comprehensive understanding of associative features between character pairs. This leads to more accurate prediction of their relationships, ultimately enhancing entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, namely CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points over the current state-of-the-art (SOTA) models, respectively.
Abstract: The Cheng index distinguishes indica and japonica rice based on six taxonomic traits. This index has been widely used for classification of indica and japonica varieties in China. In this study, a double haploid (DH) population derived from anther culture of ZYQ8/JX17 F1, a typical inter-subspecies hybrid, was used to investigate the six taxonomic traits, i.e., leaf hairiness (LH), color of hull when heading (CHH), hairiness of hull (HH), length of the first and second panicle internode (LPI), length/width of grain (L/W), and phenol reaction (PH). The morphological index (MI) was also calculated. Based on the molecular linkage map constructed from this
Funding: National Natural Science Foundation of China (Grant Nos. 62376166, 62306188, 61876113); National Key R&D Program of China (No. 2022YFC3303504).
Abstract: Discourse relation classification is a fundamental task in discourse analysis and is essential for understanding the structure and connection of texts. Implicit discourse relation classification aims to determine the relationship between adjacent sentences. It is very challenging because it lacks both explicit discourse connectives as linguistic cues and sufficient annotated training data. In this paper, we propose a discriminative instance selection method to construct synthetic implicit discourse relation data from easy-to-collect explicit discourse relations. An expanded instance consists of an argument pair and its sense label. We introduce the argument pair type classification task, which aims to distinguish between implicit and explicit argument pairs and to select the explicit argument pairs that are most similar to natural implicit argument pairs for data expansion. We also propose a simple label-smoothing technique to assign robust sense labels to the selected argument pairs. We evaluate our method on PDTB 2.0 and PDTB 3.0. The results show that our method consistently improves the performance of the baseline model and achieves results competitive with the state-of-the-art models.
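To make the label-smoothing idea concrete, here is a minimal sketch of generic label smoothing for a sense label. The abstract does not specify the authors' exact technique, so the function name `smooth_label`, the parameter `epsilon`, and the choice to spread the smoothing mass evenly over the non-gold labels are all assumptions made for illustration.

```python
def smooth_label(gold_index, num_classes, epsilon=0.1):
    """Return a smoothed target distribution over sense labels.

    Assumption: the gold label keeps probability 1 - epsilon and the
    remaining epsilon is split evenly among the other labels. This is a
    generic variant, not necessarily the one used in the paper.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes to smooth")
    if not 0 <= gold_index < num_classes:
        raise ValueError("gold_index out of range")
    off_value = epsilon / (num_classes - 1)
    return [
        1.0 - epsilon if i == gold_index else off_value
        for i in range(num_classes)
    ]


# Hypothetical example: four top-level senses, gold sense at index 1.
target = smooth_label(gold_index=1, num_classes=4, epsilon=0.1)
```

A softer target of this kind may be useful when a sense label is carried over from an explicit argument pair to a synthetic implicit one, since the transferred label is less certain than a human annotation.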
Abstract: This paper improves the minimal residual (MRES) method for solving nonsymmetric systems of equations. A recurrence relation between the approximate solutions of the linear system Ax = b is derived, and a more effective method is presented that reduces both the operation count and the storage.
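For orientation, the sketch below shows the textbook one-dimensional minimal residual iteration for Ax = b, in which each step moves along the current residual r with the step size that minimizes the norm of the new residual. It is not the improved MRES method of the paper, whose recurrence is not given in the abstract. The helper names are hypothetical, and the iteration is only guaranteed to converge when the symmetric part of A is positive definite.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def minimal_residual(A, b, tol=1e-10, max_iter=1000):
    """Basic minimal residual iteration (textbook form, not the paper's)."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the zero initial guess
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ar = matvec(A, r)
        denom = dot(Ar, Ar)
        if denom == 0.0:
            break
        # alpha minimizes ||r - alpha * A r|| over all scalars alpha
        alpha = dot(Ar, r) / denom
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        r = [ri - alpha * ari for ri, ari in zip(r, Ar)]
    return x


# Small nonsymmetric example whose symmetric part is positive definite.
A = [[4.0, 1.0], [-1.0, 3.0]]
b = [1.0, 2.0]
solution = minimal_residual(A, b)
```

Each step needs one matrix-vector product and stores only x, r, and Ar, which is the kind of operation count and storage that an improved method would aim to reduce.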
Funding: Supported by the China Postdoctoral Science Foundation (2014T70722) and the Humanities and Social Science Foundation of the Ministry of Education of China (16YJCZH004).
Abstract: Structure features need complicated pre-processing and are probably domain-dependent. To reduce the time cost of pre-processing, we propose a novel neural network architecture, a bi-directional long short-term memory recurrent neural network (Bi-LSTM-RNN) model based on low-cost sequence features such as words and part-of-speech (POS) tags, to classify the relation between two entities. First, the model performs bi-directional recurrent computation along the tokens of a sentence. Then, the sequence is divided into five parts and standard pooling functions are applied over the token representations of each part. Finally, the pooled representations are concatenated and fed into a softmax layer for relation classification. We evaluate our model on two standard benchmark datasets in different domains, namely SemEval-2010 Task 8 and BioNLP-ST 2016 Task BB3. On SemEval-2010 Task 8, our model matches the state-of-the-art models, achieving an F1 of 83.0%. On BioNLP-ST 2016 Task BB3, our model obtains an F1 of 51.3%, which is comparable with that of the best system. Moreover, we find that the context between the two target entities plays an important role in relation classification and can replace the shortest dependency path.
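The divide-and-pool step can be sketched as follows. The abstract does not say where the five cuts fall, so this sketch assumes they follow the two entity spans (before the first entity, the first entity, the context between the entities, the second entity, and after the second entity) and uses max pooling. The function names and the zero-vector fallback for empty parts are hypothetical.

```python
def max_pool(vectors, dim):
    """Element-wise max over a list of vectors; zeros if the part is empty."""
    if not vectors:
        return [0.0] * dim
    return [max(v[d] for v in vectors) for d in range(dim)]


def five_part_pooling(token_vectors, e1_span, e2_span):
    """Pool token representations over five parts and concatenate them.

    Spans are (start, end) token indices with end exclusive, and the first
    entity is assumed to precede the second one in the sentence.
    """
    dim = len(token_vectors[0])
    (s1, t1), (s2, t2) = e1_span, e2_span
    parts = [
        token_vectors[:s1],    # before the first entity
        token_vectors[s1:t1],  # the first entity
        token_vectors[t1:s2],  # context between the two entities
        token_vectors[s2:t2],  # the second entity
        token_vectors[t2:],    # after the second entity
    ]
    pooled = []
    for part in parts:
        pooled.extend(max_pool(part, dim))
    return pooled


# Toy 2-dimensional token representations standing in for Bi-LSTM outputs.
tokens = [[1.0, 2.0], [3.0, 0.0], [0.0, 5.0], [2.0, 2.0]]
features = five_part_pooling(tokens, e1_span=(0, 1), e2_span=(2, 3))
```

The concatenated vector has five times the token dimension and would be the input to the softmax layer. Keeping the middle part separate is what lets the classifier see the context between the two target entities directly.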
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60873150, 60970056 and 90920004.
Abstract: This paper proposes a tree kernel method for semantic relation detection and classification (RDC) between named entities. It resolves two critical problems in previous tree kernel methods for RDC. First, a new tree kernel is presented that better captures the inherent structural information in a parse tree by extending the standard convolution tree kernel with context sensitivity and approximate matching of sub-trees. Second, an enriched parse tree structure is proposed to derive the necessary structural information, e.g., proper latent annotations, from a parse tree. Evaluation on the ACE RDC corpora shows that both the new tree kernel and the enriched parse tree structure contribute significantly to RDC, and that our tree kernel method substantially outperforms the state-of-the-art ones.
Abstract: Background: The 11th revision of the International Classification of Diseases and Related Health Problems (ICD-11) was released on June 18, 2018, by the World Health Organization and will come into effect on January 1, 2022. Apart from the chapters on the classification of diseases in conventional medicine (CM), a new chapter, Traditional Medicine (TM) Conditions–Module 1, was added. Low back pain (LBP) is one of the common reasons for physician visits. The classification codes for LBP in the ICD-11 are vital to documenting accurate clinical diagnoses. Methods: The qualitative case study method was adopted. Secondary-use data for 100 patients were randomly selected, and the ICD-11 online interface was used to find the classification codes for LBP diagnosis in both the CM section and the TM Conditions–Module 1 (TM1) section. Results: Of the 27 codes obtained from the CM section, six were not relevant to LBP, whereas the other 21 represented diagnoses of LBP and its related diseases or syndromes. In the TM1 section, six codes for different patterns and disorders represented the diagnoses of LBP from the TM perspective. Conclusion: This study indicates that specific diagnoses of LBP can be represented by the combination of CM classification codes and TM1 classification codes in the ICD-11; the CM codes represent specific and accurate clinical diagnoses of LBP, whereas the TM1 codes add more accuracy to the diagnoses of different patterns from the TM perspective.
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61703293, 61751206, and 61672368.
Abstract: Entity relation classification aims to classify the semantic relationship between two marked entities in a given sentence and plays a vital role in various natural language processing applications. However, existing studies focus on exploiting monolingual data in English, due to the lack of labeled data in other languages. How to effectively use a richly labeled language to help a poorly labeled language is still an open problem. In this paper, we propose a language adaptation framework for cross-lingual entity relation classification. The basic idea is to employ adversarial neural networks (AdvNN) to transfer feature representations from one language to another. In particular, the framework enables feature imitation via the competition between a sentence encoder and a rival language discriminator to generate effective representations. To verify the effectiveness of AdvNN, we introduce two kinds of adversarial structures, dual-channel AdvNN and single-channel AdvNN. Experimental results on the ACE 2005 multilingual training corpus show that our single-channel AdvNN achieves the best performance in both unsupervised and semi-supervised scenarios, yielding improvements of 6.61% and 2.98% over the state-of-the-art, respectively. Compared with baselines that directly adopt a machine translation module, both dual-channel and single-channel AdvNN significantly improve the performance (F1) of cross-lingual entity relation classification. Moreover, extensive analysis and discussion demonstrate the appropriateness and effectiveness of different parameter settings in our language adaptation framework.
Abstract: Temporal relation classification is one of the demanding contemporary tasks in natural language processing. It can be used in various applications such as question answering, summarization, and language-specific information retrieval. In this paper, we propose an improved algorithm for classifying temporal relations, between events or between events and times, using support vector machines (SVM). Along with gold-standard corpus features, the proposed method exploits some useful automatically generated syntactic features to improve classification accuracy. Accordingly, a number of novel kernel functions are introduced and evaluated. Our evaluations clearly demonstrate that adding syntactic features yields a considerable improvement over the state-of-the-art method for classifying temporal relations.