Abstract: The variety of Chinese food, together with the complexity of its naming elements, has drawn the interest of numerous scholars and prompted many analyses of Chinese dish names. Most of these, however, were carried out within traditional linguistics, rhetoric, translation studies, and cross-cultural communication. Corpus-based studies of the naming elements of Chinese dishes under cognitive linguistic theories remain almost absent. This paper conducts a quantitative analysis of 4,000 Chinese dish names (500 selected freely from each of the eight cuisines), based on the Prominence Principle, in order to identify the specific naming elements of Chinese dishes and to report the related statistics and ratios.
Abstract: In this paper, geographic names in Southwest China are regarded as symbolic representations of human beings, and the dynamic social and historical process behind the place names is restored from the perspective of symbolic anthropology. There are three paths in the construction and evolution of geographic names in Southwest China: ethnic information, sacred systems, and local representation, which have been rewritten, masked, and reconstructed over the years. As a result, the system of geographical names has gradually formed and been integrated into local memory through space building, culture filling, and similar processes, influencing local group identity and cognitive concepts.
Abstract: The naming convention in English-speaking countries (e.g., the USA and the UK) and several other Western cultures, where women have traditionally adopted their husbands’ surnames, is compared with the naming convention in Spain and Latin America, where women do not relinquish their maiden surnames. From a cross-cultural perspective spanning over three centuries, from Madame de Staël and Virginia Woolf to Hillary Clinton, this essay presents instances of women who took on the surname of their spouse upon marriage. It appears that even nowadays many women, including feminists, choose to comply with this patriarchal habit. Entanglements arising upon divorce or remarriage, such as traceability and perception of selfhood, especially for women with academic and professional profiles, are discussed. Samples collected from life and literature across a fairly representative cultural range and diverse moments in history support the conclusions and a consistent argument. Winds of change seem to be blowing with Vice President Kamala Harris, whose case is mentioned at the end of the essay. To circumvent the confusion for individuals and families (especially “blended” ones) that could result in discrimination between males and females, on the one hand, and between married and unmarried women, on the other, the Spanish naming convention is proposed as a perfect compromise. It consists in every person bearing two surnames from birth and for good: one from each parent. Thus, women would keep their name(s), and along with them their perception of self and their social and professional identity.
Funding: This research was supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: Named entity recognition (NER) is a fundamental task of information extraction (IE), and it has attracted considerable research attention in recent years. Abundant annotated English NER datasets have significantly advanced NER research on English. By contrast, far less effort has been devoted to Chinese NER research, especially in the scientific domain, due to the scarcity of Chinese NER datasets. To alleviate this problem, we present a Chinese scientific NER dataset, SciCN, which contains entity annotations of titles and abstracts derived from 3,500 scientific papers. We manually annotate a total of 62,059 entities, classified into six types. Compared to English scientific NER datasets, SciCN is larger in scale and more diverse: it contains more paper abstracts, and these abstracts are drawn from more research fields. To investigate the properties of SciCN and provide baselines for future research, we adapt a number of previous state-of-the-art Chinese NER models and evaluate them on SciCN. Experimental results show that SciCN is more challenging than other Chinese NER datasets. In addition, previous studies have shown that lexicons are effective for enhancing Chinese NER models. Motivated by this, we provide a scientific domain-specific lexicon. Validation results demonstrate that our lexicon delivers better performance gains than lexicons from other domains. We hope that the SciCN dataset and the lexicon will make it possible to benchmark the NER task in the Chinese scientific domain and will support future research. The dataset and lexicon are available at: https://github.com/yangjingla/SciCN.git.
Funding: Supported by the Outstanding Youth Team Project of Central Universities (QNTD202308) and the Ant Group through the CCF-Ant Research Fund (CCF-AFSG 769498 RF20220214).
Abstract: Named Entity Recognition (NER) is a fundamental task in biomedical text mining. It aims to extract specific types of entities, such as genes, proteins, and diseases, from complex biomedical texts and to categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model first uses the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network to capture bidirectional semantic information, addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE, and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
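The span-based decoding that lets a Global Pointer style head return nested entities can be sketched as follows. This is a minimal illustration under stated assumptions, not the RoBGP implementation: the score matrices are written by hand rather than produced by RoBERTa and a BiLSTM, and the function name `decode_spans`, the entity types `disease` and `body`, and the threshold of 0 are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

def decode_spans(scores: Dict[str, List[List[float]]],
                 tokens: List[str],
                 threshold: float = 0.0) -> List[Tuple[str, int, int, str]]:
    """Return (type, start, end, text) for every span scoring above the threshold."""
    entities = []
    for ent_type, matrix in scores.items():
        n = len(matrix)
        for start in range(n):
            # Only the upper triangle is valid: a span cannot end before it starts.
            for end in range(start, n):
                if matrix[start][end] > threshold:
                    text = "".join(tokens[start:end + 1])
                    entities.append((ent_type, start, end, text))
    return sorted(entities, key=lambda e: (e[1], e[2]))

# Hypothetical input: one score matrix per entity type over a 5-character text.
tokens = list("急性阑尾炎")
n = len(tokens)
scores = {
    "disease": [[-10.0] * n for _ in range(n)],
    "body": [[-10.0] * n for _ in range(n)],
}
scores["disease"][0][4] = 3.2  # the whole span "急性阑尾炎"
scores["body"][2][3] = 1.7     # the inner span "阑尾"

entities = decode_spans(scores, tokens)
```

Because each (start, end) pair is scored independently for each type, the inner span and the span that contains it are both returned, which a single tag-per-character sequence labeler could not express.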
Funding: Supported by Yunnan Provincial Major Science and Technology Special Plan Projects (Grant Nos. 202202AD080003, 202202AE090008, 202202AD080004, 202302AD080003), the National Natural Science Foundation of China (Grant Nos. U21B2027, 62266027, 62266028, 62266025), and the Yunnan Province Young and Middle-Aged Academic and Technical Leaders Reserve Talent Program (Grant No. 202305AC160063).
Abstract: Chinese named entity recognition (CNER) has received widespread attention as an important task of Chinese information extraction. Most previous research has studied flat CNER, overlapped CNER, or discontinuous CNER individually. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid-tagging methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. We also introduce a new approach that treats the character-pair grid representation matrix as a specialized image, converting the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing for a more comprehensive understanding of associative features between character pairs. This approach improves the accuracy of predicting their relationships and, in turn, entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 and 1.64 percentage points, respectively, over the current state-of-the-art (SOTA) models.
Funding: MMU Postdoctoral and Research Fellow (Account: MMUI/230023.02).
Abstract: In the context of recognizing handwritten city names, this research addresses the challenges posed by the manual inscription of Bangladeshi city names in the Bangla script. In today’s technology-driven era, where precise tools for reading handwritten text are essential, this study focuses on leveraging deep learning to understand the intricacies of Bangla handwriting. The dearth of dedicated datasets has impeded the progress of Bangla handwritten city name recognition systems, particularly in critical areas such as postal automation and document processing. Notably, no prior research has specifically targeted the unique needs of Bangla handwritten city name recognition. To bridge this gap, the study collects real-world images from diverse sources to construct a comprehensive dataset for Bangla handwritten city name recognition. The emphasis on practical data for system training enhances accuracy. The research further conducts a comparative analysis, pitting state-of-the-art (SOTA) deep learning models, including EfficientNetB0, VGG16, ResNet50, DenseNet201, InceptionV3, and Xception, against a custom Convolutional Neural Network (CNN) model named “Our CNN.” The results show the superior performance of “Our CNN,” with a test accuracy of 99.97% and an F1 score of 99.95%. These metrics underscore its potential for automating city name recognition, particularly in postal services. The study concludes by highlighting the significance of meticulous dataset curation and the promising outlook for custom CNN architectures. It encourages future research avenues, including dataset expansion, algorithm refinement, exploration of recurrent neural networks and attention mechanisms, real-world deployment of models, and extension to other regional languages and scripts. These recommendations offer possibilities for advancing handwritten recognition technology and hold practical implications for enhancing global postal services.
Funding: Financially supported by the Natural Science Foundation of China (Grant No. 42301492), the National Key R&D Program of China (Grant Nos. 2022YFF0711600, 2022YFF0801201, 2022YFF0801200), the Major Special Project of Xinjiang (Grant No. 2022A03009-3), the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (Grant No. KF-2022-07014), the Opening Fund of the Key Laboratory of the Geological Survey and Evaluation of the Ministry of Education (Grant No. GLAB 2023ZR01), and the Fundamental Research Funds for the Central Universities.
Abstract: As important geological data, geological reports contain rich expert and geological knowledge, but the challenge facing current research into geological knowledge extraction and mining is how to understand geological reports accurately under the guidance of domain knowledge. While generic named entity recognition models and tools can be used to process geoscience reports and documents, their effectiveness is hampered by a dearth of domain-specific knowledge, which leads to a pronounced decline in recognition accuracy. This study summarizes six types of typical geological entities with reference to the ontological system of geological domains and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-base BERT, adversarial training, Bi-directional Long Short-Term Memory, Global Pointer) is proposed to address the ambiguity, diversity, and nesting of geological entities. The model first uses the fine-tuned word-granularity pre-training model GeoWoBERT (Geological Word-base BERT) and combines the text features extracted by the BiLSTM (Bi-directional Long Short-Term Memory); an adversarial training algorithm then improves the robustness of the model and its resistance to interference, and decoding is finally performed with a global association pointer algorithm. The experimental results show that the proposed model achieves high performance on the constructed dataset and is capable of mining the rich geological information.
Abstract: Mathematical named entity recognition (MNER) is one of the fundamental tasks in the analysis of mathematical texts. To address local instability, fuzzy entity boundaries, and long-distance dependence between entities, which affect current neural networks in the Chinese mathematical entity recognition task, we propose a series of optimization methods and construct an Adversarial Training and Bidirectional Long Short-Term Memory-Self-Attention Conditional Random Field (AT-BSAC) model. In our model, the mathematical text is vectorized by the word embedding technique, and small perturbations are added to the word vectors to generate adversarial samples, while local features are extracted by Bi-directional Long Short-Term Memory (BiLSTM). The self-attention mechanism is incorporated to extract more dependency features between entities. The experimental results demonstrate that the AT-BSAC model achieves a precision (P) of 93.88%, a recall (R) of 93.84%, and an F1-score of 93.74%, which is 8.73% higher than the F1-score of the previous Bi-directional Long Short-Term Memory Conditional Random Field (BiLSTM-CRF) model. These results indicate the effectiveness of the proposed model in mathematical named entity recognition.
Abstract: Dear Jack, I'm very glad to know that you'll come to China to learn Chinese. And you want to know about Chinese names. Now, I'd like to tell you something about them. Chinese names are different from English names. In Chinese, family names always come first and given names come last. Given names usually have some special meanings. We also had informal names when we were little kids, such as Congcong, Nana and so on.
Abstract: Named Entity Recognition (NER) is crucial for extracting structured information from text. While traditional methods rely on rules, Conditional Random Fields (CRFs), or deep learning, the advent of large-scale Pre-trained Language Models (PLMs) offers new possibilities. PLMs excel at contextual learning, potentially simplifying many natural language processing tasks. However, their application to NER remains underexplored. This paper investigates leveraging the GPT-3 PLM for NER without fine-tuning. We propose a novel scheme that uses carefully crafted templates and context examples selected on the basis of semantic similarity. Our experimental results demonstrate the feasibility of this approach, suggesting a promising direction for harnessing PLMs in NER.
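The idea of filling a template with context examples chosen by semantic similarity can be sketched roughly as below. The abstract does not give the paper's templates, similarity measure, or model calls, so everything here is an assumption for illustration: bag-of-words cosine similarity stands in for a real sentence-embedding model, the `TEMPLATE` string and the helper names `bow_cosine`, `select_examples`, and `build_prompt` are hypothetical, and no request is sent to GPT-3.

```python
import math
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, str]  # (sentence, annotated entities)

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors (a stand-in for embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def select_examples(query: str, pool: List[Example], k: int = 2) -> List[Example]:
    """Pick the k pool sentences most similar to the query."""
    return sorted(pool, key=lambda ex: bow_cosine(query, ex[0]), reverse=True)[:k]

TEMPLATE = "Sentence: {sentence}\nEntities: {entities}"  # hypothetical template

def build_prompt(query: str, pool: List[Example], k: int = 2) -> str:
    """Fill the template with the selected examples, then with the open query."""
    blocks = [TEMPLATE.format(sentence=s, entities=e)
              for s, e in select_examples(query, pool, k)]
    blocks.append(TEMPLATE.format(sentence=query, entities="").rstrip())
    return "\n\n".join(blocks)

pool = [
    ("Marie Curie was born in Warsaw", "Marie Curie (PER); Warsaw (LOC)"),
    ("Apple released a new phone", "Apple (ORG)"),
    ("Alan Turing was born in London", "Alan Turing (PER); London (LOC)"),
]
query = "Ada Lovelace was born in London"
prompt = build_prompt(query, pool)
print(prompt)
```

The resulting prompt ends with an unfilled `Entities:` slot for the query sentence, which is the part a language model would be asked to complete; the dissimilar example about Apple is left out.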