In the context of handwritten city name recognition, this research addresses the challenges posed by Bangladeshi city names written by hand in the Bangla script. In today’s technology-driven era, where precise tools for reading handwritten text are essential, this study focuses on leveraging deep learning to understand the intricacies of Bangla handwriting. The dearth of dedicated datasets has impeded the progress of Bangla handwritten city name recognition systems, particularly in critical areas such as postal automation and document processing. Notably, no prior research has specifically targeted the unique needs of Bangla handwritten city name recognition. To bridge this gap, the study collects real-world images from diverse sources to construct a comprehensive dataset for Bangla handwritten city name recognition. The emphasis on practical data for system training enhances accuracy. The research further conducts a comparative analysis, pitting state-of-the-art (SOTA) deep learning models, including EfficientNetB0, VGG16, ResNet50, DenseNet201, InceptionV3, and Xception, against a custom Convolutional Neural Network (CNN) model named “Our CNN.” The results showcase the superior performance of “Our CNN,” with a test accuracy of 99.97% and an F1 score of 99.95%. These metrics underscore its potential for automating city name recognition, particularly in postal services. The study concludes by highlighting the significance of meticulous dataset curation and the promising outlook for custom CNN architectures. It encourages future research avenues, including dataset expansion, algorithm refinement, exploration of recurrent neural networks and attention mechanisms, real-world deployment of models, and extension to other regional languages and scripts. These recommendations offer possibilities for advancing handwritten recognition technology and hold practical implications for enhancing global postal services.
Dear Jack, I'm very glad to know that you'll come to China to learn Chinese, and that you want to know about Chinese names. Now, I'd like to tell you something about them. Chinese names are different from English names. In Chinese, family names always come first and given names come last. Given names usually have some special meanings. We also had informal names when we were little kids, such as Congcong, Nana and so on.
The scientific names of organisms are key identifiers of plants and animals. Correctly treating scientific names is a prerequisite for biodiversity research and documentation. Here, we present an R package, 'U.Taxonstand', which can standardize and harmonize scientific names in plant and animal species lists at a fast speed and with a high rate of matching success. Unlike most similar R packages, each of which works with only one taxonomic database, U.Taxonstand can work with all taxonomic databases, as long as they are properly formatted. Multiple databases for plants and animals that can be used directly by U.Taxonstand, covering bryophytes, vascular plants, amphibians, birds, fishes, mammals, and reptiles, are available online. U.Taxonstand can be a very useful tool for botanists, zoologists, ecologists and biogeographers to standardize and harmonize scientific names of organisms.
Guangzhou and Foshan enjoy a relatively mature metro network. However, some metro station names are over-transliterated in Pinyin. This translation method is applied to general names, nouns of locality and some names of tourist destinations. Drawing on translation landscape and linguistic landscape theories, the reasons for and impacts of over-transliteration in the Guangzhou and Foshan metro are discussed from the perspective of symbolic function. English names of metro stations in other cities serve as a reference for identifying appropriate solutions.
Named entity recognition (NER) is a fundamental task of information extraction (IE), and it has attracted considerable research attention in recent years. The abundant annotated English NER datasets have significantly promoted NER research for English. By contrast, far less effort has been devoted to Chinese NER research, especially in the scientific domain, due to the scarcity of Chinese NER datasets. To alleviate this problem, we present a Chinese scientific NER dataset, SciCN, which contains entity annotations of titles and abstracts derived from 3,500 scientific papers. We manually annotate a total of 62,059 entities, and these entities are classified into six types. Compared to English scientific NER datasets, SciCN has a larger scale and is more diverse, for it not only contains more paper abstracts but these abstracts are also derived from more research fields. To investigate the properties of SciCN and provide baselines for future research, we adapt a number of previous state-of-the-art Chinese NER models to evaluate SciCN. Experimental results show that SciCN is more challenging than other Chinese NER datasets. In addition, previous studies have proven the effectiveness of using lexicons to enhance Chinese NER models. Motivated by this fact, we provide a scientific domain-specific lexicon. Validation results demonstrate that our lexicon delivers better performance gains than lexicons of other domains. We hope that the SciCN dataset and the lexicon will enable us to benchmark the NER task in the Chinese scientific domain and support progress in future research. The dataset and lexicon are available at: https://github.com/yangjingla/SciCN.git.
Streptococcus suis serotype 2 (S. suis 2) is a zoonotic pathogen that clinically causes severe swine and human infections (such as meningitis, endocarditis, and septicemia). In order to cause widespread disease in different organs, S. suis 2 must colonize the host, break the blood barrier, and cause exaggerated inflammation. In the last few years, most studies have focused on a single virulence factor and its influence on the host. Membrane vesicles (MVs) can be actively secreted into the extracellular environment, contributing to bacteria-host interactions. Gram-negative bacteria-derived outer membrane vesicles (OMVs) were recently shown to activate the host Caspase-11-mediated non-canonical inflammasome pathway via delivery of OMV-bound lipopolysaccharide (LPS), causing host cell pyroptosis. However, little is known about the effect of the MVs from S. suis 2 (a Gram-positive bacterium without LPS) on cell pyroptosis. Thus, we investigated the molecular mechanism by which S. suis 2 MVs participate in endothelial cell pyroptosis. In this study, we used proteomics, scanning electron microscopy, fluorescence microscopy, Western blotting, and bioassays to investigate the MVs secreted by S. suis 2. First, we demonstrated that S. suis 2 secreted MVs with an average diameter of 72.04 nm, and 200 proteins in MVs were identified. Then, we showed that MVs were transported into cells mainly via dynamin-dependent endocytosis. The S. suis 2 MVs activated the NLRP3/Caspase-1/GSDMD canonical inflammasome signaling pathway, resulting in cell pyroptosis, but did not activate the Caspase-4/-5 pathway. More importantly, endothelial cells produced large amounts of reactive oxygen species (ROS) and lost their mitochondrial membrane potential under induction by S. suis 2 MVs. The results of this study suggest for the first time that MVs from S. suis 2 were internalized by endothelial cells mainly via dynamin-dependent endocytosis and might promote the NLRP3/Caspase-1/GSDMD pathway through mitochondrial damage, which produced mtDNA and ROS under induction, leading to the pyroptosis of endothelial cells.
Named Entity Recognition (NER) is a fundamental task in biomedical text mining, aiming to extract specific types of entities such as genes, proteins, and diseases from complex biomedical texts and categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model first uses the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network to capture bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to comprehensively recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE, and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
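The span-based decoding that lets a Global Pointer style head recover nested entities can be illustrated with a minimal sketch. It assumes that the model has already produced, for each entity type, a token-by-token score matrix in which entry (start, end) scores the span from start to end, and that spans scoring above a threshold of 0 are kept. The function name `decode_spans`, the entity types and the toy scores are hypothetical and are not taken from the RoBGP paper.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, inclusive end index)


def decode_spans(scores: Dict[str, List[List[float]]],
                 threshold: float = 0.0) -> List[Span]:
    """Return every (type, start, end) whose span score exceeds the threshold."""
    spans: List[Span] = []
    for entity_type, matrix in scores.items():
        n = len(matrix)
        for start in range(n):
            # Only the upper triangle is valid: a span cannot end before it starts.
            for end in range(start, n):
                if matrix[start][end] > threshold:
                    spans.append((entity_type, start, end))
    # Sort by position so the output order does not depend on dict ordering.
    return sorted(spans, key=lambda s: (s[1], s[2], s[0]))


# Toy example: 4 tokens and hypothetical scores for two hypothetical types.
NEG = -10.0
toy_scores = {
    "disease": [
        [NEG, NEG, NEG, 2.1],  # span 0..3 scores above 0
        [NEG, NEG, NEG, NEG],
        [NEG, NEG, NEG, NEG],
        [NEG, NEG, NEG, NEG],
    ],
    "body": [
        [NEG, 1.3, NEG, NEG],  # span 0..1, nested inside the disease span
        [NEG, NEG, NEG, NEG],
        [NEG, NEG, NEG, NEG],
        [NEG, NEG, NEG, NEG],
    ],
}

nested = decode_spans(toy_scores)
print(nested)
```

Because every candidate span is scored independently, an inner span and the outer span that contains it can both be returned, which is what a sequence-labeling decoder with one tag per token cannot express.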
Chinese named entity recognition (CNER) has received widespread attention as an important task of Chinese information extraction. Most previous research has studied flat CNER, overlapped CNER, or discontinuous CNER individually. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid-tagging methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Significantly, we introduce a new approach that treats the character-pair grid representation matrix as a specialized image, converting the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing for a more comprehensive understanding of associative features between character pairs. This approach leads to improved accuracy in predicting their relationships, ultimately enhancing entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, namely CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points, respectively, compared to the current state-of-the-art (SOTA) models.
As important geological data, a geological report contains rich expert and geological knowledge, but the challenge facing current research into geological knowledge extraction and mining is how to achieve an accurate understanding of geological reports guided by domain knowledge. While generic named entity recognition models and tools can be used to process geoscience reports and documents, their effectiveness is hampered by a dearth of domain-specific knowledge, which in turn leads to a pronounced decline in recognition accuracy. This study summarizes six types of typical geological entities with reference to the ontological system of geological domains and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-base BERT-adversarial training Bi-directional Long Short-Term Memory Global Pointer) is proposed to address the issues of ambiguity, diversity and nested entities for geological entities. The model first uses the fine-tuned word-granularity-based pre-training model GeoWoBERT (Geological Word-base BERT) and combines the text features extracted using the BiLSTM (Bi-directional Long Short-Term Memory). An adversarial training algorithm is then applied to improve the robustness of the model and enhance its resistance to interference, and decoding is finally performed using a global association pointer algorithm. The experimental results show that the proposed model achieves high performance on the constructed dataset and is capable of mining the rich geological information.
Mathematical named entity recognition (MNER) is one of the fundamental tasks in the analysis of mathematical texts. To address the local instability, fuzzy entity boundaries, and long-distance dependence between entities that current neural networks exhibit in the Chinese mathematical entity recognition task, we propose a series of optimization methods and construct an Adversarial Training and Bidirectional Long Short-Term Memory-Self-Attention Conditional Random Field (AT-BSAC) model. In our model, the mathematical text is vectorized by the word embedding technique, and small perturbations are added to the word vectors to generate adversarial samples, while local features are extracted by Bi-directional Long Short-Term Memory (BiLSTM). The self-attention mechanism is incorporated to extract more dependency features between entities. The experimental results demonstrate that the AT-BSAC model achieves a precision (P) of 93.88%, a recall (R) of 93.84%, and an F1-score of 93.74%, which is 8.73% higher than the F1-score of the previous Bi-directional Long Short-Term Memory Conditional Random Field (BiLSTM-CRF) model. These results demonstrate the effectiveness of the proposed model in mathematical named entity recognition.
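The idea of adding a small perturbation to a word vector to obtain an adversarial sample can be sketched as follows. The abstract does not say how the perturbation is computed, so this sketch assumes a Fast Gradient Method style rule in which the perturbation is the loss gradient rescaled to a fixed L2 norm, r = ε · g / ‖g‖₂. The function names, the value of ε and the toy vectors are all hypothetical.

```python
import math
from typing import List


def fgm_perturbation(gradient: List[float], epsilon: float = 1.0) -> List[float]:
    """Scale the gradient to L2 norm epsilon (assumed FGM-style rule)."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm == 0.0:
        # A zero gradient gives no direction, so no perturbation is applied.
        return [0.0 for _ in gradient]
    return [epsilon * g / norm for g in gradient]


def perturb_embedding(embedding: List[float],
                      gradient: List[float],
                      epsilon: float = 1.0) -> List[float]:
    """Return the adversarial word vector: embedding plus perturbation."""
    delta = fgm_perturbation(gradient, epsilon)
    return [e + d for e, d in zip(embedding, delta)]


# Toy word vector and a hypothetical loss gradient with respect to it.
embedding = [0.5, -0.2, 0.1]
gradient = [3.0, 0.0, 4.0]  # L2 norm is 5.0
adversarial = perturb_embedding(embedding, gradient, epsilon=0.1)
print(adversarial)
```

In adversarial training the model is then also trained on the perturbed vector, which is what is meant to reduce sensitivity to small changes in the input representation.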
Named Entity Recognition (NER) is crucial for extracting structured information from text. While traditional methods rely on rules, Conditional Random Fields (CRFs), or deep learning, the advent of large-scale Pre-trained Language Models (PLMs) offers new possibilities. PLMs excel at contextual learning, potentially simplifying many natural language processing tasks. However, their application to NER remains underexplored. This paper investigates leveraging the GPT-3 PLM for NER without fine-tuning. We propose a novel scheme that utilizes carefully crafted templates and context examples selected based on semantic similarity. Our experimental results demonstrate the feasibility of this approach, suggesting a promising direction for harnessing PLMs in NER.
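Selecting context examples by semantic similarity and placing them into a template can be sketched as below. The sketch assumes that every labeled sentence already has an embedding vector from some sentence encoder and ranks candidates by cosine similarity to the query. The template wording, the function names and the toy three-dimensional embeddings are hypothetical and do not reproduce the templates used in the paper. No language model is called.

```python
import math
from typing import List, Sequence, Tuple

# (sentence, labeled entities as text, embedding vector)
Example = Tuple[str, str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_examples(query_vec: Sequence[float],
                    pool: List[Example], k: int = 2) -> List[Example]:
    """Pick the k labeled examples most similar to the query."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex[2]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: List[Example]) -> str:
    """Fill a simple hypothetical template with the examples and the query."""
    lines = ["Extract the named entities from each sentence."]
    for sentence, entities, _ in examples:
        lines.append(f"Sentence: {sentence}")
        lines.append(f"Entities: {entities}")
    lines.append(f"Sentence: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


# Toy pool with hypothetical embeddings.
pool: List[Example] = [
    ("Marie Curie worked in Paris.",
     "Marie Curie (PERSON); Paris (LOCATION)", [0.9, 0.1, 0.0]),
    ("The stock fell 3% on Monday.",
     "Monday (DATE)", [0.0, 0.2, 0.9]),
    ("Alan Turing was born in London.",
     "Alan Turing (PERSON); London (LOCATION)", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]
chosen = select_examples(query_vec, pool, k=2)
prompt = build_prompt("Ada Lovelace lived in London.", chosen)
print(prompt)
```

The resulting string would be sent to the language model as-is, with the model expected to complete the final "Entities:" line.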
The variety of Chinese food, together with the complexity of its naming elements, has drawn the interest of numerous scholars and prompted abundant analyses of Chinese dish names. Most of these, however, were carried out within traditional linguistics, rhetoric, translatology and cross-cultural communication. Corpus-based studies of the naming elements of Chinese dishes under cognitive linguistic theories remain almost entirely unexplored. This paper conducts a quantitative analysis of 4,000 Chinese dish names (500 freely selected from each of the eight cuisines), based on the Prominence Principle, in order to identify the specific naming elements of Chinese dishes and present related statistics and ratios.
Naming systems have developed alongside changes in the world's languages. Generalities of Chinese names and English names are discussed from eight aspects. The structural changes of names in Chinese and English are also contrasted for further exploration. Readers may understand the specific regularities of the individual languages more deeply after learning these generalities.
Because of its distinctive multi-ethnic language environment and the One Belt and One Road initiative, the study of place names in Ili Kazak Autonomous Prefecture has attracted the interest of many linguists. This paper studies the characteristics of Uyghur place names in Ili Kazak Autonomous Prefecture based on the Universal Principles of Tendencies proposed by the Polish linguist Witold Manczak. It finds that, owing to different traditional lifestyles and different political strategies across historical periods, Uyghur place names highlight the great contributions of the Uyghur people, as urban dwellers, to man-made constructions.
This paper studies the characteristics of Kazak place names in Ili Kazak Autonomous Prefecture based on the Universal Principles of Tendencies proposed by the Polish linguist Witold Manczak. It finds that, owing to different traditional lifestyles and different political strategies across historical periods, the Kazaks, as nomads, were proficient in observing environmental conditions and made full use of them to name places.
The translation of road names from Chinese into English has so far followed no specific standard. Whether roads should be referred to by their nearby position or by landmark buildings, or simply be represented by a phonetic transcription that has no specific meaning in English, has provoked heated discussion. In this paper I employ skopos theory and other relevant theories to put forward a new perspective on translating road and street names, with equal emphasis on the generic name and the specific name. Some illustrations are given of the defects of the writing standard set by relevant Chinese laws and regulations as well as by resolutions of the United Nations Conference on the Standardization of Geographical Names, i.e. the adoption of a Single Romanization System, with Lu as the generic name in China's case. I also examine the existing translations of road and street names in major Chinese cities, most of which use Road and either contravene relevant laws or fail to inform foreign visitors. Finally, I suggest applying a combination of Lu and Road in the translation of road and street names, in accordance with functional approaches.
Translation of brand names is a form of intercultural communication. Whether we translate Chinese brand names into English or English brand names into Chinese, language laws, cultural psychology, aesthetic interest and other factors are involved. According to the principle of equivalence theory, translated brand names should achieve a perfect linguistic unity of sound, form and meaning. Based on the translation principles, translators should also pay attention to some forbidden zones.
This paper, from the perspective of Verschueren’s adaptation theory, explores how a translator should adapt to the properties of products, different language customs, and consumers’ psychology during the translation of brand names. First, a general introduction to adaptation theory is given. Then, the application of adaptation theory in brand name translation is illustrated. Finally, it is found that adaptation theory is very helpful for the translation of brand names.
Funding (Bangla handwritten city name recognition study): MMU Postdoctoral and Research Fellow (Account: MMUI/230023.02).
Funding (U.Taxonstand study): Supported by the National Natural Science Foundation of China (32030068) and the Shanghai Municipal Natural Science Foundation (20ZR1418100) to J.Z.
Funding (SciCN study): This research was supported by the National Key Research and Development Program [2020YFB1006302].
Funding (S. suis 2 membrane vesicle study): Supported by the National Natural Science Foundation of China (U22A20520), the Innovation Team Project of Modern Agricultural Industrial Technology System of Guangdong Province, China (2023KJ119), and the Natural Science Foundation Program of Guangdong Province, China (2023A1515012206).
Funding (RoBGP study): Supported by the Outstanding Youth Team Project of Central Universities (QNTD202308) and the Ant Group through the CCF-Ant Research Fund (CCF-AFSG 769498 RF20220214).
Funding: Supported by Yunnan Provincial Major Science and Technology Special Plan Projects (Grant Nos. 202202AD080003, 202202AE090008, 202202AD080004, 202302AD080003), the National Natural Science Foundation of China (Grant Nos. U21B2027, 62266027, 62266028, 62266025), and the Yunnan Province Young and Middle-Aged Academic and Technical Leaders Reserve Talent Program (Grant No. 202305AC160063).
Abstract: Chinese named entity recognition (CNER) has received widespread attention as an important task in Chinese information extraction. Most previous research has studied flat CNER, overlapped CNER, or discontinuous CNER individually. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid-tagging methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Significantly, we introduce a new approach that treats the character-pair grid representation matrix as a specialized image, converting the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing for a more comprehensive understanding of associative features between character pairs. This approach improves the accuracy of predicting their relationships, ultimately enhancing entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, namely CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points, respectively, over the current state-of-the-art (SOTA) models.
Funding: Financially supported by the Natural Science Foundation of China (Grant No. 42301492), the National Key R&D Program of China (Grant Nos. 2022YFF0711600, 2022YFF0801201, 2022YFF0801200), the Major Special Project of Xinjiang (Grant No. 2022A03009-3), the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (Grant No. KF-2022-07014), the Opening Fund of the Key Laboratory of the Geological Survey and Evaluation of the Ministry of Education (Grant No. GLAB 2023ZR01), and the Fundamental Research Funds for the Central Universities.
Abstract: As important geological data, a geological report contains rich expert and geological knowledge, but the challenge facing current research into geological knowledge extraction and mining is how to achieve an accurate understanding of geological reports guided by domain knowledge. While generic named entity recognition models and tools can be used to process geoscience reports and documents, their effectiveness is hampered by a dearth of domain-specific knowledge, which in turn leads to a pronounced decline in recognition accuracy. This study summarizes six types of typical geological entities with reference to the ontological system of geological domains and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-base BERT-adversarial training Bi-directional Long Short-Term Memory Global Pointer) is proposed to address the issues of ambiguity, diversity, and nested entities among geological entities. The model first uses the fine-tuned word-granularity-based pre-training model GeoWoBERT (Geological Word-base BERT) and combines the text features extracted by the BiLSTM (Bi-directional Long Short-Term Memory). An adversarial training algorithm then improves the robustness of the model and enhances its resistance to interference, and decoding is finally performed with a global association pointer algorithm. The experimental results show that the proposed model achieves high performance on the constructed dataset and is capable of mining the rich geological information.
Abstract: Mathematical named entity recognition (MNER) is one of the fundamental tasks in the analysis of mathematical texts. To address the local instability, fuzzy entity boundaries, and long-distance dependence between entities that current neural networks exhibit in the Chinese mathematical entity recognition task, we propose a series of optimization methods and construct an Adversarial Training and Bidirectional Long Short-Term Memory-Self-Attention Conditional Random Field (AT-BSAC) model. In our model, the mathematical text is vectorized by a word embedding technique, and small perturbations are added to the word vectors to generate adversarial samples, while local features are extracted by a Bi-directional Long Short-Term Memory (BiLSTM) network. A self-attention mechanism is incorporated to extract more dependency features between entities. The experimental results show that the AT-BSAC model achieves a precision (P) of 93.88%, a recall (R) of 93.84%, and an F1-score of 93.74%, which is 8.73% higher than the F1-score of the previous Bi-directional Long Short-Term Memory Conditional Random Field (BiLSTM-CRF) model. These results demonstrate the effectiveness of the proposed model in mathematical named entity recognition.
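The abstract does not say how the small perturbations are computed. One common choice, shown here purely as an assumption, is the Fast Gradient Method, which shifts a word vector by a fixed-norm step along the gradient of the loss. The sketch below uses plain Python lists and a made-up gradient; `fgm_perturb` and the numeric values are hypothetical and are not taken from the AT-BSAC paper.

```python
import math
from typing import List


def fgm_perturb(embedding: List[float], gradient: List[float],
                epsilon: float = 1.0) -> List[float]:
    """Return embedding + epsilon * gradient / ||gradient||_2.

    The step has L2 norm epsilon and points in the direction that
    increases the loss, giving an adversarial version of the vector.
    """
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm == 0.0:
        # No gradient signal: leave the embedding unchanged.
        return list(embedding)
    return [e + epsilon * g / norm for e, g in zip(embedding, gradient)]


# Toy values (hypothetical): a 3-dimensional word vector and its loss gradient.
word_vector = [0.5, -0.2, 0.1]
loss_gradient = [3.0, 0.0, 4.0]
adversarial_vector = fgm_perturb(word_vector, loss_gradient, epsilon=0.1)
print(adversarial_vector)
```

In a training loop of this kind, the model would typically be updated on both the clean and the perturbed vectors, which is what makes its predictions less sensitive to small input changes.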
Abstract: Named Entity Recognition (NER) is crucial for extracting structured information from text. While traditional methods rely on rules, Conditional Random Fields (CRFs), or deep learning, the advent of large-scale Pre-trained Language Models (PLMs) offers new possibilities. PLMs excel at contextual learning, potentially simplifying many natural language processing tasks. However, their application to NER remains underexplored. This paper investigates leveraging the GPT-3 PLM for NER without fine-tuning. We propose a novel scheme that uses carefully crafted templates and context examples selected on the basis of semantic similarity. Our experimental results demonstrate the feasibility of this approach, suggesting a promising direction for harnessing PLMs in NER.
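The scheme described above, picking context examples by semantic similarity and placing them in a template, can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: a bag-of-words cosine similarity stands in for whatever semantic similarity measure the authors use, and the template wording, the example pool, and the names `select_examples` and `build_prompt` are hypothetical. No call to GPT-3 is made; the sketch only builds the prompt text.

```python
import math
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, str]  # (sentence, annotated entities)


def bow(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase bag of words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def select_examples(query: str, pool: List[Example], k: int = 2) -> List[Example]:
    """Return the k pool examples most similar to the query sentence."""
    q = bow(query)
    return sorted(pool, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)[:k]


def build_prompt(query: str, examples: List[Example]) -> str:
    """Fill a simple template: instruction, solved examples, then the query."""
    parts = ["Extract the named entities from each sentence."]
    for sentence, entities in examples:
        parts.append(f"Sentence: {sentence}\nEntities: {entities}")
    parts.append(f"Sentence: {query}\nEntities:")
    return "\n\n".join(parts)


# Hypothetical annotated pool and query.
pool = [
    ("Alice flew to Paris on Monday", "Alice (PERSON); Paris (LOCATION)"),
    ("The stock market fell sharply", "none"),
    ("Bob moved to Berlin last year", "Bob (PERSON); Berlin (LOCATION)"),
]
query = "Carol flew to Berlin on Friday"
chosen = select_examples(query, pool, k=2)
prompt = build_prompt(query, chosen)
print(prompt)
```

The resulting prompt would then be sent to the language model, whose completion after the final "Entities:" is read as the prediction.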
Abstract: The variety of Chinese food, together with the complexity of its naming elements, has sparked the interest of numerous scholars and prompted abundant analyses of Chinese dish names. Most of these, however, belong to traditional linguistics, rhetoric, translation studies and cross-cultural communication, while corpus-based studies of the naming elements of Chinese dishes under cognitive linguistic theories remain almost unexplored. Based on the Prominence Principle, this paper conducts a quantitative analysis of 4,000 Chinese dish names (500 selected freely from each of the eight cuisines) in order to identify the specific naming elements of Chinese dishes and present related statistics and ratios.
Abstract: Naming systems have developed alongside changes in the world's languages. This paper discusses the features common to Chinese and English names from eight aspects and contrasts the structural changes of names in the two languages for further exploration. Knowing these commonalities may help readers understand more deeply the specific regularities of each language.
Abstract: Because of its distinctive multi-ethnic language environment and the One Belt and One Road initiative, the study of place names in Ili Kazak Autonomous Prefecture has attracted the interest of many linguists. This paper focuses on the characteristics of Uyghur place names in Ili Kazak Autonomous Prefecture, based on the Universal Principles of Tendencies proposed by the Polish linguist Witold Manczak. The study finds that, owing to different traditional lifestyles and different political strategies across historical periods, the Uyghur place names highlight the great contributions of the Uyghurs, as an urban people, to man-made constructions.
Abstract: This paper focuses on the characteristics of Kazak place names in Ili Kazak Autonomous Prefecture, based on the Universal Principles of Tendencies proposed by the Polish linguist Witold Manczak. The study finds that, owing to different traditional lifestyles and different political strategies across historical periods, the Kazaks, as nomads, were proficient in observing environmental conditions and made full use of them in naming places.
Abstract: To date, no specific standard governs the translation of road names from Chinese into English. Whether roads should be referred to by nearby locations or landmark buildings, or simply be rendered phonetically with no specific meaning in English, has provoked heated discussion. In this paper I employ skopos theory and other relevant theories to put forward a new perspective on translating road and street names, with equal emphasis on the generic name and the specific name. I illustrate the defects of the writing standard set by relevant Chinese laws and regulations as well as by resolutions of the United Nations Conference on the Standardization of Geographical Names, i.e. the adoption of a single Romanization system, with Lu as the generic name in China's case. I also examine the existing translations of road and street names in major Chinese cities, most of which use Road and either go against relevant laws or fail to inform foreign visitors. Finally, I suggest applying a combination of Lu and Road in the translation of road and street names, in accordance with functional approaches.
Abstract: The translation of brand names is a form of intercultural communication. Whether Chinese brand names are translated into English or English brand names into Chinese, the rules of the language, cultural psychology, aesthetic interest and other factors are involved. According to equivalence theory, translated brand names should achieve a perfect linguistic unity of sound, form and meaning. While following these translation principles, translators should also pay attention to certain forbidden zones.
Abstract: From the perspective of Verschueren's adaptation theory, this paper explores how a translator should adapt to the properties of products, different language customs, and consumers' psychology when translating brand names. First, a general introduction to adaptation theory is given. Then, the application of adaptation theory to brand name translation is illustrated. Finally, the paper finds that adaptation theory is very helpful for the translation of brand names.