Cultivating new professional farmers is a key route to developing the rural labor force and to resolving existing problems such as how to farm. Notably, the government and the market each have advantages in training new professional farmers, so it is necessary to ensure that both play their roles. The research explored a market-oriented farmer training model and its characteristics, and investigated training routes for new professional farmers.
The impact of environmental regulation on technology innovation is a hot spot in current research, and a large number of empirical studies are based on the Porter Hypothesis (PH). However, academia still disputes whether the "weak" and "narrow" versions of the PH hold. Based on city-level panel data for China during 2008-2014, covering patent applications for energy conservation and emission reduction (ECER) technology, comprehensive energy prices, pollutant emissions, etc., a mixed regression model and the system generalized method of moments (GMM) were adopted, respectively, to study the impact of market-oriented and command-and-control policy tools on China's ECER technology innovation. The results show that environmental regulation hindered technological innovation in the immediate phase but had a positive effect in the first-lag phase. Hence, the "weak" PH holds only within a time bound. The command-and-control policy tool played a more positive role in promoting technological innovation in the first-lag phase than the market-oriented policy tool. Therefore, the "narrow" PH is not tenable. The reason is that the main participants in China's ECER technology innovation are state-owned companies and public institutions. Regionally, the immediate impact of the command-and-control policy tool on technological innovation was nonsignificant in the eastern, central, and western regions of China, whereas the market-oriented policy tool had a negative effect. This negative effect was strongest in the central region, weaker in the eastern region, and weakest in the western region, a pattern related to regional energy consumption levels and the vitality of the market economy.
By analyzing the problems that currently exist in the management of hidden accident dangers in coal mines, this paper proposes a new management method, "simulating the market", in which an operating pattern that simulates a market for transacting hidden dangers is constructed. The method introduces a "market mechanism" into safety management and adopts measurable values to describe hidden dangers such as "human behavior, technique, environment, equipment, etc.". It regards the hidden dangers found by safety managers and security inspectors as "goods produced by labor", which are then sold as "commodities". Through a process of disposing, counterchecking, re-selling, and redisposing, it forms a market-oriented, closed-loop management pattern for hidden accident dangers in coal mines. This method replaces traditional approaches, under which workers treated safety management passively, and instead both encourages and constrains them to take part in checking and rectifying hidden dangers.
The purpose of this study is to analyze the spatial distribution and change trend of the marketization level of state-owned land supply so as to provide policy recommendations. Spatial autocorrelation analysis is employed. The results indicate that the spatial layout of the marketization level of land supply is generally dispersed, although it clusters in some specific areas. The correlation between the marketization level of state-owned land supply and economic development is not statistically significant, but the relationship fluctuates markedly. The overall marketization level of state-owned land supply is increasing and spatially concentrated. The rate of expansion of marketization first decreased, then increased, and has now stabilized. High-value cluster centers of the marketization level of state-owned land supply exist all over the country; however, such clusters are more likely to occur in undeveloped areas. It is concluded that spatial autocorrelation analysis is a good method for quantitatively analyzing the spatial variation of the marketization level of state-owned land supply in China. Grasping the spatial and temporal variations of this level also helps improve the operation of the state-owned land market.
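The abstract above does not say which spatial autocorrelation statistic was used. As a hedged illustration only, the sketch below computes global Moran's I, one commonly used statistic of this kind, on a toy example. The function name `morans_i`, the four-region chain layout, and the values are all assumptions made for illustration and are not data from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for n regions (illustrative, not the study's code).

    values  -- list of n observations, e.g. one indicator value per region
    weights -- n x n spatial weight matrix; weights[i][j] > 0 when
               regions i and j are treated as neighbors
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]                 # deviations from the mean
    s0 = sum(sum(row) for row in weights)            # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))   # weighted cross-products
    den = sum(d * d for d in dev)                    # total squared deviation
    return (n / s0) * (num / den)


# Hypothetical layout: four regions in a chain 0-1-2-3.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1.0, 1.0, 5.0, 5.0], chain)    # similar values adjacent
alternating = morans_i([1.0, 5.0, 1.0, 5.0], chain)  # dissimilar values adjacent
print(clustered, alternating)
```

A positive value indicates that similar values tend to sit next to each other (clustering), and a negative value indicates that neighbors tend to differ (dispersion).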
Not long ago, Shanghai Port Machinery Co. Ltd. won an international tender for the cement construction equipment for the dams and factory buildings of Phase Ⅱ of the Three Gorges Project. Thanks to its excellent product quality, advanced technology, and fine corporate image, it obtained the contract to build the world's largest and technically difficult overhead cranes, fully displaying its tremendous strength.
This paper aims to explore the effects of market-oriented reforms on industrial technology progress. Based on a theoretical analysis, we performed an empirical study with a marketization index and panel data of high-tech sectors in China. We found that market-oriented reforms had significantly propelled technology progress in China's high-tech sectors, and the effects became more evident after China's WTO entry. Market-oriented reforms induced technology progress by increasing capital allocation efficiency, R&D input, and technology diffusion. Among various aspects of market-oriented reforms, the institutional environment exerted the most significant effects, followed by the economy's non-state sector, product market development, and factor market development; the government-market relationship index influenced technology progress the least. The effects are heterogeneous across sectors with different technology attributes and more significant for technology-intensive sectors. Our findings offer policy implications for China's ongoing market-oriented reforms and policy design for technology progress in high-tech sectors.
Constructing a market-oriented ecological compensation mechanism in China is a complex systems engineering task. China's ecological compensation funds come mainly from public finance and lack market-oriented operation. This not only increases the financial burden on the government but also leaves the scope of compensation incomplete. Moreover, because China's ecological compensation lacks a market mechanism, it is difficult to set compensation standards and calculate offsets. This paper takes Gannan Tibetan Autonomous Prefecture as an example to analyze the market-oriented ecological compensation system of ethnic minority areas from the perspective of the market economy. It aims to provide a theoretical basis and a reference point for establishing efficient and reasonable ecological compensation mechanisms and policies in ethnic minority areas, and to provide environmental protection for their sustainable economic and social development.
China's central bank cut interest rates for deposits and loans and adjusted their floating ranges on June 8. Yi Xianrong, a research fellow with the Institute of Finance and Banking under the Chinese Academy of Social Sciences, shared his views on the impact of the cut with Shanghai Securities News. Edited excerpts follow:
Nowadays, ensuring the quality of network services has become increasingly vital. Experts are turning to knowledge graph technology, with a significant emphasis on entity extraction in the identification of device configurations. This research paper presents a novel entity extraction method that leverages a combination of active learning and attention mechanisms. Initially, an improved active learning approach is employed to select the most valuable unlabeled samples, which are subsequently submitted for expert labeling. This approach successfully addresses the problems of isolated points and sample redundancy within the network configuration sample set. Then the labeled samples are utilized to train the model for network configuration entity extraction. Furthermore, the multi-head self-attention of the transformer model is enhanced by introducing the Adaptive Weighting method based on the Laplace mixture distribution. This enhancement enables the transformer model to dynamically adapt its focus to words in various positions, displaying exceptional adaptability to abnormal data and further elevating the accuracy of the proposed model. Through comparisons with Random Sampling (RANDOM), Maximum Normalized Log-Probability (MNLP), Least Confidence (LC), Token Entropy (TE), and Entropy Query by Bagging (EQB), the proposed method, Entropy Query by Bagging and Maximum Influence Active Learning (EQBMIAL), achieves comparable performance with only 40% of the samples on both datasets, while other algorithms require 50% of the samples. Furthermore, the entity extraction algorithm with the Adaptive Weighted Multi-head Attention mechanism (AW-MHA) is compared with BILSTM-CRF, Mutil_Attention-Bilstm-Crf, Deep_Neural_Model_NER and BERT_Transformer, achieving precision rates of 75.98% and 98.32% on the two datasets, respectively. Statistical tests demonstrate the statistical significance and effectiveness of the proposed algorithms.
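To make the active learning comparison above more concrete, here is a minimal sketch of two of the baseline selection strategies it names, Least Confidence (LC) and Token Entropy (TE). It is not the proposed EQBMIAL method, whose details the abstract does not give. The function names, the toy probability pool, and the simplification that sequence confidence is the product of per-token maximum probabilities are all assumptions for illustration.

```python
import math


def least_confidence(token_dists):
    """LC score: 1 minus the confidence of the most likely labeling.

    Assumption: tokens are scored independently, so the sequence
    confidence is the product of each token's highest label probability.
    """
    confidence = 1.0
    for dist in token_dists:
        confidence *= max(dist)
    return 1.0 - confidence


def token_entropy(token_dists):
    """TE score: mean Shannon entropy of the per-token label distributions."""
    total = 0.0
    for dist in token_dists:
        total -= sum(p * math.log(p) for p in dist if p > 0)
    return total / len(token_dists)


def select_for_labeling(pool, score_fn, k):
    """Return indices of the k samples the model is least sure about."""
    ranked = sorted(range(len(pool)), key=lambda i: score_fn(pool[i]), reverse=True)
    return ranked[:k]


# Hypothetical unlabeled pool: each sample is a list of per-token
# label distributions predicted by the current model.
pool = [
    [[0.9, 0.1], [0.8, 0.2]],      # fairly confident
    [[0.5, 0.5], [0.6, 0.4]],      # uncertain
    [[0.99, 0.01], [0.97, 0.03]],  # very confident
]

print(select_for_labeling(pool, least_confidence, 2))
print(select_for_labeling(pool, token_entropy, 2))
```

Both strategies send the most uncertain samples to the expert first; they differ only in how uncertainty is scored.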
Named entity recognition (NER) is a fundamental task of information extraction (IE), and it has attracted considerable research attention in recent years. The abundant annotated English NER datasets have significantly promoted NER research on English. By contrast, far less effort has gone into Chinese NER research, especially in the scientific domain, due to the scarcity of Chinese NER datasets. To alleviate this problem, we present a Chinese scientific NER dataset, SciCN, which contains entity annotations of titles and abstracts derived from 3,500 scientific papers. We manually annotate a total of 62,059 entities, and these entities are classified into six types. Compared to English scientific NER datasets, SciCN has a larger scale and is more diverse, for it not only contains more paper abstracts but these abstracts are also derived from more research fields. To investigate the properties of SciCN and provide baselines for future research, we adapt a number of previous state-of-the-art Chinese NER models to evaluate SciCN. Experimental results show that SciCN is more challenging than other Chinese NER datasets. In addition, previous studies have proven the effectiveness of using lexicons to enhance Chinese NER models. Motivated by this fact, we provide a scientific domain-specific lexicon. Validation results demonstrate that our lexicon delivers better performance gains than lexicons of other domains. We hope that the SciCN dataset and the lexicon will enable us to benchmark the NER task regarding the Chinese scientific domain and make progress for future research. The dataset and lexicon are available at: https://github.com/yangjingla/SciCN.git.
Purpose: To address the “anomalies” that occur when scientific breakthroughs emerge, this study focuses on identifying early signs and nascent stages of breakthrough innovations from the perspective of outliers, aiming to achieve early identification of scientific breakthroughs in papers. Design/methodology/approach: This study utilizes semantic technology to extract research entities from the titles and abstracts of papers to represent each paper's research content. Outlier detection methods are then employed to measure and analyze the anomalies in breakthrough papers during their early stages. The development and evolution process are traced using literature time tags. Finally, a case study is conducted using the key publications of the 2021 Nobel Prize laureates in Physiology or Medicine. Findings: Through manual analysis of all identified outlier papers, the effectiveness of the proposed method for early identification of potential scientific breakthroughs is verified. Research limitations: The study's applicability has only been empirically tested in the biomedical field. More data from various fields are needed to validate the robustness and generalizability of the method. Practical implications: This study provides a valuable supplement to current methods for early identification of scientific breakthroughs, effectively supporting technological intelligence decision-making and services. Originality/value: The study introduces a novel approach to early identification of scientific breakthroughs by leveraging outlier analysis of research entities, offering a more sensitive, precise, and fine-grained alternative to traditional citation-based evaluations, which enhances the ability to identify nascent breakthrough innovations.
Electricity pricing is the core of power sector institutional reform in China. It bears not only on the redistribution of interests among all parties but also on the health and security of the entire power industry. Only by accelerating reform of the pricing mechanism can the sound development of the power industry be promoted.
Named Entity Recognition (NER) stands as a fundamental task within the field of biomedical text mining, aiming to extract specific types of entities such as genes, proteins, and diseases from complex biomedical texts and categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model initially utilizes the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network for capturing bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to comprehensively recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
Chinese named entity recognition (CNER) has received widespread attention as an important task of Chinese information extraction. Most previous research has focused on individually studying flat CNER, overlapped CNER, or discontinuous CNER. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid tagging methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Significantly, we introduce a new approach by considering the character-pair grid representation matrix as a specialized image, converting the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing for a more comprehensive understanding of associative features between character pairs. This approach leads to improved accuracy in predicting their relationships, ultimately enhancing entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, namely CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points compared to the current state-of-the-art (SOTA) models, respectively.
As important geological data, a geological report contains rich expert and geological knowledge, but the challenge facing current research into geological knowledge extraction and mining is how to achieve an accurate understanding of geological reports guided by domain knowledge. While generic named entity recognition models and tools can be used to process geoscience reports and documents, their effectiveness is hampered by a dearth of domain-specific knowledge, which in turn leads to a pronounced decline in recognition accuracy. This study summarizes six types of typical geological entities, with reference to the ontological system of geological domains, and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-base BERT adversarial training Bi-directional Long Short-Term Memory Global Pointer) is proposed to address the issues of ambiguity, diversity and nested entities for geological entities. The model first uses the fine-tuned, word-granularity-based pre-training model GeoWoBERT (Geological Word-base BERT) and combines it with the text features extracted by the BiLSTM (Bi-directional Long Short-Term Memory). An adversarial training algorithm then improves the robustness of the model and enhances its resistance to interference, and decoding is finally performed using a global association pointer algorithm. The experimental results show that the proposed model achieves high performance on the constructed dataset and is capable of mining rich geological information.
With the help of pre-trained language models, the accuracy of the entity linking task has made great strides in recent years. However, most models with excellent performance require fine-tuning large pre-trained language models on a large amount of training data, which sets a hardware threshold for accomplishing this task. Some researchers have achieved competitive results with less training data through ingenious methods, such as utilizing information provided by the named entity recognition model. This paper presents a novel semantic-enhancement-based entity linking approach, named semantically enhanced hardware-friendly entity linking (SHEL), which is designed to be hardware friendly and efficient while maintaining good performance. Specifically, SHEL's semantic enhancement approach consists of three aspects: (1) semantic compression of entity descriptions using a text summarization model; (2) maximizing the capture of mention contexts using asymmetric heuristics; (3) calculating a fixed-size mention representation through pooling operations. This series of semantic enhancement methods effectively improves the model's ability to capture semantic information while taking into account the hardware constraints, and improves the model's convergence speed by more than 50% compared with the strong baseline model proposed in this paper. In terms of performance, SHEL is comparable to the previous method, with superior performance on six well-established datasets, even though SHEL is trained using a smaller pre-trained language model as the encoder.
Mathematical named entity recognition (MNER) is one of the fundamental tasks in the analysis of mathematical texts. To address the problems of current neural networks in the Chinese mathematical entity recognition task, namely local instability, fuzzy entity boundaries, and long-distance dependence between entities, we propose a series of optimization methods and construct an Adversarial Training and Bidirectional Long Short-Term Memory-Self-Attention Conditional Random Field (AT-BSAC) model. In our model, the mathematical text is vectorized by the word embedding technique, and small perturbations are added to the word vectors to generate adversarial samples, while local features are extracted by Bi-directional Long Short-Term Memory (BiLSTM). The self-attention mechanism is incorporated to extract more dependency features between entities. The experimental results demonstrate that the AT-BSAC model achieved a precision (P) of 93.88%, a recall (R) of 93.84%, and an F1-score of 93.74%, which is 8.73% higher than the F1-score of the previous Bi-directional Long Short-Term Memory Conditional Random Field (BiLSTM-CRF) model. These results demonstrate the effectiveness of the proposed model in mathematical named entity recognition.
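The abstract above says that small perturbations are added to the word vectors but does not say how they are computed. One widely used choice is a gradient-direction perturbation scaled to a fixed norm (the fast gradient method). The sketch below assumes that choice purely for illustration; the function name `adversarial_embedding`, the `epsilon` parameter, and the toy numbers are hypothetical, and the gradient is supplied by hand rather than by a real model.

```python
import math


def adversarial_embedding(embedding, grad, epsilon=1.0):
    """Shift an embedding by epsilon along the normalized loss gradient.

    embedding -- word vector as a list of floats
    grad      -- gradient of the training loss with respect to that vector
                 (assumed to be provided by the training framework)
    epsilon   -- size of the perturbation (hypothetical hyperparameter)
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(embedding)          # no gradient signal: leave unchanged
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]


# Toy example: a 2-dimensional word vector and a made-up gradient.
perturbed = adversarial_embedding([1.0, 2.0], [3.0, 4.0], epsilon=0.5)
print(perturbed)
```

In adversarial training of this kind, the model is typically trained on both the original and the perturbed vectors so that small changes in the input do not flip its predictions.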
Named Entity Recognition (NER) is crucial for extracting structured information from text. While traditional methods rely on rules, Conditional Random Fields (CRFs), or deep learning, the advent of large-scale Pre-trained Language Models (PLMs) offers new possibilities. PLMs excel at contextual learning, potentially simplifying many natural language processing tasks. However, their application to NER remains underexplored. This paper investigates leveraging the GPT-3 PLM for NER without fine-tuning. We propose a novel scheme that utilizes carefully crafted templates and context examples selected based on semantic similarity. Our experimental results demonstrate the feasibility of this approach, suggesting a promising direction for harnessing PLMs in NER.
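As a rough illustration of "context examples selected based on semantic similarity", the sketch below ranks annotated sentences by cosine similarity to the query and places the closest ones in a prompt template. The abstract gives neither the templates nor the similarity measure, so the template wording, the function names, and the two-dimensional toy vectors (standing in for real sentence embeddings) are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pick_examples(query_vec, candidates, k):
    """Return the k candidates most similar to the query.

    candidates -- list of (vector, sentence, annotation) tuples
    """
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c[0]), reverse=True)
    return ranked[:k]


def build_prompt(query, examples):
    """Fill a hypothetical few-shot template with the chosen examples."""
    lines = ["Extract the named entities from each sentence."]
    for _, sentence, annotation in examples:
        lines.append(f"Sentence: {sentence}")
        lines.append(f"Entities: {annotation}")
    lines.append(f"Sentence: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


# Toy annotated pool; the vectors are made up, not real embeddings.
candidates = [
    ([1.0, 0.0], "Alice visited Paris.", "Alice (PER); Paris (LOC)"),
    ([0.0, 1.0], "Sales rose 5%.", "none"),
    ([0.9, 0.1], "Bob flew to Rome.", "Bob (PER); Rome (LOC)"),
]

chosen = pick_examples([1.0, 0.05], candidates, k=2)
prompt = build_prompt("Carol moved to Berlin.", chosen)
print(prompt)
```

The resulting prompt would then be sent to the language model, which is expected to continue the text after the final "Entities:" line; no fine-tuning is involved.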
Funding (farmer training study): Supported by the Chongqing Education Science Planning Program (2013-ZJ-060); the Humanities and Social Science Research Planning Program of the Ministry of Education (13YJA630042); the Humanities and Social Science Research Program of the Chongqing Education Committee (14SKN03); and the S&T Innovation Team Construction and Planning Foundation of Yangtze Normal University (2014XJTD03).
Funding (state-owned land supply study): Supported by the Chongqing Key Humanities and Social Sciences Base, the Research Center of Rural Economics and Management of Southwest University.
Funding (market-oriented reforms study): Supported by the General Program of the National Science Foundation of China (NSFC), “Study on the Effects of Factor Price Distortion on the Technology Sophistication of Exports from High-tech Sectors and Policy Response” (Grant No. 71773107).
文摘This paper aims to explore the effects of market-oriented reforms on industrial technology progress.Based on a theoretical analysis,we performed an empirical study with a marketization index and panel data of high-tech sectors in China.We found that market-oriented reforms had significantly propelled technology progress in China’s high-tech sectors,and the effects became more evident after China’s WTO entry.Market-oriented reforms induced technology progress by increasing capital allocation efficiency,R&D input,and technology diffusion.Among various aspects of market-oriented reforms,the institutional environment exerted the most significant effects,followed by the economy’s non-state sector,product market development,and factor market development;the government-market relationship index influenced technology’s progress the least.The effects are heterogeneous across sectors with different technology attributes and more significant for technology-intensive sectors.Our findings offer policy implications for China’s ongoing market-oriented reforms and policy design for technology progress in high-tech sectors.
Abstract: Building a market-oriented ecological compensation mechanism in China is a complex systems-engineering task. China's ecological compensation funds are mainly derived from public finance and lack market-oriented operation. This not only increases the financial burden on the government but also leads to an incomplete compensation scope. Moreover, because China's ecological compensation lacks a market mechanism, it is difficult to set compensation standards and calculate offsets. This paper takes Gannan Tibetan Autonomous Prefecture as an example to analyze the market-oriented ecological compensation system of ethnic minority areas from the perspective of the market economy, so as to provide a theoretical basis and a reference point for establishing efficient and reasonable ecological compensation mechanisms and policies in ethnic minority areas, and to support environmental protection for the sustainable economic and social development of these areas.
Abstract: China's central bank cut interest rates for deposits and loans and adjusted their floating ranges on June 8. Yi Xianrong, a research fellow with the Institute of Finance and Banking under the Chinese Academy of Social Sciences, shared his views on the impact of the cut with Shanghai Securities News. Edited excerpts follow:
Funding: Supported by the National Key R&D Program of China (2019YFB2103202).
Abstract: Nowadays, ensuring the quality of network services has become increasingly vital. Experts are turning to knowledge graph technology, with a significant emphasis on entity extraction in the identification of device configurations. This research paper presents a novel entity extraction method that leverages a combination of active learning and attention mechanisms. Initially, an improved active learning approach is employed to select the most valuable unlabeled samples, which are subsequently submitted for expert labeling. This approach successfully addresses the problems of isolated points and sample redundancy within the network configuration sample set. Then the labeled samples are utilized to train the model for network configuration entity extraction. Furthermore, the multi-head self-attention of the transformer model is enhanced by introducing the Adaptive Weighting method based on the Laplace mixture distribution. This enhancement enables the transformer model to dynamically adapt its focus to words in various positions, displaying exceptional adaptability to abnormal data and further elevating the accuracy of the proposed model. Through comparisons with Random Sampling (RANDOM), Maximum Normalized Log-Probability (MNLP), Least Confidence (LC), Token Entropy (TE), and Entropy Query by Bagging (EQB), the proposed method, Entropy Query by Bagging and Maximum Influence Active Learning (EQBMIAL), achieves comparable performance with only 40% of the samples on both datasets, while other algorithms require 50% of the samples. Furthermore, the entity extraction algorithm with the Adaptive Weighted Multi-head Attention mechanism (AW-MHA) is compared with BILSTM-CRF, Mutil_Attention-Bilstm-Crf, Deep_Neural_Model_NER and BERT_Transformer, achieving precision rates of 75.98% and 98.32% on the two datasets, respectively. Statistical tests demonstrate the statistical significance and effectiveness of the proposed algorithms.
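To make the idea of selecting "the most valuable unlabeled samples" concrete, here is a minimal sketch of Least Confidence (LC) sampling, one of the baseline strategies named in the abstract above. It is not the authors' EQBMIAL method, whose details the abstract does not give; the function name and the toy probabilities are hypothetical.

```python
def least_confidence_select(probabilities, k):
    """Pick the k unlabeled samples the model is least sure about.

    probabilities : one list of class probabilities per unlabeled sample
                    (hypothetical model outputs).
    Returns the indices of the k samples with the highest uncertainty,
    where uncertainty = 1 - (probability of the most likely class).
    """
    scored = [(1.0 - max(p), idx) for idx, p in enumerate(probabilities)]
    # Sort by descending uncertainty; break ties by original index
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored[:k]]


# Toy model outputs for three unlabeled samples
probs = [
    [0.90, 0.10],  # confident prediction
    [0.55, 0.45],  # very uncertain
    [0.70, 0.30],  # moderately uncertain
]
selected = least_confidence_select(probs, k=2)
print(selected)  # indices to send to the expert for labeling
```

In an active learning loop, the selected samples would be labeled by an expert, added to the training set, and the model retrained before the next selection round.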
Funding: This research was supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: Named entity recognition (NER) is a fundamental task of information extraction (IE), and it has attracted considerable research attention in recent years. The abundant annotated English NER datasets have significantly promoted NER research for English. By contrast, far less effort has been devoted to Chinese NER research, especially in the scientific domain, due to the scarcity of Chinese NER datasets. To alleviate this problem, we present a Chinese scientific NER dataset, SciCN, which contains entity annotations of titles and abstracts derived from 3,500 scientific papers. We manually annotate a total of 62,059 entities, and these entities are classified into six types. Compared to English scientific NER datasets, SciCN has a larger scale and is more diverse, for it not only contains more paper abstracts but these abstracts are also derived from more research fields. To investigate the properties of SciCN and provide baselines for future research, we adapt a number of previous state-of-the-art Chinese NER models to evaluate SciCN. Experimental results show that SciCN is more challenging than other Chinese NER datasets. In addition, previous studies have proven the effectiveness of using lexicons to enhance Chinese NER models. Motivated by this fact, we provide a scientific domain-specific lexicon. Validation results demonstrate that our lexicon delivers better performance gains than lexicons of other domains. We hope that the SciCN dataset and the lexicon will enable us to benchmark the NER task in the Chinese scientific domain and support progress in future research. The dataset and lexicon are available at: https://github.com/yangjingla/SciCN.git.
Funding: Supported by the major project of the National Social Science Foundation of China, “Big Data-driven Semantic Evaluation System of Science and Technology Literature” (Grant No. 21&ZD329).
Abstract: Purpose: To address the “anomalies” that occur when scientific breakthroughs emerge, this study focuses on identifying early signs and nascent stages of breakthrough innovations from the perspective of outliers, aiming to achieve early identification of scientific breakthroughs in papers. Design/methodology/approach: This study utilizes semantic technology to extract research entities from the titles and abstracts of papers to represent each paper's research content. Outlier detection methods are then employed to measure and analyze the anomalies in breakthrough papers during their early stages. The development and evolution process is traced using literature time tags. Finally, a case study is conducted using the key publications of the 2021 Nobel Prize laureates in Physiology or Medicine. Findings: Through manual analysis of all identified outlier papers, the effectiveness of the proposed method for early identification of potential scientific breakthroughs is verified. Research limitations: The study's applicability has only been empirically tested in the biomedical field. More data from various fields are needed to validate the robustness and generalizability of the method. Practical implications: This study provides a valuable supplement to current methods for early identification of scientific breakthroughs, effectively supporting technological intelligence decision-making and services. Originality/value: The study introduces a novel approach to early identification of scientific breakthroughs by leveraging outlier analysis of research entities, offering a more sensitive, precise, and fine-grained alternative to traditional citation-based evaluations, which enhances the ability to identify nascent breakthrough innovations.
Abstract: Electricity pricing is the core of the power institutional reform in China. It concerns not only the redistribution of interests among all parties but also the health and security of the entire power industry. Only by accelerating the reform of the pricing mechanism can sound development of the power industry be promoted.
Funding: Supported by the Outstanding Youth Team Project of Central Universities (QNTD202308) and the Ant Group through the CCF-Ant Research Fund (CCF-AFSG 769498 RF20220214).
Abstract: Named Entity Recognition (NER) is a fundamental task within the field of biomedical text mining, aiming to extract specific types of entities such as genes, proteins, and diseases from complex biomedical texts and categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model initially utilizes the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network for capturing bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to comprehensively recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE, and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
Funding: Supported by Yunnan Provincial Major Science and Technology Special Plan Projects (Grant Nos. 202202AD080003, 202202AE090008, 202202AD080004, 202302AD080003), the National Natural Science Foundation of China (Grant Nos. U21B2027, 62266027, 62266028, 62266025), and the Yunnan Province Young and Middle-Aged Academic and Technical Leaders Reserve Talent Program (Grant No. 202305AC160063).
Abstract: Chinese named entity recognition (CNER) has received widespread attention as an important task of Chinese information extraction. Most previous research has focused on individually studying flat CNER, overlapped CNER, or discontinuous CNER. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid tagging methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Significantly, we introduce a new approach that treats the character-pair grid representation matrix as a specialized image, converting the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing for a more comprehensive understanding of associative features between character pairs. This approach leads to improved accuracy in predicting their relationships, ultimately enhancing entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, namely CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points compared to the current state-of-the-art (SOTA) models, respectively.
Funding: Financially supported by the Natural Science Foundation of China (Grant No. 42301492), the National Key R&D Program of China (Grant Nos. 2022YFF0711600, 2022YFF0801201, 2022YFF0801200), the Major Special Project of Xinjiang (Grant No. 2022A03009-3), the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (Grant No. KF-2022-07014), the Opening Fund of the Key Laboratory of the Geological Survey and Evaluation of the Ministry of Education (Grant No. GLAB 2023ZR01), and the Fundamental Research Funds for the Central Universities.
Abstract: As important geological data, a geological report contains rich expert and geological knowledge, but the challenge facing current research into geological knowledge extraction and mining is how to achieve an accurate understanding of geological reports guided by domain knowledge. While generic named entity recognition models and tools can be used to process geoscience reports and documents, their effectiveness is hampered by a dearth of domain-specific knowledge, which in turn leads to a pronounced decline in recognition accuracy. This study summarizes six types of typical geological entities, with reference to the ontological system of geological domains, and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-base BERT-adversarial training Bi-directional Long Short-Term Memory Global Pointer) is proposed to address the issues of ambiguity, diversity and nested entities in geological entities. The model first uses the fine-tuned word-granularity-based pre-training model GeoWoBERT (Geological Word-base BERT) and combines the text features extracted using the BiLSTM (Bi-directional Long Short-Term Memory). An adversarial training algorithm is then applied to improve the robustness of the model and enhance its resistance to interference, and decoding is finally performed using a global association pointer algorithm. The experimental results show that the proposed model achieves high performance on the constructed dataset and is capable of mining the rich geological information.
Funding: Supported by the Beijing Municipal Science and Technology Program (Z231100001323004).
Abstract: With the help of pre-trained language models, the accuracy of the entity linking task has made great strides in recent years. However, most models with excellent performance require fine-tuning large pre-trained language models on a large amount of training data, which imposes a hardware barrier to accomplishing this task. Some researchers have achieved competitive results with less training data through ingenious methods, such as utilizing information provided by the named entity recognition model. This paper presents a novel semantic-enhancement-based entity linking approach, named semantically enhanced hardware-friendly entity linking (SHEL), which is designed to be hardware friendly and efficient while maintaining good performance. Specifically, SHEL's semantic enhancement approach consists of three aspects: (1) semantic compression of entity descriptions using a text summarization model; (2) maximizing the capture of mention contexts using asymmetric heuristics; (3) calculating a fixed-size mention representation through pooling operations. This series of semantic enhancement methods effectively improves the model's ability to capture semantic information while taking into account the hardware constraints, and improves the model's convergence speed by more than 50% compared with the strong baseline model proposed in this paper. In terms of performance, SHEL is comparable to the previous method, with superior performance on six well-established datasets, even though SHEL is trained using a smaller pre-trained language model as the encoder.
Abstract: Mathematical named entity recognition (MNER) is one of the fundamental tasks in the analysis of mathematical texts. To address the local instability, fuzzy entity boundaries, and long-distance dependence between entities that current neural networks exhibit in the Chinese mathematical entity recognition task, we propose a series of optimization methods and construct an Adversarial Training and Bidirectional long short-term memory-Self-attention Conditional random field (AT-BSAC) model. In our model, the mathematical text is vectorized by the word embedding technique, and small perturbations are added to the word vectors to generate adversarial samples, while local features are extracted by Bi-directional Long Short-Term Memory (BiLSTM). A self-attention mechanism is incorporated to extract more dependency features between entities. The experimental results show that the AT-BSAC model achieved a precision (P) of 93.88%, a recall (R) of 93.84%, and an F1-score of 93.74%, with the F1-score 8.73% higher than that of the previous Bi-directional Long Short-Term Memory Conditional Random Field (BiLSTM-CRF) model. The results demonstrate the effectiveness of the proposed model in mathematical named entity recognition.
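The step of adding small perturbations to word vectors to generate adversarial samples can be sketched as follows. The abstract above does not specify the perturbation rule, so this example assumes the widely used Fast Gradient Method (FGM), in which the perturbation is the loss gradient scaled to a fixed L2 norm `epsilon`. The function name, the gradient values, and `epsilon` are all hypothetical.

```python
import math


def fgm_perturb(embedding, gradient, epsilon=1.0):
    """Return an adversarial copy of a word vector (FGM-style, assumed).

    embedding : the original word vector
    gradient  : gradient of the training loss w.r.t. that vector
                (would come from backpropagation in a real model)
    epsilon   : L2 norm of the perturbation to add
    """
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm == 0.0:
        # No gradient signal: leave the embedding unchanged
        return list(embedding)
    # Move the embedding a distance epsilon along the gradient direction
    return [e + epsilon * g / norm for e, g in zip(embedding, gradient)]


# Toy word vector and a made-up gradient
word_vec = [1.0, 2.0]
grad = [3.0, 4.0]
adv_vec = fgm_perturb(word_vec, grad, epsilon=0.5)
print(adv_vec)  # perturbed vector, at L2 distance 0.5 from word_vec
```

During adversarial training, the model would typically be optimized on both the original and the perturbed embeddings so that small input changes do not flip its predictions.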
Abstract: Named Entity Recognition (NER) is crucial for extracting structured information from text. While traditional methods rely on rules, Conditional Random Fields (CRFs), or deep learning, the advent of large-scale Pre-trained Language Models (PLMs) offers new possibilities. PLMs excel at contextual learning, potentially simplifying many natural language processing tasks. However, their application to NER remains underexplored. This paper investigates leveraging the GPT-3 PLM for NER without fine-tuning. We propose a novel scheme that utilizes carefully crafted templates and context examples selected based on semantic similarity. Our experimental results demonstrate the feasibility of this approach, suggesting a promising direction for harnessing PLMs in NER.
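The idea of choosing context examples by semantic similarity and placing them in a template can be sketched as below. This is an illustrative sketch only: the abstract above does not describe the authors' templates or similarity measure, so the cosine similarity, the prompt wording, the toy sentence vectors, and all function names are assumptions. No call to any language model API is made.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_examples(query_vec, pool, k):
    """Return the k labelled examples most similar to the query.

    pool : list of (sentence, annotated_entities, sentence_vector) tuples;
           the vectors are hypothetical sentence embeddings.
    """
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex[2]), reverse=True)
    return ranked[:k]


def build_prompt(query_sentence, examples):
    """Fill a simple (hypothetical) few-shot NER template."""
    lines = ["Extract the named entities from each sentence."]
    for sentence, entities, _ in examples:
        lines.append(f"Sentence: {sentence}")
        lines.append(f"Entities: {entities}")
    lines.append(f"Sentence: {query_sentence}")
    lines.append("Entities:")
    return "\n".join(lines)


# Toy pool with 2-dimensional stand-in embeddings
pool = [
    ("Alice joined Acme Corp.", "Alice (PER); Acme Corp (ORG)", [1.0, 0.0]),
    ("It rained in Paris.", "Paris (LOC)", [0.0, 1.0]),
    ("Bob left Initech.", "Bob (PER); Initech (ORG)", [0.9, 0.1]),
]
chosen = select_examples([1.0, 0.0], pool, k=2)
prompt = build_prompt("Carol founded Globex.", chosen)
print(prompt)
```

The resulting prompt string would then be sent to the language model, whose completion after the final "Entities:" line is parsed as the prediction.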