Funding: This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61572226 and 61876069, and by the Key Scientific and Technological Research and Development Project of Jilin Province of China under Grant Nos. 20180201067GX and 20180201044GX.
Abstract: Topic modeling is a mainstream and effective technology for processing text data, with wide applications in text analysis, natural language processing, personalized recommendation, computer vision, etc. Among the known topic models, supervised Latent Dirichlet Allocation (sLDA) is acknowledged as a popular and competitive supervised topic model. However, the growing scale of datasets makes sLDA increasingly inefficient and time-consuming, and limits its range of applications. To address this, a parallel online sLDA, named PO-sLDA (Parallel and Online sLDA), is proposed in this study. It uses stochastic variational inference as the learning method to make training faster and more efficient, and a parallel computing mechanism implemented via the MapReduce framework is proposed to support cloud computing and big-data processing. The online training capability of PO-sLDA broadens the scope of the approach, making it useful for real-life applications with strict real-time demands. Validation on two datasets of different sizes shows that the proposed approach has accuracy comparable to sLDA and efficiently accelerates training. Moreover, its good convergence and online training capability make it well suited to large-scale text data analysis and processing.
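As a minimal sketch of the stochastic-variational-inference idea behind PO-sLDA (shown here for plain, unsupervised LDA, omitting sLDA's response variable and the MapReduce distribution; all names and hyperparameters are illustrative assumptions, not the paper's implementation):

```python
# Minimal stochastic variational inference (SVI) for plain LDA.
# Simplified: no sLDA response variable, no MapReduce parallelism.
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
K, V, D = 5, 100, 1000          # topics, vocabulary size, corpus size (assumed)
alpha, eta = 0.1, 0.01          # Dirichlet priors
tau0, kappa = 64.0, 0.7         # step-size schedule rho_t = (tau0 + t)^(-kappa)
lam = rng.gamma(100.0, 0.01, size=(K, V))   # global topic-word variational parameters

def local_step(word_ids, counts, Elog_beta, n_iter=50):
    """Fit per-document variational parameters gamma (topics) and phi (assignments)."""
    gamma = np.ones(K)
    for _ in range(n_iter):
        Elog_theta = digamma(gamma) - digamma(gamma.sum())
        log_phi = Elog_theta[:, None] + Elog_beta[:, word_ids]   # K x n_d
        phi = np.exp(log_phi - log_phi.max(axis=0))
        phi /= phi.sum(axis=0)
        gamma = alpha + phi @ counts
    return gamma, phi

def svi_step(t, minibatch):
    """One noisy natural-gradient update of lambda from a document minibatch."""
    global lam
    Elog_beta = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))
    lam_hat = np.zeros_like(lam)
    for word_ids, counts in minibatch:
        _, phi = local_step(word_ids, counts, Elog_beta)
        np.add.at(lam_hat, (slice(None), word_ids), phi * counts)
    rho = (tau0 + t) ** (-kappa)
    lam = (1 - rho) * lam + rho * (eta + (D / len(minibatch)) * lam_hat)

# Toy usage: random documents represented as (word_ids, counts) pairs.
for t in range(10):
    batch = [(rng.integers(0, V, 20), np.ones(20)) for _ in range(8)]
    svi_step(t, batch)
print("updated lambda shape:", lam.shape)
```

Because each document's local step is independent, the per-document loop is exactly the part that a MapReduce map phase could distribute, with the reduce phase summing the `lam_hat` contributions.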
Funding: This work was supported by the National Natural Science Foundation of China (No. 60274055).
Abstract: In steady-state hierarchical optimization with feedback for large-scale industrial processes, a sequence of set-point changes of different magnitudes is carried out on the optimization layer. To improve the dynamic performance of the transient response driven by these set-point changes, a filter-based iterative learning control strategy is proposed. In the proposed updating law, a local-symmetric-integral operator is adopted to suppress measurement noise in the output information; a set of desired trajectories is specified according to the set-point change sequence; and the current control input is obtained iteratively by using the smoothed output error to modify the control input of the previous iteration, with amplification coefficients introduced to account for the different magnitudes of the set-point changes. Convergence of the algorithm is established by incorporating frequency-domain techniques into the time-domain analysis. Numerical simulation demonstrates the effectiveness of the proposed strategy.
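A minimal sketch of a filtered iterative-learning update on a toy first-order plant (the moving-average filter stands in for the paper's local-symmetric-integral operator; the plant, gain, and trajectory are illustrative assumptions):

```python
# P-type iterative learning control with a local symmetric smoothing filter (toy example).
import numpy as np

T, a, b = 100, 0.9, 0.5                 # horizon and first-order plant y[t+1] = a*y[t] + b*u[t]
r = np.ones(T + 1); r[0] = 0.0          # desired trajectory after a unit set-point change
gamma, half_w = 0.8, 2                  # learning gain; filter half-width

def simulate(u):
    y = np.zeros(T + 1)
    for t in range(T):
        y[t + 1] = a * y[t] + b * u[t]
    return y

def smooth(e, h):
    """Local symmetric moving average: a simple stand-in for the
    local-symmetric-integral operator used to reject measurement noise."""
    k = np.ones(2 * h + 1) / (2 * h + 1)
    return np.convolve(e, k, mode="same")

u = np.zeros(T)
for it in range(20):                    # learning over repeated runs of the process
    y = simulate(u) + 0.01 * np.random.default_rng(it).standard_normal(T + 1)
    e = smooth(r - y, half_w)           # filtered tracking error
    u = u + gamma * e[1:]               # u_{k+1}(t) = u_k(t) + gamma * e_k(t+1)

print("final max tracking error:", np.abs(r - simulate(u)).max())
```

Here the contraction factor |1 - gamma*b| < 1 gives convergence of the learning iteration for this toy plant, mirroring the role of the frequency-domain convergence condition in the paper.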
Funding: This work was funded by the Informatization Plan of the Chinese Academy of Sciences (Grant No. CASWX2021SF-0102), the National Key R&D Program of China (Grant Nos. 2022YFA1603903, 2022YFA1403800, and 2021YFA0718700), the National Natural Science Foundation of China (Grant Nos. 11925408, 11921004, and 12188101), and the Chinese Academy of Sciences (Grant No. XDB33000000).
Abstract: The exponential growth of the literature is constraining researchers' access to comprehensive information in related fields. While natural language processing (NLP) may offer an effective solution to literature classification, it remains hindered by the lack of labelled datasets. In this article, we introduce a novel method for generating literature classification models through semi-supervised learning, which can generate a labelled dataset iteratively with limited human input. We apply this method to train NLP models for classifying literature related to several research directions, i.e., batteries, superconductors, topological materials, and artificial intelligence (AI) in materials science. The trained NLP 'battery' model, applied to a larger dataset distinct from the training and testing datasets, achieves an F1 score of 0.738, which indicates the accuracy and reliability of this scheme. Furthermore, our approach demonstrates that even with insufficient data, the not-yet-well-trained models from the first few cycles can identify relationships among different research fields and facilitate the discovery and understanding of interdisciplinary directions.
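A minimal self-training sketch of the iterative labelling idea (the classifier, confidence threshold, and toy texts are illustrative assumptions; the paper's pipeline and its human-in-the-loop step are not reproduced):

```python
# Iterative semi-supervised (self-training) text classification sketch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labelled = [("lithium ion battery cathode capacity", 1),
            ("topological insulator surface states", 0)]
unlabelled = ["solid electrolyte for battery anodes",
              "superconducting gap in cuprates",
              "battery electrode degradation study"]

vec = TfidfVectorizer()
for cycle in range(3):                              # each cycle grows the labelled set
    texts = [t for t, _ in labelled]
    X, y = vec.fit_transform(texts), [l for _, l in labelled]
    clf = LogisticRegression().fit(X, y)
    keep = []
    for doc in unlabelled:
        proba = clf.predict_proba(vec.transform([doc]))[0]
        if proba.max() >= 0.6:                      # confidence threshold (assumed);
            labelled.append((doc, int(proba.argmax())))  # a human check could go here
        else:
            keep.append(doc)
    unlabelled = keep

print(f"{len(labelled)} labelled examples after self-training")
```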
Abstract: As Natural Language Processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been a notable growth in research activity. This rapid uptake reflects increasing interest in the field and prompts critical inquiries into ChatGPT's applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part-of-Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition, and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT's adaptability, limitations, and optimal application. We employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct our search process and identify relevant studies. Our review reveals ChatGPT's significant potential in enhancing various NLP tasks. Its adaptability in information extraction, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details. Additionally, ChatGPT's flexibility in annotation tasks reduces manual effort and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as complementary mechanisms, empowering users to guide the model and enhance overall accuracy. Despite its promising potential, challenges persist: the performance of ChatGPT needs to be tested using more extensive datasets and diverse data structures, and its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigation.
Funding: This work was supported by the National Natural Science Foundation of China (No. 81874429), the Digital and Applied Research Platform for Diagnosis of Traditional Chinese Medicine (No. 49021003005), the 2018 Hunan Provincial Postgraduate Research Innovation Project (No. CX2018B465), and the Excellent Youth Project of Hunan Education Department in 2018 (No. 18B241).
Abstract: Objective: Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods: The first step was to build a text mining and analysis environment based on the Python language and to build a corpus from the core chapters of SESD. The second step was to digitalize the corpus; the main steps included word segmentation, information cleaning and merging, construction of a document-term matrix, dictionary compilation, and information conversion. The third step was to mine and display the internal information of the SESD corpus by means of word clouds, keyword extraction, and visualization. Results: NLP played a positive role in computer recognition and comprehension of SESD. Different chapters had different keywords and weights. Deficiency syndrome elements, such as "Qi deficiency", "Yang deficiency", and "Yin deficiency", were an important component of SESD. The important syndrome elements of excess (substantiality) included "Blood stasis", "Qi stagnation", etc. Core syndrome elements were closely related to one another. Conclusions: Syndrome differentiation and treatment is the core of SESD. Using NLP to excavate syndrome differentiation can help reveal the internal relationships within syndrome differentiation and provide a basis for artificial intelligence to learn it.
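A minimal sketch of the segmentation/keyword/word-cloud pipeline (assuming the third-party jieba and wordcloud packages; the corpus text and font path are placeholders, not the paper's actual environment):

```python
# Chinese text mining sketch: segmentation, TF-IDF keyword extraction, word cloud.
import jieba
import jieba.analyse
from wordcloud import WordCloud

corpus_text = "气虚证 阳虚证 阴虚证 血瘀证 气滞证"   # placeholder for the SESD chapters

# 1) Word segmentation.
words = [w for w in jieba.cut(corpus_text) if w.strip()]

# 2) Keyword extraction with TF-IDF weights.
keywords = jieba.analyse.extract_tags(corpus_text, topK=20, withWeight=True)
for term, weight in keywords:
    print(term, round(weight, 3))

# 3) Word cloud (a CJK-capable font path must be supplied for Chinese glyphs).
wc = WordCloud(font_path="simhei.ttf", width=800, height=600,
               background_color="white").generate(" ".join(words))
wc.to_file("sesd_wordcloud.png")
```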
Funding: Supported by the National Magnetic Confinement Fusion Science Program of China (Nos. 2014GB106000, 2014GB106002, and 2014GB106003), the National Natural Science Foundation of China (Nos. 11275234, 11375237, and 11505238), and the Scientific Research Grant of Hefei Science Center of CAS (No. 2015SRG-HSC010).
Abstract: A method of fast data processing has been developed to rapidly obtain the evolution of the electron density profile for the multichannel polarimeter-interferometer system (POLARIS) on J-TEXT. Compared with the Abel inversion method, the evolution of the density profile analyzed by this method can quickly provide important information. The method has the advantage of fast calculation, on the order of ten milliseconds per normal shot, and it is capable of processing data sampled at up to 1 MHz, which is helpful for studying density sawtooth instability and disruption between shots. During the flat-top phase of the plasma current in usual ohmic discharges on J-TEXT, the shape factor u ranges from 4 to 5. When a disruption occurs, the density profile becomes peaked and the shape factor u typically decreases to 1.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62176109; in part by the Tibetan Information Processing and Machine Translation Key Laboratory of Qinghai Province under Grant 2021-Z-003; in part by the Natural Science Foundation of Gansu Province under Grants 21JR7RA531 and 22JR5RA487; in part by the Fundamental Research Funds for the Central Universities under Grant lzujbky-2022-23; in part by the CAAI-Huawei MindSpore Open Fund under Grant CAAIXSJLJJ-2022-020A; in part by the Supercomputing Center of Lanzhou University; and in part by the Sichuan Science and Technology Program under Grant No. 2022nsfsc0916.
Abstract: A variety of neural networks have been presented to deal with issues in deep learning over the last decades. Despite the prominent success achieved by neural networks, there is still a lack of theoretical guidance for designing an efficient neural network model, and verifying the performance of a model requires excessive resources. Previous studies have demonstrated that many existing models can be regarded as different numerical discretizations of differential equations. This connection sheds light on designing an effective recurrent neural network (RNN) by resorting to numerical analysis. The simple RNN can be regarded as a discretization by the forward Euler scheme. Considering the limited solution accuracy of forward Euler methods, a Taylor-type discrete scheme with lower truncation error is presented, and a Taylor-type RNN (T-RNN) is designed under its guidance. Extensive experiments are conducted to evaluate its performance on statistical language modeling and emotion analysis tasks. The noticeable gains obtained by T-RNN demonstrate its superiority and the feasibility of designing neural network models using numerical methods.
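A rough illustration of the ODE view (a sketch under strong assumptions: the Euler cell below mirrors the standard interpretation of a simple RNN step, while the second cell uses a Heun-style two-stage second-order update as a stand-in for a higher-order Taylor scheme; it is not the paper's exact T-RNN construction):

```python
# Recurrent cells as numerical integrators of dh/dt = f(h, x) (illustrative only).
import numpy as np

def f(h, x, W, U):
    """Assumed continuous-time dynamics underlying the recurrence."""
    return np.tanh(W @ h + U @ x) - h

def euler_cell(h, x, W, U, dt=1.0):
    # Forward Euler: h_{t+1} = h_t + dt * f(h_t, x_t).
    # With dt = 1 this reduces to the simple RNN update h_{t+1} = tanh(W h + U x).
    return h + dt * f(h, x, W, U)

def heun_cell(h, x, W, U, dt=1.0):
    # Heun's method: a second-order two-stage update with lower truncation
    # error, standing in here for the paper's Taylor-type scheme.
    k1 = f(h, x, W, U)
    k2 = f(h + dt * k1, x, W, U)
    return h + 0.5 * dt * (k1 + k2)

rng = np.random.default_rng(0)
d_h, d_x = 8, 4
W, U = 0.1 * rng.standard_normal((d_h, d_h)), 0.1 * rng.standard_normal((d_h, d_x))
h_e = h_h = np.zeros(d_h)
for x in rng.standard_normal((10, d_x)):         # run both cells over a toy sequence
    h_e, h_h = euler_cell(h_e, x, W, U), heun_cell(h_h, x, W, U)
print("Euler vs Heun hidden-state gap:", np.linalg.norm(h_e - h_h))
```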
Abstract: One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) over the last two decades has been the development of techniques for text representation that solve the so-called curse of dimensionality, a problem which plagues NLP in general, given that the feature set for learning starts as a function of the size of the language in question, typically upwards of hundreds of thousands of terms. As such, much of the research and development in NLP over the last two decades has gone into finding and optimizing solutions to this problem, that is, into effective feature selection for NLP. This paper looks at the development of these various techniques, which leverage a variety of statistical methods resting on linguistic theories advanced in the middle of the last century, namely the distributional hypothesis, which suggests that words found in similar contexts generally have similar meanings. In this survey paper we examine the development of some of the most popular of these techniques from a mathematical as well as a data-structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants, typically referred to as word embeddings. In this review of algorithms such as Word2Vec, GloVe, ELMo, and BERT, we explore the idea of semantic spaces more generally, beyond their applicability to NLP.
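A minimal Latent Semantic Analysis sketch of the distributional idea (the toy corpus and dimension choices are illustrative assumptions):

```python
# Latent Semantic Analysis: term-document counts -> truncated SVD -> word vectors.
import numpy as np

docs = ["cat sat on the mat", "dog sat on the log",
        "cat chased the dog", "stocks fell on the news"]
vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix (rows = terms, columns = documents).
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[idx[w], j] += 1

# Truncated SVD keeps the top-k latent "semantic" dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]            # embed each term in the k-dim latent space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Per the distributional hypothesis, words sharing contexts ("cat"/"dog")
# tend to land closer together than unrelated ones ("cat"/"stocks").
print(cosine(word_vecs[idx["cat"]], word_vecs[idx["dog"]]),
      cosine(word_vecs[idx["cat"]], word_vecs[idx["stocks"]]))
```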
Abstract: The key-value store provides flexibility in data types because it does not need the data types to be specified in advance and can store any type of data as the value of a key-value pair. Various studies have been conducted to improve the performance of the key-value store while maintaining its flexibility. However, research efforts on storing large-scale values such as multimedia files (e.g., images or videos) in the key-value store have been limited. In this study, we propose a new key-value store, WR-Store++, aiming to store large-scale values stably. Specifically, it provides a new design that separates data and index by working with the built-in data structure of the Windows operating system and the file system. The use of the Windows built-in data structure preserves the efficiency of the key-value store, and the file system significantly extends the limited space of the storage. We also present chunk-based memory management and parallel processing in WR-Store++ to further improve its performance in the GET operation. Through experiments, we show that WR-Store++ can store datasets at least 32.74 times larger than the existing baseline key-value store, WR-Store, which is limited in storing large-scale datasets. Furthermore, in terms of processing efficiency, we show that WR-Store++ outperforms not only WR-Store but also other state-of-the-art key-value stores, LevelDB, RocksDB, and BerkeleyDB, for individual key-value operations and mixed workloads.
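A minimal sketch of the data/index-separation idea (an in-memory dict index pointing into an append-only value file; names are illustrative, and this deliberately ignores WR-Store++'s Windows-specific structures, chunked memory management, and parallelism):

```python
# Key-value store sketch: small index kept separate from a large append-only data file.
import os

class TinyKV:
    def __init__(self, path="values.dat"):
        self.path = path
        self.index = {}                      # key -> (offset, length) in the data file
        open(self.path, "wb").close()        # start with an empty value file

    def put(self, key: str, value: bytes):
        with open(self.path, "ab") as f:
            offset = f.tell()
            f.write(value)                   # large values live on disk, not in the index
        self.index[key] = (offset, len(value))

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)

kv = TinyKV()
kv.put("img:1", b"\x89PNG...fake image bytes")
print(kv.get("img:1"))
os.remove(kv.path)                           # clean up the demo file
```

Keeping only (offset, length) pairs in the index is what lets the index structure stay small while the file system absorbs arbitrarily large values, the same division of labor the paper exploits.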
Abstract: Recent advancements in the World Wide Web and social networking have eased the spread of fake news among people at a faster rate. Most of the time, the intention of fake news is to misinform people and manipulate societal perceptions. The spread of low-quality news on social networking sites has a negative influence on people as well as on society. To counter the ever-increasing dissemination of fake news, automated detection models are developed using Artificial Intelligence (AI) and Machine Learning (ML) methods. The latest advancements in Deep Learning (DL) models and complex Natural Language Processing (NLP) tasks make DL a significant approach to Fake News Detection (FND). Against this background, the current study focuses on the design and development of a Natural Language Processing with Sea Turtle Foraging Optimization-based Deep Learning Technique for Fake News Detection and Classification (STODL-FNDC) model. The aim of the proposed STODL-FNDC model is to discriminate fake news from legitimate news effectively. In the proposed STODL-FNDC model, the input data first undergoes pre-processing and GloVe-based word embedding. The model then employs a Deep Belief Network (DBN) approach for the detection and classification of fake news. Finally, the STO algorithm is utilized to optimally tune the hyperparameters of the DBN model. The novelty of the study lies in the combination of the STO algorithm with a DBN model for FND. To assess the detection performance of the STODL-FNDC technique, a series of simulations was carried out on benchmark datasets. The experimental outcomes established the better performance of the STODL-FNDC approach over other methods, with a maximum accuracy of 95.50%.
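A minimal sketch of an embedding-features-plus-deep-model pipeline in the same spirit (TF-IDF and a single sklearn BernoulliRBM layer stand in for GloVe embeddings and the DBN, and grid search stands in for STO hyperparameter tuning; all of these substitutions are assumptions, not the paper's method):

```python
# Fake-news-style text classification: features -> RBM layer -> classifier,
# with hyperparameters tuned by search (a stand-in for the STO metaheuristic).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

texts = ["scientists confirm vaccine safety in large trial",
         "shocking secret cure they don't want you to know",
         "government releases official budget figures",
         "miracle weight loss trick doctors hate"]
labels = [0, 1, 0, 1]                      # 0 = legitimate, 1 = fake (toy data)

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),          # TF-IDF stands in for GloVe embeddings
    ("rbm", BernoulliRBM(random_state=0)), # one RBM layer stands in for the DBN
    ("clf", LogisticRegression()),
])
grid = {"rbm__n_components": [16, 32], "rbm__learning_rate": [0.01, 0.1]}
search = GridSearchCV(pipe, grid, cv=2).fit(texts, labels)
print(search.best_params_, search.predict(["you won't believe this one trick"]))
```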
Funding: Supported by the National Natural Science Foundation of China.
Abstract: Large-scale magnetic structures are the main carriers of major eruptions in the solar atmosphere. These structures are rooted in the photosphere and are driven by the unceasing motion of the photospheric material through a series of equilibrium configurations. The motion brings energy into the coronal magnetic field until the system ceases to be in equilibrium. The catastrophe theory for solar eruptions indicates that loss of mechanical equilibrium constitutes the main trigger mechanism of major eruptions, usually showing up as solar flares, eruptive prominences, and coronal mass ejections (CMEs). Magnetic reconnection, which takes place at the very beginning of the eruption as a result of plasma instabilities/turbulence inside the current sheet, converts magnetic energy into heating and kinetic energy that are responsible for solar flares, and for accelerating both plasma ejecta (flows and CMEs) and energetic particles. The various manifestations are thus related to one another, and the physics behind these relationships is catastrophe and magnetic reconnection. This work reports on recent progress in both theoretical research and observations of eruptive phenomena showing the above manifestations. We start by describing the properties of large-scale structures in the corona and the related magnetic fields prior to an eruption, and show various morphological features of the disrupting magnetic fields. Then, in the framework of catastrophe theory, we look into the physics behind those features, investigated in a succession of previous works, and discuss the approaches they used.
Funding: Supported by the Science and Technology Department of Sichuan Province (No. 2021YFG0156).
Abstract: Generating diverse and factual text is challenging and is receiving increasing attention. By sampling from the latent space, variational autoencoder-based models have recently enhanced the diversity of generated text. However, existing research predominantly depends on summarization models to offer paragraph-level semantic information for enhancing factual correctness. The challenge lies in effectively generating factual text using sentence-level variational autoencoder-based models. In this paper, a novel model called the fact-aware conditional variational autoencoder is proposed to balance the factual correctness and diversity of generated text. Specifically, our model encodes the input sentences and uses them as facts to build a conditional variational autoencoder network. By training this network, the model is enabled to generate text based on input facts. Building upon this foundation, the input text is passed to a discriminator along with the generated text. By employing adversarial training, the model is encouraged to generate text that is indistinguishable to the discriminator, thereby enhancing the quality of the generated text. To further improve factual correctness, inspired by natural language inference systems, an entailment recognition task is introduced and trained together with the discriminator via multi-task learning. Moreover, based on the entailment recognition results, a penalty term is further proposed to reconstruct the loss of our model, forcing the generator to generate text consistent with the facts. Experimental results demonstrate that, compared with competitive models, our model achieves substantial improvements in both the quality and the factual correctness of the text, despite sacrificing only a small amount of diversity. Furthermore, under a comprehensive evaluation of diversity and quality metrics, our model also demonstrates the best performance.
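For reference, the standard conditional-VAE objective that such a model builds on, with the adversarial and entailment-penalty terms sketched using assumed weights (the paper's exact loss and weighting may differ):

```latex
\mathcal{L}_{\mathrm{CVAE}}(x, c)
  = \mathbb{E}_{q_\phi(z \mid x, c)}\!\left[\log p_\theta(x \mid z, c)\right]
  - \mathrm{KL}\!\left(q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c)\right),
\qquad
\mathcal{L}_{\mathrm{total}}
  = -\mathcal{L}_{\mathrm{CVAE}}
  + \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}}
  + \lambda_{\mathrm{ent}}\,\mathcal{L}_{\mathrm{ent}}
```

Here $c$ is the encoded input fact, $\mathcal{L}_{\mathrm{adv}}$ is the discriminator's adversarial loss, and $\mathcal{L}_{\mathrm{ent}}$ is the entailment-based penalty pushing generated text to be entailed by the facts.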
Abstract: To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various sexual reproductive health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization process for these text messages is inefficient and time-consuming, and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time, which are then used to build and train a categorization model. Secondly, the study presents a proof-of-concept tool that automates the categorization of U-Report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the developed proof-of-concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorize this dataset, whereas the trained SVM model requires only 6.4 minutes while achieving an accuracy of 70.4%, demonstrating that the automated method is significantly faster, more scalable, and more consistent than the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool developed demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-Report platform and other similar text-message-based platforms.
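A minimal sketch of an SVM text categorizer of the kind described (the thematic labels and toy messages are invented placeholders; the paper's preprocessing and feature choices are not specified here):

```python
# SMS categorization sketch: TF-IDF features + linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

messages = ["how do i prevent hiv",
            "where can i get contraceptives",
            "what are symptoms of stis",
            "is it safe to use condoms"]
themes = ["HIV", "contraception", "STI", "contraception"]   # placeholder thematic areas

model = Pipeline([("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
                  ("svm", LinearSVC())])
model.fit(messages, themes)
print(model.predict(["how is hiv transmitted"]))
```

The reported throughput gap is plausible for such a pipeline: once trained, a linear SVM classifies each message with a single sparse dot product per class, so 206,625 messages in 6.4 minutes is roughly 540 messages per second.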
Abstract: Scene text detection has advanced rapidly in recent years. We propose Mask Text Detector, a novel algorithm applicable to text of arbitrary shape. Building on Mask R-CNN, the algorithm replaces the original RPN layer with an anchor-free method for generating proposal boxes, reducing hyperparameters, model parameters, and computation. It also introduces LQCS (Localization Quality and Classification Score) joint regression, which ties localization quality and classification scores together and eliminates their inconsistency at the prediction stage. To enable the network to distinguish complex samples, a Socle-Mask branch, which incorporates traditional edge detection algorithms, is proposed to generate segmentation masks. This module extracts texture features separately in the horizontal and vertical directions and adds a channel self-attention mechanism, allowing the network to select channel features autonomously. Extensive experiments on three challenging datasets (Total-Text, CTW1500, and ICDAR2015) verify that the algorithm achieves strong text detection performance.
Abstract: This paper discusses the relationship between a command of the basic information contained in a text and the ultimate goal of comprehension in the text-reading process. Using the main topic and the central meaning that every text has as two principal examples, the author illustrates what a reader should pay attention to when reading a text.
Funding: Supported by the National Key Basic Research and Development Project of China (Grant No. 2012CB417201) and the National Natural Science Foundation of China (Grant Nos. 41375053 and 41505038).
Abstract: Two types of persistent heavy rainfall events (PHREs) over the Yangtze River-Huaihe River Basin were identified in a recent statistical study: type A, whose precipitation is mainly located to the south of the Yangtze River; and type B, whose precipitation is mainly located to the north of the river. The present study investigated these two PHRE types using a newly derived set of energy equations to reveal the scale interactions and the main energy paths contributing to the persistence of the precipitation. The main results were as follows. The available potential energy (APE) and kinetic energy (KE) associated with both PHRE types generally increased upward in the troposphere, with the energy of type-A PHREs stronger than that of type-B PHREs (except in the middle troposphere). There were two main energy paths common to both PHRE types: (1) the baroclinic energy conversion from APE to KE was the dominant energy source for the evolution of the large-scale background circulation; and (2) the downscale energy cascades of KE and APE were vital for sustaining the eddy flow that directly caused the PHREs. The significant differences between the two PHRE types appeared mainly in the lower troposphere, where the baroclinic energy conversion associated with the eddy flow in type-A PHREs was from KE to APE, which weakened the precipitation-related eddy flow, whereas the conversion in type-B PHREs was from APE to KE, which enhanced the eddy flow.
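For reference, the baroclinic APE-to-KE conversion mentioned here takes, in pressure coordinates and per unit mass, the standard Lorenz-energetics form below (this is the textbook term, not the paper's full multi-scale decomposition):

```latex
C(\mathrm{APE}, \mathrm{KE}) = -\,\omega\,\alpha = -\,\frac{R}{p}\,\omega\,T
```

so ascent of relatively warm air ($\omega < 0$ where $T$ is anomalously high) converts APE into KE, while a reversed sign of this term, as in the lower troposphere of type-A events, drains kinetic energy from the precipitation-related eddy flow.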
Funding: Projects 51673214, 51673218, and 61774170 supported by the National Natural Science Foundation of China; Project 2017YFA0206600 supported by the National Key Research and Development Program of China.
Abstract: Perovskite solar cells (PSCs) have emerged as one of the most promising candidates for photovoltaic applications. Low-cost, low-temperature solution processes, including coating and printing techniques, make PSCs promising for commercialization owing to their scalability and compatibility with large-scale, roll-to-roll manufacturing. In this review, we focus on the solution deposition of the charge transport layers and the perovskite absorber layer in both mesoporous and planar PSC devices. Furthermore, the most recent design strategies based on solution deposition are presented, which have been explored to enlarge the active area, enhance crystallization, and passivate defects, leading to improved PSC device performance.