Research on fires at the wildland-urban interface (WUI) has generated significant insights and advancements across various fields of study. The environmental, agricultural, and social sciences have played prominent roles in understanding the impacts of fires on the environment, in protecting communities, and in addressing management challenges. This study aimed to create a database, using a text mining technique, for global researchers interested in WUI projects, and to highlight the interest of countries in this field. An author-keywords analysis emphasized the dominance of fire science-related terms, especially those related to the WUI, and identified keyword clusters corresponding to the WUI fire-risk-assessment components of "exposure", "danger", and "vulnerability" within wildfire research. Trends over the past decade showcase shifting research interests, with a growing focus on WUI fires, while regional variations highlighted that the "exposure" keyword cluster received greater attention in southern Europe and South America. However, vulnerability keywords have a relatively lower representation across all regions. The analysis underscores the interdisciplinary nature of WUI research and emphasizes the need for targeted approaches to address the unique challenges of the wildland-urban interface. Overall, this study provides valuable insights for researchers and serves as a foundation for further collaboration in this field through an understanding of the trends over recent years and across different regions.
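As an illustration of how author-keyword clusters of this kind can be derived, the sketch below builds a keyword co-occurrence network and detects communities in it. It is a minimal example, not the study's actual workflow; the keyword lists, the networkx-based graph, and the greedy modularity clustering are assumptions made only for demonstration.

```python
# Minimal sketch of author-keyword co-occurrence clustering; the records below
# are invented placeholders, not the study's bibliographic data.
from itertools import combinations
from collections import Counter
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical author-keyword lists, one per article record.
records = [
    ["wildland-urban interface", "exposure", "fire risk"],
    ["wildfire", "vulnerability", "communities"],
    ["exposure", "fire risk", "mapping"],
    ["wildfire", "danger", "fire weather"],
]

# Count how often two keywords appear together in the same record.
pair_counts = Counter()
for kws in records:
    for a, b in combinations(sorted(set(kws)), 2):
        pair_counts[(a, b)] += 1

# Build a weighted co-occurrence network and detect keyword clusters.
G = nx.Graph()
for (a, b), w in pair_counts.items():
    G.add_edge(a, b, weight=w)

clusters = greedy_modularity_communities(G, weight="weight")
for i, cluster in enumerate(clusters, 1):
    print(f"Cluster {i}: {sorted(cluster)}")
```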
Aiming to identify the policy topics and their evolutionary logic that enhance the digital and green development (dual development) of traditional manufacturing enterprises, address weaknesses in current policies, and provide resources for refining dual-development policies, a total of 15,954 dual development-related policies issued by national and departmental authorities in China from January 2000 to August 2023 were analyzed. Based on topic modeling techniques and the policy modeling consistency (PMC) framework, the evolution of policy topics was visualized and a dynamic assessment of the policies was conducted. The results show that the digital and green development policy framework is being progressively refined, and that the governance philosophy is shifting from a "regulatory government" paradigm to a "service-oriented government". The support pattern evolves from "dispersed matching" to "integrated symbiosis". However, there are still significant deficiencies in departmental cooperation, balanced measures, coordinated links, and multi-stakeholder participation. Future policy improvements should therefore focus on guiding multi-stakeholder participation, enhancing public demand orientation, and addressing the entire value chain. These steps aim to create an open and shared digital industry ecosystem that promotes the coordinated dual development of traditional manufacturing enterprises.
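A topic-modeling step of the kind described above can be sketched with scikit-learn's LDA implementation. The toy policy snippets and parameter choices below are illustrative only; they are not drawn from the 15,954-document corpus, and the PMC scoring itself is not shown.

```python
# Minimal LDA topic-modeling sketch over a few invented policy snippets.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

policies = [
    "promote digital transformation of manufacturing enterprises",
    "support green low-carbon upgrading of traditional industry",
    "build industrial internet platforms and data sharing services",
    "strengthen energy efficiency standards for factories",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(policies)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

# Print the top terms of each discovered topic.
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {k}: {top}")
```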
To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various sexual reproductive health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization of these text messages is inefficient and time-consuming, and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time, which are then used to build and train a categorization model. Secondly, the study presents a proof-of-concept tool that automates the categorization of U-Report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the developed proof-of-concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorize this dataset, whereas the trained SVM model requires only 6.4 minutes while achieving an accuracy of 70.4%, demonstrating that the automated method is significantly faster, more scalable, and more consistent than the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool developed demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-Report platform and other similar text-message-based platforms.
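The TF-IDF plus linear SVM pipeline below is a minimal sketch of the kind of categorization model described above; the sample messages, category labels, and bigram setting are invented for illustration and are not taken from the U-Report dataset.

```python
# Minimal sketch of TF-IDF + linear SVM message categorization with an
# invented, tiny set of labeled messages.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

messages = [
    "how can i protect myself from hiv",
    "where can i get an hiv test near me",
    "which contraceptive method is safest",
    "what are common symptoms of an sti",
    "can condoms prevent hiv",
    "is the hiv test free at the clinic",
]
labels = ["prevention", "testing", "contraception", "sti", "prevention", "testing"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("svm", LinearSVC()),
])
model.fit(messages, labels)

# Categorize a new incoming message.
print(model.predict(["where do i go for a free hiv test"]))
```

On the real dataset, a held-out test split would be used to estimate accuracy, as with the 70.4% figure reported above.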
Objective To discuss how social media data can be used for post-marketing drug safety monitoring in China as soon as possible by systematically reviewing text mining applications, and to provide new ideas and methods for pharmacovigilance. Methods Relevant domestic and foreign literature was used to explore text classification based on machine learning, text mining based on deep learning (neural networks), and adverse drug reaction (ADR) terminology. Results and Conclusion Text classification methods based on traditional machine learning mainly include the support vector machine (SVM) algorithm, the naive Bayes (NB) classifier, decision trees, the hidden Markov model (HMM), and bidirectional encoder representations from transformers (BERT). The main deep learning neural networks for text mining are the convolutional neural network (CNN), the recurrent neural network (RNN), and long short-term memory (LSTM). ADR terminology standardization tools mainly include the Medical Dictionary for Regulatory Activities (MedDRA), WHODrug, and the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT).
This study aimed to explore citizens' emotional responses and issues of interest in the context of the coronavirus disease 2019 (COVID-19) pandemic. The dataset comprised 65,313 tweets with the location marked as New York State. The data collection period covered four days of tweets around the time New York City imposed a lockdown order due to an increase in confirmed cases. Data analysis was performed using RStudio. The emotional responses in tweets were analyzed using the Bing and NRC (National Research Council Canada) dictionaries. The tweets' central issues were identified by text network analysis. When tweets were classified as either positive or negative, the negative sentiment was higher. Using the NRC dictionary, eight emotional classifications were devised: "trust," "fear," "anticipation," "sadness," "anger," "joy," "surprise," and "disgust." These results indicated that citizens showed negative and trusting emotional reactions in the early days of the pandemic. Moreover, citizens showed a strong interest in overcoming and coping together with other people, such as through social solidarity. Citizens were concerned about the confirmation of COVID-19 infection status and death. Efforts should be made to ensure citizens' psychological stability by promptly informing them of the status of infectious disease management and the routes of infection.
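A dictionary-based step like the Bing/NRC analysis can be sketched as simple lexicon lookups. The tiny lexicon and the two example tweets below are invented placeholders; the study itself used the full Bing and NRC dictionaries in R.

```python
# Minimal lexicon-based sentiment sketch: count positive vs. negative tokens.
from collections import Counter
import re

# Illustrative stand-in for a sentiment dictionary such as Bing.
lexicon = {
    "trust": "positive", "hope": "positive", "help": "positive",
    "fear": "negative", "death": "negative", "lockdown": "negative",
}

tweets = [
    "I trust the city will help us through this lockdown",
    "So much fear about infection and death right now",
]

counts = Counter()
for tweet in tweets:
    for token in re.findall(r"[a-z']+", tweet.lower()):
        if token in lexicon:
            counts[lexicon[token]] += 1

print(counts)  # Counter({'negative': 3, 'positive': 2})
```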
Individuals, local communities, environmental associations, private organizations, and public representatives and bodies may all be aggrieved by environmental problems concerning poor air quality, illegal waste disposal, water contamination, and general pollution. Environmental complaints represent expressions of dissatisfaction with these issues. Because managing a large number of complaints is time-consuming, text mining may be useful for automatically extracting information on stakeholder priorities and concerns. This paper used text mining and semantic network analysis to extract relevant keywords about environmental complaints from two online complaint submission systems in the Emilia-Romagna Region, Italy: the online claim submission system of the Regional Agency for Prevention, Environment and Energy (Arpae) ("Contact Arpae"), and Arpae's internal platform for environmental pollution ("Environmental incident reporting portal"). We evaluated a total of 2477 records and classified this information based on the claim topic (air pollution, water pollution, noise pollution, waste, odor, soil, weather-climate, sea-coast, and electromagnetic radiation) and geographical distribution. This paper then used natural language processing to extract keywords from the dataset and classified the keywords ranking highest in term frequency-inverse document frequency (TF-IDF) according to the driver, pressure, state, impact, and response (DPSIR) framework. This study provides a systemic approach to understanding the interaction between people and the environment in different geographical contexts and to building sustainable and healthy communities. The results showed that most complaints came from the public and were associated with air pollution and odor. Factories (particularly foundries and ceramic industries) and farms were identified as the drivers of environmental issues. Citizens believed that environmental issues mainly affect human well-being. Moreover, the keywords "odor", "report", "request", "presence", "municipality", and "hours" were the most influential and meaningful concepts, as demonstrated by their high degree and betweenness centrality values. Keywords connecting odor (classified as an impact) and air pollution (classified as a state) were the most important (such as "odor-burnt plastic" and "odor-acrid"). Complainants perceived odor annoyance as a primary environmental concern, possibly related to two main drivers: "odor-factory" and "odor-farms". The proposed approach has several theoretical and practical implications: text mining may quickly and efficiently address citizen needs, providing the basis for automating (even partially) the complaint process, and the DPSIR framework may support the planning and organization of information and the identification of stakeholder concerns and priorities, as well as metrics and indicators for their assessment. Therefore, integrating the DPSIR framework with text mining of environmental complaints could generate a comprehensive environmental knowledge base as a prerequisite for the wider exploitation of such analysis to support decision-making processes and environmental management activities.
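The TF-IDF keyword-ranking step can be sketched as follows; the complaint texts are invented, and the final mapping of keywords onto DPSIR categories is shown only as a hand-assigned dictionary rather than the paper's classification.

```python
# Minimal sketch of TF-IDF keyword ranking over a few invented complaint texts.
from sklearn.feature_extraction.text import TfidfVectorizer

complaints = [
    "strong burnt plastic odor near the foundry every evening",
    "acrid odor and smoke from the ceramic factory",
    "noise and dust from trucks at the waste disposal site",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(complaints)
terms = vec.get_feature_names_out()

# Rank terms by their maximum TF-IDF weight across complaints.
scores = tfidf.max(axis=0).toarray().ravel()
ranked = sorted(zip(terms, scores), key=lambda x: -x[1])[:5]
for term, score in ranked:
    print(f"{term:15s} {score:.3f}")

# Top-ranked keywords would then be mapped onto DPSIR categories by hand,
# e.g. {"foundry": "driver", "odor": "impact", "smoke": "state"}.
```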
Text mining has emerged as an effective method of handling and extracting useful information from the exponentially growing biomedical literature and biomedical databases. We developed a novel biomedical text mining model implemented with a multi-agent system and a distributed computing mechanism. Our distributed system, TextMed, comprises several software agents, where each agent uses a reinforcement learning method to update the sentiment of relevant text from a particular set of research articles related to specific keywords. TextMed can also operate on different physical machines to expedite its knowledge extraction by utilizing a clustering technique. We collected biomedical textual data from PubMed and then assigned it to the multi-agent biomedical text mining system, where the agents communicate with each other directly and collaboratively to determine the relevant information inside the textual data. Our experimental results indicate that TextMed parallelizes and distributes the learning process across individual agents, appropriately learns the sentiment scores of specific keywords, and efficiently finds connections in biomedical information through a text mining paradigm.
Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis environment based on the Python language and to build a corpus based on the core chapters of SESD. The second step was to digitalize the corpus. The main steps included word segmentation, information cleaning and merging, construction of a document-term matrix, dictionary compilation, and information conversion. The third step was to mine and display the internal information of the SESD corpus by means of word clouds, keyword extraction, and visualization. Results NLP played a positive role in computer recognition and comprehension of SESD. Different chapters had different keywords and weights. Deficiency syndrome elements were an important component of SESD, such as "Qi deficiency", "Yang deficiency", and "Yin deficiency". The important excess (substantiality) syndrome elements included "Blood stasis", "Qi stagnation", etc. Core syndrome elements were closely related to each other. Conclusions Syndrome differentiation and treatment are the core of SESD. Using NLP to excavate syndrome differentiation can help reveal the internal relationships within syndrome differentiation and provide a basis for artificial intelligence to learn syndrome differentiation.
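A minimal sketch of the segmentation and keyword-frequency steps might look like the following, assuming the jieba segmenter as a stand-in for whatever tokenizer the study used; the two sample sentences are invented.

```python
# Minimal sketch: Chinese word segmentation followed by keyword counting.
from collections import Counter
import jieba  # assumed segmenter; the study's exact tooling is not specified

corpus = [
    "气虚证多见乏力,气短,自汗",
    "阳虚证常见畏寒,肢冷,乏力",
]

# Word segmentation followed by a simple term count over the corpus.
term_counts = Counter()
for doc in corpus:
    term_counts.update(tok for tok in jieba.lcut(doc) if len(tok) > 1)

# The most frequent multi-character terms approximate the corpus keywords;
# in the study, such weights feed the word clouds and visualizations.
print(term_counts.most_common(5))
```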
COVID-19 is spreading exponentially due to the rapid transmission of the virus between humans. Different countries have tried different solutions to control the spread of the disease, including lockdowns of countries or cities, quarantines, isolation, sanitization, and masks. Patients with symptoms of COVID-19 are tested using medical testing kits; these tests must be conducted by healthcare professionals. However, the testing process is expensive and time-consuming. There is no surveillance system that can be used as a framework to identify regions of infected individuals and determine the rate of spread so that precautions can be taken. This paper introduces a novel technique based on deep learning (DL) that can be used as a surveillance system to identify infected individuals by analyzing tweets related to COVID-19. The system is used only for surveillance purposes, to identify regions where the spread of COVID-19 is high; clinical tests should then be used to test and identify infected individuals. The system proposed here uses recurrent neural networks (RNN) and word-embedding techniques to analyze tweets and determine whether a tweet provides information about COVID-19 or refers to individuals who have been infected with the virus. The results demonstrate that the RNN can conduct this analysis more accurately than other machine learning (ML) algorithms.
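A small recurrent classifier of the kind described can be sketched with Keras; the tweets, labels, and hyperparameters below are illustrative stand-ins rather than the paper's architecture or data.

```python
# Minimal sketch of an RNN tweet classifier with learned word embeddings,
# assuming TensorFlow/Keras; all data and hyperparameters are illustrative.
import numpy as np
import tensorflow as tf

tweets = np.array([
    "my neighbor tested positive for covid today",
    "enjoying a sunny walk in the park",
    "three more people infected in our building",
    "watching a movie with the family tonight",
])
labels = np.array([1, 0, 1, 0])  # 1 = reports infection, 0 = unrelated

# Turn raw text into fixed-length integer sequences.
vectorize = tf.keras.layers.TextVectorization(max_tokens=1000,
                                              output_sequence_length=20)
vectorize.adapt(tweets)
X = vectorize(tweets)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=32),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=3, verbose=0)

# Score a new tweet: values near 1 suggest an infection-related report.
print(model.predict(vectorize(np.array(["two covid cases confirmed at my office"]))))
```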
Digitalization has changed the way information is processed, and new techniques of legal data processing are evolving. Text mining helps to analyze and search court cases available in the form of digital text documents in order to extract case reasoning and related data. This sort of case processing helps professionals and researchers refer to previous cases with greater accuracy in less time. The rapid development of judicial ontologies promises interesting solutions for legal knowledge formalization. Mining contextual information from corpora through ontologies is a challenging and interesting field. This research paper presents a three-tier contextual text mining framework based on ontologies for judicial corpora. The framework comprises the judicial corpus, text mining processing resources, and ontologies for mining contextual text from the corpus, making text and data mining more reliable and fast. A top-down ontology construction approach has been adopted in this paper. The judicial corpus was selected with a sufficient dataset to process and evaluate the results. The experimental results and evaluations show significant improvements in comparison with the available techniques.
With user-generated content, anyone can be a content creator. This phenomenon has vastly increased the amount of information circulated online, and it is becoming harder to obtain required information efficiently. In this paper, we describe how natural language processing and text mining can be parallelized using Hadoop and the Message Passing Interface. We propose a parallel web text mining platform that processes massive amounts of data quickly and efficiently. Our web knowledge service platform is designed to collect information about the IT and telecommunications industries from the web and to process this information using natural language processing and data-mining techniques.
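One way to parallelize such a pipeline with the Message Passing Interface is sketched below using mpi4py: the root process scatters document chunks, each worker tokenizes and counts its share, and the partial counts are gathered back. The corpus and the simple whitespace tokenizer are placeholders for the platform's actual NLP steps.

```python
# Minimal mpi4py sketch of parallel text processing; run this script with
# something like `mpiexec -n 4 python <script>.py`.
from collections import Counter
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    docs = ["telecom market grows", "new IT services launched",
            "cloud platforms expand", "operators report revenue"]
    # Split the corpus into one chunk per process.
    chunks = [docs[i::size] for i in range(size)]
else:
    chunks = None

local_docs = comm.scatter(chunks, root=0)

# Each process tokenizes and counts its own share of the corpus.
local_counts = Counter(tok for doc in local_docs for tok in doc.lower().split())

all_counts = comm.gather(local_counts, root=0)
if rank == 0:
    total = sum(all_counts, Counter())
    print(total.most_common(5))
```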
The study used a crawler to collect 842,917 popular tweets written in English containing the keyword "Chinese" or "China". Topic modeling and sentiment analysis were used to explore the tweets, and thirty topics were extracted. Overall, 33% of the tweets relate to politics, 20% to the economy, 21% to culture, and 26% to society. Regarding polarity, 55% of the tweets are positive, 31% are negative, and the remaining 14% are neutral. Only 25.3% of the tweets carry an obvious emotion, and most of these express joy.
Aim: To clarify, from the discussion content of community-building actions for elderly men in Town A in Fukushima Prefecture, how the participants' consciousness of rebuilding the community transformed and the factors behind this transformation. Design: This study was action research. Method: The author transcribed the discussion content of the actions conducted in 2018-2019 and analyzed it for each year using the text mining method. Results: The words with the highest appearance frequency were "person" and "Town A" in both years. One large word network was formed in 2018, and its topic concerned what the participants feel in their life in Town A. Two large word networks were formed in 2019, and their topic concerned community participation, including the difficulty of motivating others, such as how people who do not participate can be encouraged to join.
Objective: Polycystic ovarian syndrome (PCOS) is a common endocrine disorder affecting women of reproductive age. This study aimed to use text mining and microarray data analysis to identify drugs that target genes and potential pathways associated with PCOS. Methods: We extracted a common set of genes associated with PCOS using text mining and the microarray dataset GSE48301. Next, we performed Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses of these genes, as well as protein-protein interaction (PPI) network analysis. Additionally, we used MCODE and cytoHubba to cluster significant common genes in the PPI network and performed gene-drug interaction analyses to identify potential drugs for further investigation. Finally, we annotated the pathways associated with the genes identified. Results: Text mining and microarray analysis yielded 696 text mining genes (TMGs) and 2804 differentially expressed genes (DEGs). Among these, a set of 77 genes was found in both the TMGs and DEGs. Interestingly, 67 of these genes participated in constructing the PPI network. Seven common hub genes were selected using the MCODE and cytoHubba methods. Finally, five of the seven genes were targeted by 15 existing drugs. Conclusion: Four genes (FASLG, IL13, IL17A, and IL2RA), which are mainly related to the cytokine-cytokine receptor interaction pathway, could be prioritized as targets for PCOS.
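The gene-set intersection and hub-ranking steps can be sketched as follows; the DEG list, the toy interaction edges, and the plain degree ranking are illustrative assumptions, while the four gene names simply echo the abstract's conclusion.

```python
# Minimal sketch: intersect text-mining genes with DEGs and rank hubs by degree.
import networkx as nx

text_mining_genes = {"FASLG", "IL13", "IL17A", "IL2RA", "TP53", "INS"}
diff_expressed_genes = {"FASLG", "IL13", "IL17A", "IL2RA", "EGFR"}

common = text_mining_genes & diff_expressed_genes
print("Common genes:", sorted(common))

# Toy protein-protein interaction edges among the common genes (invented).
ppi_edges = [("FASLG", "IL13"), ("FASLG", "IL17A"), ("IL13", "IL2RA"),
             ("IL17A", "IL2RA"), ("FASLG", "IL2RA")]
G = nx.Graph(ppi_edges)

# Rank candidate hub genes by degree (a cytoHubba-style degree ranking).
hubs = sorted(G.degree, key=lambda kv: -kv[1])
print("Hub ranking:", hubs)
```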
In the 21st century, the surge in natural and human-induced disasters necessitates robust disaster management frameworks. This research addresses a critical gap by exploring the dynamics of successful implementation and performance monitoring of disaster management. Focusing on eleven key elements, such as Vulnerability and Risk Assessment, Training, Disaster Preparedness, Communication, and Community Resilience, the study uses the Scopus database for secondary data, employing text mining and MS Excel for analysis and data management. IBM SPSS (26) and IBM AMOS (20) facilitate exploratory factor analysis (EFA) and structural equation modeling (SEM) for model evaluation. The research raises questions about crafting a comprehensive, adaptable model, understanding the interplay between vulnerability assessment, training, and disaster preparedness, and integrating effective communication and collaboration. The findings offer actionable insights for policy, practice, and community resilience against disasters. By scrutinizing each factor's role and interactions, the research lays the groundwork for a flexible model. Ultimately, the study aspires to cultivate more resilient communities that can navigate and thrive amid the escalating threats of an unpredictable world.
Objectives: Polycystic ovary syndrome (PCOS) is a common endocrine disease in women of childbearing age. Although it is a leading cause of menstrual disorders, infertility, obesity, and other conditions, its molecular mechanism remains unclear. This study aimed to analyze the target genes, pathways, and potential drugs for PCOS through text mining. Methods: First, three different keywords ("polycystic ovary syndrome", "obesity/adiposis", and "anovulation") were uploaded to GenCLiP3 to obtain three different gene sets, and the genes common to these sets were selected. Second, we performed gene ontology and signaling pathway enrichment analyses of these common genes, followed by protein-protein interaction (PPI) network analysis. Third, the most significant gene module clustered in the PPI network was selected to identify potential drugs for PCOS via gene-drug analysis. Results: A total of 4291 genes related to the three keywords were obtained through text mining, 72 common genes were filtered from the three gene sets, and 69 genes participated in PPI network construction, of which 23 genes were clustered in the gene modules. Finally, six of the 23 genes were targeted by 30 existing drugs. Conclusions: The discovery of these six genes (CYP19A1, ESR1, IGF1R, PGR, PTGS2, and VEGFA) and 30 targeted drugs, which are associated with ovarian steroidogenesis (P<0.001), may inform potential therapeutic strategies for PCOS.
Objective: Polycystic ovary syndrome (PCOS) is an endocrine disorder with diverse clinical manifestations that often occurs in women of childbearing age. However, its molecular pathogenesis remains unclear, and this study aimed to identify miRNA targets in PCOS through text mining and database analysis. Methods: First, three different sets of text mining genes (TMGs) associated with the keywords "polycystic ovary syndrome", "obesity/adiposis", and "anovulation" were retrieved from the GenCLiP3 database, and the overlapping genes were selected. Second, gene ontology annotation and biological pathway enrichment analyses of these overlapping TMGs were performed, followed by protein-protein interaction (PPI) network analysis. Third, the genes in the gene module clustered in the PPI network were selected to predict potential miRNAs for PCOS via miRNA-mRNA analysis. Results: A total of 4291 TMGs related to the three keywords were obtained through text mining; 72 intersecting TMGs were retained among the three gene sets, and 62 TMGs participated in the establishment of the PPI network, of which 18 were aggregated in the gene module. Finally, 11 miRNAs that simultaneously bound to two of the TMGs (IGF1, ESR1, MAPK1, NAMPT, PIK3CA, and SERPINE1) could be prioritized as targets to study PCOS. Conclusions: The discovery of these 11 miRNAs (miR-301a-3p, miR-301b-3p, miR-3666, miR-454-3p, miR-130a-3p, miR-130b-3p, miR-4295, miR-190a-3p, miR-5011-5p, miR-548c-3p, and miR-4799-5p) and 6 TMGs, which are associated with the HIF-1 signaling pathway (P=4.799E-08), could provide potential targets for PCOS.
ChatGPT has emerged as a promising advanced large language model that requires a prompt to elicit information. However, designing a good prompt is not an easy task for many end-users. Therefore, this study set out to determine how the amount of information gained varies with the amount of information provided in the prompt. The study used two types of prompts, initial and improved, to query the introduction sections of 327 highly cited articles on traffic safety. The queried introduction sections were then matched with the corresponding human-written introduction sections from the same articles. Similarity tests and text network analysis were used to assess the level of similarity and the content of the ChatGPT-generated and human-written introductions. The findings indicate that the improved prompts, which added a generic persona and information about the citations and references, changed ChatGPT's output only insignificantly. While perfectly similar contents would have a similarity score of 1.0, the introduction materials from the initial and improved prompts have average similarity scores of 0.5387 and 0.5567, respectively. Further, the content analysis revealed that themes such as statistics, trends, safety measures, and safety technologies are more likely to have high similarity scores, irrespective of the amount of information provided in the prompt. On the other hand, themes such as human behavior, policy and regulations, public perception, and emerging technologies require a detailed level of information in the prompt to produce materials that are close to human-written material. Prompt engineers can use these findings to evaluate their outputs and improve their prompting skills.
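A similarity score of the kind reported can be computed, for example, as TF-IDF cosine similarity between the generated and human-written passages; the snippet below is a sketch under that assumption, and the two text fragments are invented (the paper's exact similarity metric may differ).

```python
# Minimal sketch: TF-IDF cosine similarity between two introduction passages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

generated = "Road traffic crashes remain a leading cause of death worldwide."
human = "Traffic crashes continue to claim over a million lives each year."

tfidf = TfidfVectorizer().fit_transform([generated, human])
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(f"similarity = {score:.4f}")  # scores near 1.0 indicate near-identical text
```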
Objective: To identify the commonalities between rheumatoid arthritis (RA) and diabetes mellitus (DM) in order to understand the mechanisms by which Chinese medicine (CM) treats different diseases with the same treatment. Methods: A text mining approach was adopted to analyze the commonalities between RA and DM in terms of CM and biological elements. The major commonalities were subsequently verified in RA and DM rat models, in which a herbal formula identified via text mining as a treatment for both RA and DM was used as the intervention. Results: Similarities were identified between RA and DM regarding the CM approach used for diagnosis and treatment, as well as the networks of biological activities affected by each disease, including the involvement of adhesion molecules, oxidative stress, cytokines, T-lymphocytes, apoptosis, and inflammation. Ramulus Cinnamomi-Radix Paeoniae Alba-Rhizoma Anemarrhenae is an herbal combination used to treat both RA and DM. This formula demonstrated similar effects on oxidative stress and inflammation in rats with collagen-induced arthritis, which supports the text mining results regarding the commonalities between RA and DM. Conclusion: Commonalities between the biological activities involved in RA and DM were identified through text mining, and both RA and DM might be responsive to the same intervention at a specific stage.
Defect factors and their relevant rules can be analyzed in depth by processing defect records, which are often expressed in the form of text data. However, because defect text consists of both structured and unstructured data, it is necessary to extract structured information from the unstructured data. In this paper, a text mining method based on semantic framework technology is introduced to transform unstructured defect descriptions into structured information such as components and defect attributes. A deep analysis model for power equipment defects is then established, which provides a scheme for defect mining based on historical defect texts. Case studies prove that the proposed deep analysis method offers guidance for equipment upgrading, selection, and maintenance.
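As a greatly simplified stand-in for the semantic-framework extraction, the sketch below pulls a component, a defect attribute, and a date out of an unstructured defect record using fixed vocabularies and a regular expression; the vocabularies and the sample record are invented and much cruder than the method described above.

```python
# Minimal rule-based sketch of turning an unstructured defect description
# into structured fields; vocabularies and the sample record are illustrative.
import re

COMPONENTS = ["bushing", "tap changer", "winding", "breaker"]
DEFECTS = ["oil leak", "overheating", "partial discharge", "crack"]

def extract(record: str) -> dict:
    text = record.lower()
    component = next((c for c in COMPONENTS if c in text), None)
    defect = next((d for d in DEFECTS if d in text), None)
    # A date-like token, if present, becomes the occurrence time.
    date = re.search(r"\d{4}-\d{2}-\d{2}", text)
    return {"component": component,
            "defect": defect,
            "date": date.group() if date else None}

print(extract("2021-06-03 transformer bushing found with oil leak during patrol"))
# {'component': 'bushing', 'defect': 'oil leak', 'date': '2021-06-03'}
```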