To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved a...To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization process of these text messages is inefficient and time-consuming and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time which are then used to build and train a categorization model. Secondly, the study presents a proof of concept tool that automates the categorization of U-report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the developed proof of concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorise this dataset whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4% demonstrating that the automated method is significantly faster, more scalable, and consistent when compared to the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool developed demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-report platform and other similar text messages-based platforms.展开更多
Data compression plays a key role in optimizing the use of memory storage space and also reducing latency in data transmission. In this paper, we are interested in lossless compression techniques because their perform...Data compression plays a key role in optimizing the use of memory storage space and also reducing latency in data transmission. In this paper, we are interested in lossless compression techniques because their performance is exploited with lossy compression techniques for images and videos generally using a mixed approach. To achieve our intended objective, which is to study the performance of lossless compression methods, we first carried out a literature review, a summary of which enabled us to select the most relevant, namely the following: arithmetic coding, LZW, Tunstall’s algorithm, RLE, BWT, Huffman coding and Shannon-Fano. Secondly, we designed a purposive text dataset with a repeating pattern in order to test the behavior and effectiveness of the selected compression techniques. Thirdly, we designed the compression algorithms and developed the programs (scripts) in Matlab in order to test their performance. Finally, following the tests conducted on relevant data that we constructed according to a deliberate model, the results show that these methods presented in order of performance are very satisfactory:- LZW- Arithmetic coding- Tunstall algorithm- BWT + RLELikewise, it appears that on the one hand, the performance of certain techniques relative to others is strongly linked to the sequencing and/or recurrence of symbols that make up the message, and on the other hand, to the cumulative time of encoding and decoding.展开更多
This study aims to construct an automotive English corpus to comprehensively compare the differences between English automotive marketing texts and their Chinese translations.The objective is to reveal challenges and ...This study aims to construct an automotive English corpus to comprehensively compare the differences between English automotive marketing texts and their Chinese translations.The objective is to reveal challenges and opportunities in cultural and contextual translation.The research holds significant importance for understanding the impact of cross-cultural communication in the automotive market and providing more effective translation strategies for multinational automotive manufacturers.Through corpus analysis,focusing on common marketing phrases and text features,employing both quantitative and qualitative analysis methods,and examining the accuracy,naturalness,and cultural adaptability of translated texts,we delve into the similarities and differences in conveying automotive information between the two languages.The study finds that expressive and emotional expressions commonly used in English automotive contexts may encounter challenges in Chinese translations due to language and cultural differences.This necessitates the adoption of more flexible translation strategies.Additionally,Chinese translations tend to emphasize the practicality and safety of products more than their English counterparts,placing a greater emphasis on technical and functional descriptions.The primary conclusion of this research is that the translation of automotive marketing texts requires heightened cross-cultural sensitivity and an understanding of the target audience.When translating automotive advertisements and promotions,translators should consider consumer expectations and cultural values in different contexts to ensure the effectiveness and adaptability of the translation.Furthermore,the formulation of more flexible translation strategies,integrating local culture and market demands,will contribute to enhancing the image and influence of automotive brands in the international market.Through this study,we provide deeper insights for automotive manufacturers,assisting them in leveraging the power of language for successful global market penetration.展开更多
The assessment of translation quality in political texts is primarily based on achieving effective communication.Throughout the translation process,it is essential to not only accurately convey the original content bu...The assessment of translation quality in political texts is primarily based on achieving effective communication.Throughout the translation process,it is essential to not only accurately convey the original content but also effectively transform the structural mechanisms of the source language.In the translation reconstruction of political texts,various textual cohesion methods are often employed,with conjunctions serving as a primary means for semantic coherence within text units.展开更多
The challenge faced by the visually impaired persons in their day-today lives is to interpret text from documents.In this context,to help these people,the objective of this work is to develop an efficient text recogni...The challenge faced by the visually impaired persons in their day-today lives is to interpret text from documents.In this context,to help these people,the objective of this work is to develop an efficient text recognition system that allows the isolation,the extraction,and the recognition of text in the case of documents having a textured background,a degraded aspect of colors,and of poor quality,and to synthesize it into speech.This system basically consists of three algorithms:a text localization and detection algorithm based on mathematical morphology method(MMM);a text extraction algorithm based on the gamma correction method(GCM);and an optical character recognition(OCR)algorithm for text recognition.A detailed complexity study of the different blocks of this text recognition system has been realized.Following this study,an acceleration of the GCM algorithm(AGCM)is proposed.The AGCM algorithm has reduced the complexity in the text recognition system by 70%and kept the same quality of text recognition as that of the original method.To assist visually impaired persons,a graphical interface of the entire text recognition chain has been developed,allowing the capture of images from a camera,rapid and intuitive visualization of the recognized text from this image,and text-to-speech synthesis.Our text recognition system provides an improvement of 6.8%for the recognition rate and 7.6%for the F-measure relative to GCM and AGCM algorithms.展开更多
This study aimed to explore citizens’emotional responses and issues of interest in the context of the coronavirus disease 2019(COVID-19)pandemic.The dataset comprised 65,313 tweets with the location marked as New Yor...This study aimed to explore citizens’emotional responses and issues of interest in the context of the coronavirus disease 2019(COVID-19)pandemic.The dataset comprised 65,313 tweets with the location marked as New York State.The data collection period was four days of tweets when New York City imposed a lockdown order due to an increase in confirmed cases.Data analysis was performed using R Studio.The emotional responses in tweets were analyzed using the Bing and NRC(National Research Council Canada)dictionaries.The tweets’central issue was identified by Text Network Analysis.When tweets were classified as either positive or negative,the negative sentiment was higher.Using the NRC dictionary,eight emotional classifications were devised:“trust,”“fear,”“anticipation,”“sadness,”“anger,”“joy,”“surprise,”and“disgust.”These results indicated that citizens showed negative and trusting emotional reactions in the early days of the pandemic.Moreover,citizens showed a strong interest in overcoming and coping with other people such as social solidarity.Citizens were concerned about the confirmation of COVID-19 infection status and death.Efforts should be made to ensure citizens’psychological stability by promptly informing them of the status of infectious disease management and the route of infection.展开更多
The historical and cultural districts of a city serve as important cultural heritage and tourism resources.This paper focused on four such districts in Yangzhou and performed semantic analysis on online public comment...The historical and cultural districts of a city serve as important cultural heritage and tourism resources.This paper focused on four such districts in Yangzhou and performed semantic analysis on online public comments using ROST CM6 software.According to the high frequency words,attention preference of district site elements,activities and feelings in Yangzhou historical and cultural districts were analyzed.Through the analysis of semantic network and public emotional tendency,the relationship between the protection and utilization of Yangzhou historical and cultural districts and the perception and demand of users were discussed,and some suggestions for the protection,utilization and renewal of historical and cultural districts were put forward.展开更多
Purpose:A text generation based multidisciplinary problem identification method is proposed,which does not rely on a large amount of data annotation.Design/methodology/approach:The proposed method first identifies the...Purpose:A text generation based multidisciplinary problem identification method is proposed,which does not rely on a large amount of data annotation.Design/methodology/approach:The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique;second,it generates abstractive titles for each paper based on abstract and research objective types using a generative pre-trained language model;third,it extracts problem phrases from generated titles according to regular expression rules;fourth,it creates problem relation networks and identifies the same problems by exploiting a weighted community detection algorithm;finally,it identifies multidisciplinary problems based on the disciplinary labels of papers.Findings:Experiments in the“Carbon Peaking and Carbon Neutrality”field show that the proposed method can effectively identify multidisciplinary research problems.The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field.Research limitations:It is necessary to use the proposed method in other multidisciplinary fields to validate its effectiveness.Practical implications:Multidisciplinary problem identification helps to gather multidisciplinary forces to solve complex real-world problems for the governments,fund valuable multidisciplinary problems for research management authorities,and borrow ideas from other disciplines for researchers.Originality/value:This approach proposes a novel multidisciplinary problem identification method based on text generation,which identifies multidisciplinary problems based on generative abstractive titles of papers without data annotation required by standard sequence labeling techniques.展开更多
Text perception is crucial for understanding the semantics of outdoor scenes,making it a key requirement for building intelligent systems for driver assistance or autonomous driving.Text information in car-mounted vid...Text perception is crucial for understanding the semantics of outdoor scenes,making it a key requirement for building intelligent systems for driver assistance or autonomous driving.Text information in car-mounted videos can assist drivers in making decisions.However,Car-mounted video text images pose challenges such as complex backgrounds,small fonts,and the need for real-time detection.We proposed a robust Car-mounted Video Text Detector(CVTD).It is a lightweight text detection model based on ResNet18 for feature extraction,capable of detecting text in arbitrary shapes.Our model efficiently extracted global text positions through the Coordinate Attention Threshold Activation(CATA)and enhanced the representation capability through stacking two Feature Pyramid Enhancement Fusion Modules(FPEFM),strengthening feature representation,and integrating text local features and global position information,reinforcing the representation capability of the CVTD model.The enhanced feature maps,when acted upon by Text Activation Maps(TAM),effectively distinguished text foreground from non-text regions.Additionally,we collected and annotated a dataset containing 2200 images of Car-mounted Video Text(CVT)under various road conditions for training and evaluating our model’s performance.We further tested our model on four other challenging public natural scene text detection benchmark datasets,demonstrating its strong generalization ability and real-time detection speed.This model holds potential for practical applications in real-world scenarios.展开更多
Large language models(LLMs),such as ChatGPT developed by OpenAI,represent a significant advancement in artificial intelligence(AI),designed to understand,generate,and interpret human language by analyzing extensive te...Large language models(LLMs),such as ChatGPT developed by OpenAI,represent a significant advancement in artificial intelligence(AI),designed to understand,generate,and interpret human language by analyzing extensive text data.Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future(Thirunavukarasu et al.,2023).This article aims to provide an in-depth analysis of LLMs’current and potential impact on clinical practices.Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education(Hirosawa et al.,2023;Koga et al.,2023).展开更多
To remove handwritten texts from an image of a document taken by smart phone,an intelligent removal method was proposed that combines dewarping and Fully Convolutional Network with Atrous Convolutional and Atrous Spat...To remove handwritten texts from an image of a document taken by smart phone,an intelligent removal method was proposed that combines dewarping and Fully Convolutional Network with Atrous Convolutional and Atrous Spatial Pyramid Pooling(FCN-AC-ASPP).For a picture taken by a smart phone,firstly,the image is transformed into a regular image by the dewarping algorithm.Secondly,the FCN-AC-ASPP is used to classify printed texts and handwritten texts.Lastly,handwritten texts can be removed by a simple algorithm.Experiments show that the classification accuracy of the FCN-AC-ASPP is better than FCN,DeeplabV3+,FCN-AC.For handwritten texts removal effect,the method of combining dewarping and FCN-AC-ASPP is superior to FCN-AC-ASP alone.展开更多
Scene text detection is an important task in computer vision.In this paper,we present YOLOv5 Scene Text(YOLOv5ST),an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection.Our primary goal ...Scene text detection is an important task in computer vision.In this paper,we present YOLOv5 Scene Text(YOLOv5ST),an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection.Our primary goal is to enhance inference speed without sacrificing significant detection accuracy,thereby enabling robust performance on resource-constrained devices like drones,closed-circuit television cameras,and other embedded systems.To achieve this,we propose key modifications to the network architecture to lighten the original backbone and improve feature aggregation,including replacing standard convolution with depth-wise convolution,adopting the C2 sequence module in place of C3,employing Spatial Pyramid Pooling Global(SPPG)instead of Spatial Pyramid Pooling Fast(SPPF)and integrating Bi-directional Feature Pyramid Network(BiFPN)into the neck.Experimental results demonstrate a remarkable 26%improvement in inference speed compared to the baseline,with only marginal reductions of 1.6%and 4.2%in mean average precision(mAP)at the intersection over union(IoU)thresholds of 0.5 and 0.5:0.95,respectively.Our work represents a significant advancement in scene text detection,striking a balance between speed and accuracy,making it well-suited for performance-constrained environments.展开更多
Eco-translatology provides a new perspective and methodology for the international publicity translation of political texts.This paper applies the viewpoint and methodology of eco-translatology,focuses on the three-di...Eco-translatology provides a new perspective and methodology for the international publicity translation of political texts.This paper applies the viewpoint and methodology of eco-translatology,focuses on the three-dimensional transformation of language,culture,and communication,and discusses how translators can adapt to the eco-environment of political texts through the specific example of the keynote speech of China’s president at the opening ceremony of the Third Belt and Road Forum for International Cooperation and select suitable translation strategies and techniques to achieve an ecological balance of the target text in multiple dimensions.展开更多
Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved throu...Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from the Wikidata(Wikipedia database)database and BERTbased pre-trained Named Entity Recognition(NER)models.Focusing on a significant challenge in the field of natural language processing(NLP),the research evaluates the potential of using entity and relational information to extract deeper meaning from texts.The adopted methodology encompasses a comprehensive approach that includes text preprocessing,entity detection,and the integration of relational information.Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms,such as Support Vector Machine,Logistic Regression,Deep Neural Network,and Convolutional Neural Network.The results indicate that the integration of entity-relation information can significantly enhance algorithmperformance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications.Contributions of this work include the utilization of distant supervised entity-relation information in Turkish text classification,the development of a Turkish relational text classification approach,and the creation of a relational database.By demonstrating potential performance improvements through the integration of distant supervised entity-relation information into Turkish text classification,this research aims to support the effectiveness of text-based artificial intelligence(AI)tools.Additionally,it makes significant contributions to the development ofmultilingual text classification systems by adding deeper meaning to text content,thereby providing a valuable addition to current NLP studies and setting an important reference point for future research.展开更多
Individuals,local communities,environmental associations,private organizations,and public representatives and bodies may all be aggrieved by environmental problems concerning poor air quality,illegal waste disposal,wa...Individuals,local communities,environmental associations,private organizations,and public representatives and bodies may all be aggrieved by environmental problems concerning poor air quality,illegal waste disposal,water contamination,and general pollution.Environmental complaints represent the expressions of dissatisfaction with these issues.As the timeconsuming of managing a large number of complaints,text mining may be useful for automatically extracting information on stakeholder priorities and concerns.The paper used text mining and semantic network analysis to crawl relevant keywords about environmental complaints from two online complaint submission systems:online claim submission system of Regional Agency for Prevention,Environment and Energy(Arpae)(“Contact Arpae”);and Arpae's internal platform for environmental pollution(“Environmental incident reporting portal”)in the Emilia-Romagna Region,Italy.We evaluated the total of 2477 records and classified this information based on the claim topic(air pollution,water pollution,noise pollution,waste,odor,soil,weather-climate,sea-coast,and electromagnetic radiation)and geographical distribution.Then,this paper used natural language processing to extract keywords from the dataset,and classified keywords ranking higher in Term Frequency-Inverse Document Frequency(TF-IDF)based on the driver,pressure,state,impact,and response(DPSIR)framework.This study provided a systemic approach to understanding the interaction between people and environment in different geographical contexts and builds sustainable and healthy communities.The results showed that most complaints are from the public and associated with air pollution and odor.Factories(particularly foundries and ceramic industries)and farms are identified as the drivers of environmental issues.Citizen believed that environmental issues mainly affect human well-being.Moreover,the keywords of“odor”,“report”,“request”,“presence”,“municipality”,and“hours”were the most influential and meaningful concepts,as demonstrated by their high degree and betweenness centrality values.Keywords connecting odor(classified as impacts)and air pollution(classified as state)were the most important(such as“odor-burnt plastic”and“odor-acrid”).Complainants perceived odor annoyance as a primary environmental concern,possibly related to two main drivers:“odor-factory”and“odorsfarms”.The proposed approach has several theoretical and practical implications:text mining may quickly and efficiently address citizen needs,providing the basis toward automating(even partially)the complaint process;and the DPSIR framework might support the planning and organization of information and the identification of stakeholder concerns and priorities,as well as metrics and indicators for their assessment.Therefore,integration of the DPSIR framework with the text mining of environmental complaints might generate a comprehensive environmental knowledge base as a prerequisite for a wider exploitation of analysis to support decision-making processes and environmental management activities.展开更多
文摘To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization process of these text messages is inefficient and time-consuming and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time which are then used to build and train a categorization model. Secondly, the study presents a proof of concept tool that automates the categorization of U-report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the developed proof of concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorise this dataset whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4% demonstrating that the automated method is significantly faster, more scalable, and consistent when compared to the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool developed demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-report platform and other similar text messages-based platforms.
文摘Data compression plays a key role in optimizing the use of memory storage space and also reducing latency in data transmission. In this paper, we are interested in lossless compression techniques because their performance is exploited with lossy compression techniques for images and videos generally using a mixed approach. To achieve our intended objective, which is to study the performance of lossless compression methods, we first carried out a literature review, a summary of which enabled us to select the most relevant, namely the following: arithmetic coding, LZW, Tunstall’s algorithm, RLE, BWT, Huffman coding and Shannon-Fano. Secondly, we designed a purposive text dataset with a repeating pattern in order to test the behavior and effectiveness of the selected compression techniques. Thirdly, we designed the compression algorithms and developed the programs (scripts) in Matlab in order to test their performance. Finally, following the tests conducted on relevant data that we constructed according to a deliberate model, the results show that these methods presented in order of performance are very satisfactory:- LZW- Arithmetic coding- Tunstall algorithm- BWT + RLELikewise, it appears that on the one hand, the performance of certain techniques relative to others is strongly linked to the sequencing and/or recurrence of symbols that make up the message, and on the other hand, to the cumulative time of encoding and decoding.
文摘This study aims to construct an automotive English corpus to comprehensively compare the differences between English automotive marketing texts and their Chinese translations.The objective is to reveal challenges and opportunities in cultural and contextual translation.The research holds significant importance for understanding the impact of cross-cultural communication in the automotive market and providing more effective translation strategies for multinational automotive manufacturers.Through corpus analysis,focusing on common marketing phrases and text features,employing both quantitative and qualitative analysis methods,and examining the accuracy,naturalness,and cultural adaptability of translated texts,we delve into the similarities and differences in conveying automotive information between the two languages.The study finds that expressive and emotional expressions commonly used in English automotive contexts may encounter challenges in Chinese translations due to language and cultural differences.This necessitates the adoption of more flexible translation strategies.Additionally,Chinese translations tend to emphasize the practicality and safety of products more than their English counterparts,placing a greater emphasis on technical and functional descriptions.The primary conclusion of this research is that the translation of automotive marketing texts requires heightened cross-cultural sensitivity and an understanding of the target audience.When translating automotive advertisements and promotions,translators should consider consumer expectations and cultural values in different contexts to ensure the effectiveness and adaptability of the translation.Furthermore,the formulation of more flexible translation strategies,integrating local culture and market demands,will contribute to enhancing the image and influence of automotive brands in the international market.Through this study,we provide deeper insights for automotive manufacturers,assisting them in leveraging the power of language for successful global market penetration.
基金This article is a phased achievement of the 2020 research project“Research on Chinese-Russian Translation of Political Terminology Based on Corpora”(YB2020005)by CNTERM.
文摘The assessment of translation quality in political texts is primarily based on achieving effective communication.Throughout the translation process,it is essential to not only accurately convey the original content but also effectively transform the structural mechanisms of the source language.In the translation reconstruction of political texts,various textual cohesion methods are often employed,with conjunctions serving as a primary means for semantic coherence within text units.
基金This work was funded by the Deanship of Scientific Research at Jouf University under Grant Number(DSR2022-RG-0114).
文摘The challenge faced by the visually impaired persons in their day-today lives is to interpret text from documents.In this context,to help these people,the objective of this work is to develop an efficient text recognition system that allows the isolation,the extraction,and the recognition of text in the case of documents having a textured background,a degraded aspect of colors,and of poor quality,and to synthesize it into speech.This system basically consists of three algorithms:a text localization and detection algorithm based on mathematical morphology method(MMM);a text extraction algorithm based on the gamma correction method(GCM);and an optical character recognition(OCR)algorithm for text recognition.A detailed complexity study of the different blocks of this text recognition system has been realized.Following this study,an acceleration of the GCM algorithm(AGCM)is proposed.The AGCM algorithm has reduced the complexity in the text recognition system by 70%and kept the same quality of text recognition as that of the original method.To assist visually impaired persons,a graphical interface of the entire text recognition chain has been developed,allowing the capture of images from a camera,rapid and intuitive visualization of the recognized text from this image,and text-to-speech synthesis.Our text recognition system provides an improvement of 6.8%for the recognition rate and 7.6%for the F-measure relative to GCM and AGCM algorithms.
基金supported by the National Research Foundation of Korea(NRF)Grant Funded by the Korea Government(MSIT)(NRF-2020R1A2B5B0100208).
文摘This study aimed to explore citizens’emotional responses and issues of interest in the context of the coronavirus disease 2019(COVID-19)pandemic.The dataset comprised 65,313 tweets with the location marked as New York State.The data collection period was four days of tweets when New York City imposed a lockdown order due to an increase in confirmed cases.Data analysis was performed using R Studio.The emotional responses in tweets were analyzed using the Bing and NRC(National Research Council Canada)dictionaries.The tweets’central issue was identified by Text Network Analysis.When tweets were classified as either positive or negative,the negative sentiment was higher.Using the NRC dictionary,eight emotional classifications were devised:“trust,”“fear,”“anticipation,”“sadness,”“anger,”“joy,”“surprise,”and“disgust.”These results indicated that citizens showed negative and trusting emotional reactions in the early days of the pandemic.Moreover,citizens showed a strong interest in overcoming and coping with other people such as social solidarity.Citizens were concerned about the confirmation of COVID-19 infection status and death.Efforts should be made to ensure citizens’psychological stability by promptly informing them of the status of infectious disease management and the route of infection.
基金the Open Project of China Grand Canal Research Institute,Yangzhou University(DYH202211)Jiangsu Provincial Social Science Applied Research Excellent Project(22SYB-053).
文摘The historical and cultural districts of a city serve as important cultural heritage and tourism resources.This paper focused on four such districts in Yangzhou and performed semantic analysis on online public comments using ROST CM6 software.According to the high frequency words,attention preference of district site elements,activities and feelings in Yangzhou historical and cultural districts were analyzed.Through the analysis of semantic network and public emotional tendency,the relationship between the protection and utilization of Yangzhou historical and cultural districts and the perception and demand of users were discussed,and some suggestions for the protection,utilization and renewal of historical and cultural districts were put forward.
基金supported by the General Projects of ISTIC Innovation Foundation“Problem innovation solution mining based on text generation model”(MS2024-03).
文摘Purpose:A text generation based multidisciplinary problem identification method is proposed,which does not rely on a large amount of data annotation.Design/methodology/approach:The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique;second,it generates abstractive titles for each paper based on abstract and research objective types using a generative pre-trained language model;third,it extracts problem phrases from generated titles according to regular expression rules;fourth,it creates problem relation networks and identifies the same problems by exploiting a weighted community detection algorithm;finally,it identifies multidisciplinary problems based on the disciplinary labels of papers.Findings:Experiments in the“Carbon Peaking and Carbon Neutrality”field show that the proposed method can effectively identify multidisciplinary research problems.The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field.Research limitations:It is necessary to use the proposed method in other multidisciplinary fields to validate its effectiveness.Practical implications:Multidisciplinary problem identification helps to gather multidisciplinary forces to solve complex real-world problems for the governments,fund valuable multidisciplinary problems for research management authorities,and borrow ideas from other disciplines for researchers.Originality/value:This approach proposes a novel multidisciplinary problem identification method based on text generation,which identifies multidisciplinary problems based on generative abstractive titles of papers without data annotation required by standard sequence labeling techniques.
基金This work is supported in part by the National Natural Science Foundation of China(Grant Number 61971078)which provided domain expertise and computational power that greatly assisted the activity+1 种基金This work was financially supported by Chongqing Municipal Education Commission Grants forMajor Science and Technology Project(KJZD-M202301901)the Science and Technology Research Project of Jiangxi Department of Education(GJJ2201049).
文摘Text perception is crucial for understanding the semantics of outdoor scenes,making it a key requirement for building intelligent systems for driver assistance or autonomous driving.Text information in car-mounted videos can assist drivers in making decisions.However,Car-mounted video text images pose challenges such as complex backgrounds,small fonts,and the need for real-time detection.We proposed a robust Car-mounted Video Text Detector(CVTD).It is a lightweight text detection model based on ResNet18 for feature extraction,capable of detecting text in arbitrary shapes.Our model efficiently extracted global text positions through the Coordinate Attention Threshold Activation(CATA)and enhanced the representation capability through stacking two Feature Pyramid Enhancement Fusion Modules(FPEFM),strengthening feature representation,and integrating text local features and global position information,reinforcing the representation capability of the CVTD model.The enhanced feature maps,when acted upon by Text Activation Maps(TAM),effectively distinguished text foreground from non-text regions.Additionally,we collected and annotated a dataset containing 2200 images of Car-mounted Video Text(CVT)under various road conditions for training and evaluating our model’s performance.We further tested our model on four other challenging public natural scene text detection benchmark datasets,demonstrating its strong generalization ability and real-time detection speed.This model holds potential for practical applications in real-world scenarios.
文摘Large language models(LLMs),such as ChatGPT developed by OpenAI,represent a significant advancement in artificial intelligence(AI),designed to understand,generate,and interpret human language by analyzing extensive text data.Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future(Thirunavukarasu et al.,2023).This article aims to provide an in-depth analysis of LLMs’current and potential impact on clinical practices.Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education(Hirosawa et al.,2023;Koga et al.,2023).
基金Sponsored by the Scientific Research Project of Zhejiang Provincial Department of Education(Grant No.KYY-ZX-20210329).
文摘To remove handwritten texts from an image of a document taken by smart phone,an intelligent removal method was proposed that combines dewarping and Fully Convolutional Network with Atrous Convolutional and Atrous Spatial Pyramid Pooling(FCN-AC-ASPP).For a picture taken by a smart phone,firstly,the image is transformed into a regular image by the dewarping algorithm.Secondly,the FCN-AC-ASPP is used to classify printed texts and handwritten texts.Lastly,handwritten texts can be removed by a simple algorithm.Experiments show that the classification accuracy of the FCN-AC-ASPP is better than FCN,DeeplabV3+,FCN-AC.For handwritten texts removal effect,the method of combining dewarping and FCN-AC-ASPP is superior to FCN-AC-ASP alone.
基金the National Natural Science Foundation of PRChina(42075130)Nari Technology Co.,Ltd.(4561655965)。
文摘Scene text detection is an important task in computer vision.In this paper,we present YOLOv5 Scene Text(YOLOv5ST),an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection.Our primary goal is to enhance inference speed without sacrificing significant detection accuracy,thereby enabling robust performance on resource-constrained devices like drones,closed-circuit television cameras,and other embedded systems.To achieve this,we propose key modifications to the network architecture to lighten the original backbone and improve feature aggregation,including replacing standard convolution with depth-wise convolution,adopting the C2 sequence module in place of C3,employing Spatial Pyramid Pooling Global(SPPG)instead of Spatial Pyramid Pooling Fast(SPPF)and integrating Bi-directional Feature Pyramid Network(BiFPN)into the neck.Experimental results demonstrate a remarkable 26%improvement in inference speed compared to the baseline,with only marginal reductions of 1.6%and 4.2%in mean average precision(mAP)at the intersection over union(IoU)thresholds of 0.5 and 0.5:0.95,respectively.Our work represents a significant advancement in scene text detection,striking a balance between speed and accuracy,making it well-suited for performance-constrained environments.
文摘Eco-translatology provides a new perspective and methodology for the international publicity translation of political texts.This paper applies the viewpoint and methodology of eco-translatology,focuses on the three-dimensional transformation of language,culture,and communication,and discusses how translators can adapt to the eco-environment of political texts through the specific example of the keynote speech of China’s president at the opening ceremony of the Third Belt and Road Forum for International Cooperation and select suitable translation strategies and techniques to achieve an ecological balance of the target text in multiple dimensions.
文摘Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from the Wikidata(Wikipedia database)database and BERTbased pre-trained Named Entity Recognition(NER)models.Focusing on a significant challenge in the field of natural language processing(NLP),the research evaluates the potential of using entity and relational information to extract deeper meaning from texts.The adopted methodology encompasses a comprehensive approach that includes text preprocessing,entity detection,and the integration of relational information.Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms,such as Support Vector Machine,Logistic Regression,Deep Neural Network,and Convolutional Neural Network.The results indicate that the integration of entity-relation information can significantly enhance algorithmperformance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications.Contributions of this work include the utilization of distant supervised entity-relation information in Turkish text classification,the development of a Turkish relational text classification approach,and the creation of a relational database.By demonstrating potential performance improvements through the integration of distant supervised entity-relation information into Turkish text classification,this research aims to support the effectiveness of text-based artificial intelligence(AI)tools.Additionally,it makes significant contributions to the development ofmultilingual text classification systems by adding deeper meaning to text content,thereby providing a valuable addition to current NLP studies and setting an important reference point for future research.
文摘Individuals,local communities,environmental associations,private organizations,and public representatives and bodies may all be aggrieved by environmental problems concerning poor air quality,illegal waste disposal,water contamination,and general pollution.Environmental complaints represent the expressions of dissatisfaction with these issues.As the timeconsuming of managing a large number of complaints,text mining may be useful for automatically extracting information on stakeholder priorities and concerns.The paper used text mining and semantic network analysis to crawl relevant keywords about environmental complaints from two online complaint submission systems:online claim submission system of Regional Agency for Prevention,Environment and Energy(Arpae)(“Contact Arpae”);and Arpae's internal platform for environmental pollution(“Environmental incident reporting portal”)in the Emilia-Romagna Region,Italy.We evaluated the total of 2477 records and classified this information based on the claim topic(air pollution,water pollution,noise pollution,waste,odor,soil,weather-climate,sea-coast,and electromagnetic radiation)and geographical distribution.Then,this paper used natural language processing to extract keywords from the dataset,and classified keywords ranking higher in Term Frequency-Inverse Document Frequency(TF-IDF)based on the driver,pressure,state,impact,and response(DPSIR)framework.This study provided a systemic approach to understanding the interaction between people and environment in different geographical contexts and builds sustainable and healthy communities.The results showed that most complaints are from the public and associated with air pollution and odor.Factories(particularly foundries and ceramic industries)and farms are identified as the drivers of environmental issues.Citizen believed that environmental issues mainly affect human well-being.Moreover,the keywords of“odor”,“report”,“request”,“presence”,“municipality”,and“hours”were the most influential and meaningful concepts,as demonstrated by their high degree and betweenness centrality values.Keywords connecting odor(classified as impacts)and air pollution(classified as state)were the most important(such as“odor-burnt plastic”and“odor-acrid”).Complainants perceived odor annoyance as a primary environmental concern,possibly related to two main drivers:“odor-factory”and“odorsfarms”.The proposed approach has several theoretical and practical implications:text mining may quickly and efficiently address citizen needs,providing the basis toward automating(even partially)the complaint process;and the DPSIR framework might support the planning and organization of information and the identification of stakeholder concerns and priorities,as well as metrics and indicators for their assessment.Therefore,integration of the DPSIR framework with the text mining of environmental complaints might generate a comprehensive environmental knowledge base as a prerequisite for a wider exploitation of analysis to support decision-making processes and environmental management activities.